Netdev List
* Re: [PATCH 0/6] SUNRPC: make RPC clients use network-namespace-aware PipeFS routines
From: J. Bruce Fields @ 2011-11-23 16:36 UTC
  To: Stanislav Kinsbursky
  Cc: Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org,
	Pavel Emelianov, neilb@suse.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, James Bottomley,
	davem@davemloft.net, devel@openvz.org
In-Reply-To: <20111123104945.11077.10270.stgit-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>

On Wed, Nov 23, 2011 at 02:51:10PM +0300, Stanislav Kinsbursky wrote:
> This patch set was created in context of clone of git
> branch: git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git.
> tag: v3.1
> 
> This patch set depends on previous patch sets titled:
> 1) "SUNRPC: initial part of making pipefs work in net ns"
> 2) "SUNPRC: cleanup PipeFS for network-namespace-aware users"
> 
> This patch set is the first part of reworking SUNRPC PipeFS users.
> It makes SUNRPC clients use PipeFS notifications to create directory and GSS
> pipe dentries. With this patch set, the RPC client and GSS auth creation
> routines don't force SUNRPC PipeFS mount point creation, which means
> that they can now work without PipeFS dentries.

I'm not following very well.  (My fault, I haven't been paying
attention.)  Could you summarize the intended behavior of pipefs after
all this is done?

So there's a separate superblock (and separate dentries) for each
namespace?

What decides which clients are visible in which network namespaces?

--b.

> 
> The following series consists of:
> 
> ---
> 
> Stanislav Kinsbursky (6):
>       SUNRPC: handle RPC client pipefs dentries by network namespace aware routines
>       SUNRPC: handle GSS AUTH pipes by network namespace aware routines
>       SUNRPC: subscribe RPC clients to pipefs notifications
>       SUNRPC: remove RPC client pipefs dentries after unregister
>       SUNRPC: remove RPC pipefs mount point manipulations from RPC clients code
>       SUNRPC: remove RPC PipeFS mount point reference from RPC client
> 
> 
>  fs/nfs/idmap.c                 |    4 +
>  fs/nfsd/nfs4callback.c         |    2 -
>  include/linux/nfs.h            |    2 -
>  include/linux/sunrpc/auth.h    |    2 +
>  include/linux/sunrpc/clnt.h    |    2 -
>  net/sunrpc/auth_gss/auth_gss.c |  101 +++++++++++++++++++++------
>  net/sunrpc/clnt.c              |  151 +++++++++++++++++++++++++++++++---------
>  net/sunrpc/rpc_pipe.c          |   19 +++--
>  net/sunrpc/sunrpc.h            |    2 +
>  9 files changed, 218 insertions(+), 67 deletions(-)
> 
> -- 
> Signature
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v3 01/10] dql: Dynamic queue limits
From: Stephen Hemminger @ 2011-11-23 16:37 UTC
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.00.1111222138280.13686@pokey.mtv.corp.google.com>

On Tue, 22 Nov 2011 21:52:21 -0800 (PST)
Tom Herbert <therbert@google.com> wrote:

> Implementation of dynamic queue limits (dql).  This is a library which
> allows a queue limit to be dynamically managed.  The goal of dql is
> to keep the queue limit (the number of objects allowed in the queue) as
> small as possible without allowing the queue to be starved.
> 
> dql would be used with a queue which has these properties:
> 
> 1) Objects are queued up to some limit which can be expressed as a
>    count of objects.
> 2) Periodically a completion process executes which retires consumed
>    objects.
> 3) Starvation occurs when limit has been reached, all queued data has
>    actually been consumed but completion processing has not yet run,
>    so queuing new data is blocked.
> 4) Minimizing the amount of queued data is desirable.
> 
> A canonical example of such a queue would be a NIC HW transmit queue.
> 
> The queue limit is dynamic, it will increase or decrease over time
> depending on the workload.  The queue limit is recalculated each time
> completion processing is done.  Increases occur when the queue is
> starved and can exponentially increase over successive intervals.
> Decreases occur when more data is being maintained in the queue than
> needed to prevent starvation.  The number of extra objects, or "slack",
> is measured over successive intervals, and to avoid hysteresis the
> limit is only reduced by the minimum slack seen over a configurable
> time period.
> 
> dql API provides routines to manage the queue:
> - dql_init is called to initialize the dql structure
> - dql_reset is called to reset dynamic values
> - dql_queued is called when objects are being enqueued
> - dql_avail returns availability in the queue
> - dql_completed is called when objects have been consumed in the queue
> 
> Configuration consists of:
> - max_limit, maximum limit
> - min_limit, minimum limit
> - slack_hold_time, time to measure instances of slack before reducing
>   queue limit
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

Some minor stuff:

1. Since linux/dql.h is only used by kernel components (it is not an
   API), it should be net/dql.h.

2. Don't make it a config option. "Do or do not, there is no try."
   Config options are useless for distributions. I assume the plan
   is that devices using DQL will enable it in their config.
   Therefore, adding a device that uses DQL should modify the Kconfig
   for that device.

3. Rather than using -1ul in DQL_MAX_, why not use ULONG_MAX?

4. dql_avail should be:
static inline long dql_avail(const struct dql *dql)
{
	return dql->limit - (dql->num_queued - dql->num_completed);
}

5. dql_init should be: 
void dql_init(struct dql *dql, unsigned hold_time)
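
As a rough illustration of item 4 (not part of the patch), the sketch below
shows why the expression can be computed on free-running unsigned counters:
the subtraction num_queued - num_completed stays correct after the counters
wrap. The struct is a hypothetical, trimmed-down stand-in for struct dql, and
the _sketch names are made up for this example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, trimmed-down stand-in for the patch's struct dql. */
struct dql_sketch {
	unsigned long num_queued;	/* total objects ever queued */
	unsigned long num_completed;	/* total objects ever completed */
	unsigned long limit;		/* current queue limit */
};

/*
 * Room left under the limit. The result is negative when more objects
 * are in flight than the limit allows.
 */
static inline long dql_avail_sketch(const struct dql_sketch *dql)
{
	assert(dql);
	/*
	 * Unsigned subtraction is modular, so the in-flight count is right
	 * even if num_queued has wrapped past ULONG_MAX. The cast to long
	 * assumes the usual two's complement conversion.
	 */
	return (long)(dql->limit - (dql->num_queued - dql->num_completed));
}
```

For example, with limit = 8, num_completed = ULONG_MAX - 1 and num_queued = 3
(already wrapped), five objects are in flight and the function returns 3.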


For clarity, IMHO it would be better to change dql_completed()
so that the code reads like the comments.

Rather than:
	completed = dql->num_completed + count;
	limit = dql->limit;
	ovlimit = POSDIFF(dql->num_queued - dql->num_completed, limit);
	inprogress = dql->num_queued - completed;
	prev_inprogress = dql->prev_num_queued - dql->num_completed;
	all_prev_completed = POSDIFF(completed, dql->prev_num_queued);

	if ((ovlimit && !inprogress) ||
	    (dql->prev_ovlimit && all_prev_completed)) {
Instead:
	if (dql_queue_starved(dql)) {
		...
	} else if (dql_queue_busy(dql)) {
		...

Where dql_queue_starved() and dql_queue_busy() are inline functions.
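
A minimal sketch of what such a helper might look like, built only from the
condition quoted above. Everything here is an assumption for illustration: the
struct is a userspace stand-in, POSDIFF_SKETCH is a guess at what POSDIFF
does, and the helper takes the completion count as a second argument because
the quoted test depends on it. The condition for dql_queue_busy() is not shown
in this message, so it is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed meaning of POSDIFF: A - B when that is positive, else 0. */
#define POSDIFF_SKETCH(A, B) ((long)((A) - (B)) > 0 ? (A) - (B) : 0UL)

/* Hypothetical stand-in holding only the fields the quoted code reads. */
struct dql_counters {
	unsigned long num_queued;
	unsigned long num_completed;
	unsigned long prev_num_queued;
	unsigned long prev_ovlimit;
	unsigned long limit;
};

/* True when the quoted "starved" condition holds after completing count objects. */
static inline bool dql_queue_starved_sketch(const struct dql_counters *dql,
					    unsigned long count)
{
	assert(dql);
	unsigned long completed = dql->num_completed + count;
	unsigned long ovlimit =
		POSDIFF_SKETCH(dql->num_queued - dql->num_completed, dql->limit);
	unsigned long inprogress = dql->num_queued - completed;
	unsigned long all_prev_completed =
		POSDIFF_SKETCH(completed, dql->prev_num_queued);

	return (ovlimit && !inprogress) ||
	       (dql->prev_ovlimit && all_prev_completed);
}
```

With the test hidden behind a name, dql_completed() could read as
"if starved, grow the limit; else if busy, consider shrinking it".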


* [PATCH v2] ipv4 : igmp : optimize timer modify logic in igmp_mod_timer()
From: Jun Zhao @ 2011-11-23 16:38 UTC
  To: davem; +Cc: eric.dumazet, netdev, Jun Zhao

When the timer is pending and already expires no later than the new delay,
we need not call del_timer()/add_timer().

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
---
 net/ipv4/igmp.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index c7472ef..5f2aedf 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -215,14 +215,14 @@ static void igmp_mod_timer(struct ip_mc_list *im, int max_delay)
 {
 	spin_lock_bh(&im->lock);
 	im->unsolicit_count = 0;
-	if (del_timer(&im->timer)) {
-		if ((long)(im->timer.expires-jiffies) < max_delay) {
-			add_timer(&im->timer);
-			im->tm_running = 1;
+	if (timer_pending(&im->timer)) {
+		if (time_before_eq(im->timer.expires, (jiffies + max_delay))) {
 			spin_unlock_bh(&im->lock);
 			return;
+		} else {
+			del_timer(&im->timer);
+			atomic_dec(&im->refcnt);
 		}
-		atomic_dec(&im->refcnt);
 	}
 	igmp_start_timer(im, max_delay);
 	spin_unlock_bh(&im->lock);
-- 
1.7.2.5
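
For readers unfamiliar with the comparison used in the patch, here is a small
userspace sketch of how a wrap-safe "at or before" test on a tick counter
behaves. time_before_eq_sketch() is a stand-in written for this example, not
the kernel macro itself, and keep_pending_timer() is a hypothetical name for
the condition the patch adds.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * True when tick a is at or before tick b. Comparing the signed
 * difference keeps the answer right when the counter wraps, as long
 * as the two values are less than half the counter range apart.
 */
static inline bool time_before_eq_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/*
 * Mirrors the new test in igmp_mod_timer(): a pending timer is left
 * alone when it already fires no later than now + max_delay.
 */
static inline bool keep_pending_timer(unsigned long expires,
				      unsigned long now,
				      unsigned long max_delay)
{
	return time_before_eq_sketch(expires, now + max_delay);
}
```

Note that a plain expires <= now + max_delay would give the wrong answer when
now + max_delay wraps past ULONG_MAX, which is why the signed-difference form
is used.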


* Re: [PATCH 1/2] ax25: integer overflows in ax25_setsockopt()
From: Ralf Baechle @ 2011-11-23 17:09 UTC
  To: Xi Wang
  Cc: linux-kernel, Joerg Reuter, David Miller, linux-hams, netdev,
	Thomas Osterried
In-Reply-To: <7187C142-99F1-4A96-9BE6-650B10C9B22D@gmail.com>

On Tue, Nov 22, 2011 at 11:28:24PM -0500, Xi Wang wrote:

> ax25_setsockopt() misses several upper-bound checks on the
> user-controlled value.
> 
> 
> Reported-by: Fan Long <longfancn@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> ---
>  net/ax25/af_ax25.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index e7c69f4..be6a8cf 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -571,7 +571,7 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
>  		break;
>  
>  	case AX25_T1:
> -		if (opt < 1) {
> +		if (opt < 1 || opt > 30) {
>  			res = -EINVAL;
>  			break;
>  		}

30 seconds is definitely too low.  The TCP spec assumes that an RTT
of up to 120s is possible.  In slow packet radio networks, such as HF
networks, 15 minutes is easily possible.

How about AX.25 to the P5A Mars mission?

A silly value for T1 will be caught further down the road, so there is no
damage, but an application really should receive a sensible return value
when trying something stupid.

If an app wants to shoot itself in the foot, there is nothing wrong with
being the arms dealer, so an error check should be something like

	if (val > ULONG_MAX / HZ)
		bail_out;

which will do the right thing even on a 64-bit system and prevent an
overflow in the multiplication further down.
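
The idea behind that check can be sketched in userspace C as follows. This is
only an illustration under stated assumptions: HZ_SKETCH is an arbitrary
stand-in for the kernel's build-time HZ, seconds_to_ticks() is a hypothetical
helper, and the unsigned long types are chosen for the example rather than
taken from af_ax25.c.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define HZ_SKETCH 1000UL	/* assumed tick rate, for illustration only */

/*
 * Convert a timeout in seconds to ticks. Any value whose product with
 * the tick rate fits in an unsigned long is accepted; everything else
 * is rejected before the multiplication can overflow.
 */
static int seconds_to_ticks(unsigned long secs, unsigned long *ticks)
{
	assert(ticks);
	if (secs < 1 || secs > ULONG_MAX / HZ_SKETCH)
		return -EINVAL;
	*ticks = secs * HZ_SKETCH;
	return 0;
}
```

With this bound a 15 minute timeout (900 seconds) is accepted, while only
values that would actually overflow are refused.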

> @@ -580,7 +580,7 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
>  		break;
>  
>  	case AX25_T2:
> -		if (opt < 1) {
> +		if (opt < 1 || opt > 20) {
>  			res = -EINVAL;
>  			break;
>  		}

An excessive value here could result in a timer being set to expire in
the past, similar in effect to setting a very low value.  Again, it's OK
if a user tries to shoot himself in the other foot as well.

> @@ -596,7 +596,7 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
>  		break;
>  
>  	case AX25_T3:
> -		if (opt < 1) {
> +		if (opt < 0 || opt > 3600) {
>  			res = -EINVAL;
>  			break;
>  		}

For a stable link it should be possible to set a very high T3 value,
potentially high enough to effectively disable the T3 functionality.

> @@ -604,7 +604,7 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
>  		break;
>  
>  	case AX25_IDLE:
> -		if (opt < 0) {
> +		if (opt < 0 || opt > 65535) {
>  			res = -EINVAL;
>  			break;
>  		}

I have an updated patch which I'm testing right now.

  Ralf


* [PATCH net-next] net: treewide use of RCU_INIT_POINTER
From: Eric Dumazet @ 2011-11-23 17:09 UTC
  To: David Miller; +Cc: netdev

rcu_assign_pointer(ptr, NULL) can be safely replaced by
RCU_INIT_POINTER(ptr, NULL).

(The old rcu_assign_pointer() macro tested for a NULL value and could
omit the smp_wmb(), but this test had to be removed because of compiler
warnings.)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/ethernet/broadcom/bnx2.c               |    2 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |    2 -
 drivers/net/ethernet/broadcom/cnic.c               |    6 ++---
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c |    4 +--
 drivers/net/macvtap.c                              |    8 +++----
 drivers/net/ppp/pptp.c                             |    2 -
 drivers/net/team/team_mode_activebackup.c          |    2 -
 drivers/net/wireless/ath/carl9170/main.c           |   12 +++++------
 net/core/netprio_cgroup.c                          |    4 +--
 9 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 83d8cef..d573169 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -409,7 +409,7 @@ static int bnx2_unregister_cnic(struct net_device *dev)
 	mutex_lock(&bp->cnic_lock);
 	cp->drv_state = 0;
 	bnapi->cnic_present = 0;
-	rcu_assign_pointer(bp->cnic_ops, NULL);
+	RCU_INIT_POINTER(bp->cnic_ops, NULL);
 	mutex_unlock(&bp->cnic_lock);
 	synchronize_rcu();
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 83481e2..0cdbb70 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -11587,7 +11587,7 @@ static int bnx2x_unregister_cnic(struct net_device *dev)
 
 	mutex_lock(&bp->cnic_mutex);
 	cp->drv_state = 0;
-	rcu_assign_pointer(bp->cnic_ops, NULL);
+	RCU_INIT_POINTER(bp->cnic_ops, NULL);
 	mutex_unlock(&bp->cnic_mutex);
 	synchronize_rcu();
 	kfree(bp->cnic_kwq);
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 099f41d..b336e55 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -506,7 +506,7 @@ int cnic_unregister_driver(int ulp_type)
 	}
 	read_unlock(&cnic_dev_lock);
 
-	rcu_assign_pointer(cnic_ulp_tbl[ulp_type], NULL);
+	RCU_INIT_POINTER(cnic_ulp_tbl[ulp_type], NULL);
 
 	mutex_unlock(&cnic_lock);
 	synchronize_rcu();
@@ -579,7 +579,7 @@ static int cnic_unregister_device(struct cnic_dev *dev, int ulp_type)
 	}
 	mutex_lock(&cnic_lock);
 	if (rcu_dereference(cp->ulp_ops[ulp_type])) {
-		rcu_assign_pointer(cp->ulp_ops[ulp_type], NULL);
+		RCU_INIT_POINTER(cp->ulp_ops[ulp_type], NULL);
 		cnic_put(dev);
 	} else {
 		pr_err("%s: device not registered to this ulp type %d\n",
@@ -5134,7 +5134,7 @@ static void cnic_stop_hw(struct cnic_dev *dev)
 		}
 		cnic_shutdown_rings(dev);
 		clear_bit(CNIC_F_CNIC_UP, &dev->flags);
-		rcu_assign_pointer(cp->ulp_ops[CNIC_ULP_L4], NULL);
+		RCU_INIT_POINTER(cp->ulp_ops[CNIC_ULP_L4], NULL);
 		synchronize_rcu();
 		cnic_cm_shutdown(dev);
 		cp->stop_hw(dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 90ff131..7f7882d 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -1301,7 +1301,7 @@ int cxgb3_offload_activate(struct adapter *adapter)
 
 out_free_l2t:
 	t3_free_l2t(L2DATA(dev));
-	rcu_assign_pointer(dev->l2opt, NULL);
+	RCU_INIT_POINTER(dev->l2opt, NULL);
 out_free:
 	kfree(t);
 	return err;
@@ -1329,7 +1329,7 @@ void cxgb3_offload_deactivate(struct adapter *adapter)
 	rcu_read_lock();
 	d = L2DATA(tdev);
 	rcu_read_unlock();
-	rcu_assign_pointer(tdev->l2opt, NULL);
+	RCU_INIT_POINTER(tdev->l2opt, NULL);
 	call_rcu(&d->rcu_head, clean_l2_data);
 	if (t->nofail_skb)
 		kfree_skb(t->nofail_skb);
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 1b7082d..7c88d13 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -145,8 +145,8 @@ static void macvtap_put_queue(struct macvtap_queue *q)
 	if (vlan) {
 		int index = get_slot(vlan, q);
 
-		rcu_assign_pointer(vlan->taps[index], NULL);
-		rcu_assign_pointer(q->vlan, NULL);
+		RCU_INIT_POINTER(vlan->taps[index], NULL);
+		RCU_INIT_POINTER(q->vlan, NULL);
 		sock_put(&q->sk);
 		--vlan->numvtaps;
 	}
@@ -223,8 +223,8 @@ static void macvtap_del_queues(struct net_device *dev)
 					      lockdep_is_held(&macvtap_lock));
 		if (q) {
 			qlist[j++] = q;
-			rcu_assign_pointer(vlan->taps[i], NULL);
-			rcu_assign_pointer(q->vlan, NULL);
+			RCU_INIT_POINTER(vlan->taps[i], NULL);
+			RCU_INIT_POINTER(q->vlan, NULL);
 			vlan->numvtaps--;
 		}
 	}
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 89f829f..ede899c 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -162,7 +162,7 @@ static void del_chan(struct pppox_sock *sock)
 {
 	spin_lock(&chan_lock);
 	clear_bit(sock->proto.pptp.src_addr.call_id, callid_bitmap);
-	rcu_assign_pointer(callid_sock[sock->proto.pptp.src_addr.call_id], NULL);
+	RCU_INIT_POINTER(callid_sock[sock->proto.pptp.src_addr.call_id], NULL);
 	spin_unlock(&chan_lock);
 	synchronize_rcu();
 }
diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
index b344275..f4d960e 100644
--- a/drivers/net/team/team_mode_activebackup.c
+++ b/drivers/net/team/team_mode_activebackup.c
@@ -56,7 +56,7 @@ drop:
 static void ab_port_leave(struct team *team, struct team_port *port)
 {
 	if (ab_priv(team)->active_port == port)
-		rcu_assign_pointer(ab_priv(team)->active_port, NULL);
+		RCU_INIT_POINTER(ab_priv(team)->active_port, NULL);
 }
 
 static int ab_active_port_get(struct team *team, void *arg)
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index f06e069..5518592 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -446,7 +446,7 @@ static void carl9170_op_stop(struct ieee80211_hw *hw)
 
 	mutex_lock(&ar->mutex);
 	if (IS_ACCEPTING_CMD(ar)) {
-		rcu_assign_pointer(ar->beacon_iter, NULL);
+		RCU_INIT_POINTER(ar->beacon_iter, NULL);
 
 		carl9170_led_set_state(ar, 0);
 
@@ -678,7 +678,7 @@ unlock:
 		vif_priv->active = false;
 		bitmap_release_region(&ar->vif_bitmap, vif_id, 0);
 		ar->vifs--;
-		rcu_assign_pointer(ar->vif_priv[vif_id].vif, NULL);
+		RCU_INIT_POINTER(ar->vif_priv[vif_id].vif, NULL);
 		list_del_rcu(&vif_priv->list);
 		mutex_unlock(&ar->mutex);
 		synchronize_rcu();
@@ -716,7 +716,7 @@ static void carl9170_op_remove_interface(struct ieee80211_hw *hw,
 	WARN_ON(vif_priv->enable_beacon);
 	vif_priv->enable_beacon = false;
 	list_del_rcu(&vif_priv->list);
-	rcu_assign_pointer(ar->vif_priv[id].vif, NULL);
+	RCU_INIT_POINTER(ar->vif_priv[id].vif, NULL);
 
 	if (vif == main_vif) {
 		rcu_read_unlock();
@@ -1258,7 +1258,7 @@ static int carl9170_op_sta_add(struct ieee80211_hw *hw,
 		}
 
 		for (i = 0; i < CARL9170_NUM_TID; i++)
-			rcu_assign_pointer(sta_info->agg[i], NULL);
+			RCU_INIT_POINTER(sta_info->agg[i], NULL);
 
 		sta_info->ampdu_max_len = 1 << (3 + sta->ht_cap.ampdu_factor);
 		sta_info->ht_sta = true;
@@ -1285,7 +1285,7 @@ static int carl9170_op_sta_remove(struct ieee80211_hw *hw,
 			struct carl9170_sta_tid *tid_info;
 
 			tid_info = rcu_dereference(sta_info->agg[i]);
-			rcu_assign_pointer(sta_info->agg[i], NULL);
+			RCU_INIT_POINTER(sta_info->agg[i], NULL);
 
 			if (!tid_info)
 				continue;
@@ -1398,7 +1398,7 @@ static int carl9170_op_ampdu_action(struct ieee80211_hw *hw,
 			spin_unlock_bh(&ar->tx_ampdu_list_lock);
 		}
 
-		rcu_assign_pointer(sta_info->agg[tid], NULL);
+		RCU_INIT_POINTER(sta_info->agg[tid], NULL);
 		rcu_read_unlock();
 
 		ieee80211_stop_tx_ba_cb_irqsafe(vif, sta->addr, tid);
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 72ad0bc..3a9fd48 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -285,7 +285,7 @@ static int netprio_device_event(struct notifier_block *unused,
 		break;
 	case NETDEV_UNREGISTER:
 		old = rtnl_dereference(dev->priomap);
-		rcu_assign_pointer(dev->priomap, NULL);
+		RCU_INIT_POINTER(dev->priomap, NULL);
 		if (old)
 			kfree_rcu(old, rcu);
 		break;
@@ -332,7 +332,7 @@ static void __exit exit_cgroup_netprio(void)
 	rtnl_lock();
 	for_each_netdev(&init_net, dev) {
 		old = rtnl_dereference(dev->priomap);
-		rcu_assign_pointer(dev->priomap, NULL);
+		RCU_INIT_POINTER(dev->priomap, NULL);
 		if (old)
 			kfree_rcu(old, rcu);
 	}
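
The reasoning in the commit message can be illustrated with a userspace
analogy built on C11 atomics. This is not kernel code: the names are
hypothetical, a release store plays the role of rcu_assign_pointer(), a
relaxed store plays the role of RCU_INIT_POINTER(ptr, NULL), and the reader
uses an acquire load, which is stronger than what rcu_dereference() needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object that readers reach through a shared pointer. */
struct ops_sketch {
	int version;
};

static _Atomic(struct ops_sketch *) published_ops;

/*
 * Publishing a real object: the release store orders the field
 * initialisation before the pointer becomes visible to readers.
 */
static void publish_ops(struct ops_sketch *ops, int version)
{
	assert(ops);
	ops->version = version;
	atomic_store_explicit(&published_ops, ops, memory_order_release);
}

/*
 * Clearing the pointer: NULL refers to no object, so there are no
 * earlier stores that a reader could depend on, and a relaxed store
 * is sufficient.
 */
static void unpublish_ops(void)
{
	atomic_store_explicit(&published_ops, NULL, memory_order_relaxed);
}

/* Reader side: returns -1 when nothing is published. */
static int read_version(void)
{
	struct ops_sketch *ops =
		atomic_load_explicit(&published_ops, memory_order_acquire);

	return ops ? ops->version : -1;
}
```

Dropping the barrier does not remove the need to wait for readers before
freeing the old object, which is why the synchronize_rcu() and call_rcu()
calls in the hunks above stay in place.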


* Re: [PATCH 0/6] SUNRPC: make RPC clients use network-namespace-aware PipeFS routines
From: Stanislav Kinsbursky @ 2011-11-23 17:18 UTC
  To: J. Bruce Fields
  Cc: Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org,
	Pavel Emelianov, neilb@suse.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, James Bottomley,
	davem@davemloft.net, devel@openvz.org
In-Reply-To: <20111123162711.GA30672@fieldses.org>

On 23.11.2011 20:27, J. Bruce Fields wrote:
> On Wed, Nov 23, 2011 at 02:51:10PM +0300, Stanislav Kinsbursky wrote:
>> This patch set was created in context of clone of git
>> branch: git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git.
>> tag: v3.1
>>
>> This patch set depends on previous patch sets titled:
>> 1) "SUNRPC: initial part of making pipefs work in net ns"
>> 2) "SUNPRC: cleanup PipeFS for network-namespace-aware users"
>
> Do you have a git tree set up with all of these applied?
>

I have one, but there is no external access to it yet.

> (If I want to take a quick look it would personally be easier for me to
> fetch your git tree than to get those patches out of old email.)
>

I will check tomorrow whether it is possible to arrange that.

> --b.
>


-- 
Best regards,
Stanislav Kinsbursky


* Re: [PATCH v3 01/10] dql: Dynamic queue limits
From: Dave Taht @ 2011-11-23 17:27 UTC
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.00.1111222138280.13686@pokey.mtv.corp.google.com>

On Wed, Nov 23, 2011 at 6:52 AM, Tom Herbert <therbert@google.com> wrote:
> Implementation of dynamic queue limits (dql).  This is a library which
> allows a queue limit to be dynamically managed.  The goal of dql is
> to keep the queue limit (the number of objects allowed in the queue) as
> small as possible without allowing the queue to be starved.

Dear Tom:

I have been fiddling with v3 of your patch series for much of the day,
artificially imposing 10Mbit and 100Mbit limits on the network with
ethtool.

The performance of multiple tcp streams 'feels' quite good across
multiple, saturating workloads over the LFN, and short distances, but
the data I collected today was too muddled and confusing to publish.

I need to finish patching this into a couple other boxes before
rebuilding the testbed,
and judging from the code review comments it would seem a good idea to
wait for another go-round.

I did a build on top of commit b4bbb02934e4511d9083f15c23e90703482e84ad

for ubuntu here:

http://huchra.bufferbloat.net/~d/bql/

the usual caveats apply to development kernels... I ran all day without a crash,
testing the e1000e driver, the qfq qdisc, red, and sfq.

two drivers (forcedeth, bnx2x_) did not apply on top of that commit.

If anyone cares, here are also the various QFQ + RED scheduler scripts
I've been fiddling with. These seemed to help at 100Mbit, in the
previous series.

https://github.com/dtaht/deBloat/blob/master/wip/

eqfq (pure qfq), eqfq.red and eqfq.red.10mbit are the only three
seriously tested at the moment.

I would love to know what qfq does to the CPU at a gigabit (some red
tuning would be needed on top of the script) on a machine that can
drive it, now that there are drivers that stop gobbling packets so
greedily...


* Re: [PATCH v3 03/10] net: Add netdev interfaces recording send/compl
From: Stephen Hemminger @ 2011-11-23 17:46 UTC
  To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.2.00.1111222139560.14623@pokey.mtv.corp.google.com>

On Tue, 22 Nov 2011 21:52:37 -0800 (PST)
Tom Herbert <therbert@google.com> wrote:

> Add interfaces for drivers to call for recording number of packets and
> bytes at send time and transmit completion.  Also, added a function to
> "reset" a queue.  These will be used by Byte Queue Limits.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

Since all the drivers that you show do this for one packet at a time,
why bother with a packets argument to netdev_sent_queue()?


* Re: [PATCH 0/6] SUNRPC: make RPC clients use network-namespace-aware PipeFS routines
From: Stanislav Kinsbursky @ 2011-11-23 17:58 UTC
  To: J. Bruce Fields
  Cc: Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org,
	Pavel Emelianov, neilb@suse.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, James Bottomley,
	davem@davemloft.net, devel@openvz.org
In-Reply-To: <20111123163632.GB30672@fieldses.org>

On 23.11.2011 20:36, J. Bruce Fields wrote:
> On Wed, Nov 23, 2011 at 02:51:10PM +0300, Stanislav Kinsbursky wrote:
>> This patch set was created in context of clone of git
>> branch: git://git.linux-nfs.org/projects/trondmy/nfs-2.6.git.
>> tag: v3.1
>>
>> This patch set depends on previous patch sets titled:
>> 1) "SUNRPC: initial part of making pipefs work in net ns"
>> 2) "SUNPRC: cleanup PipeFS for network-namespace-aware users"
>>
>> This patch set is the first part of reworking SUNRPC PipeFS users.
>> It makes SUNRPC clients use PipeFS notifications to create directory and GSS
>> pipe dentries. With this patch set, the RPC client and GSS auth creation
>> routines don't force SUNRPC PipeFS mount point creation, which means
>> that they can now work without PipeFS dentries.
>
> I'm not following very well.  (My fault, I haven't been paying
> attention.)  Could you summarize the intended behavior of pipefs after
> all this is done?
>
> So there's a separate superblock (and separate dentries) for each
> namespace?
>

Yes, you are right.
So, here is a brief summary of what we will have at the end:

1) PipeFS superblock will be per net ns.

2) The superblock holds the net ns, which is taken from current. Struct net
will have a link to the pipefs superblock (it can be NULL if PipeFS wasn't
mounted yet or was already unmounted).

3) Notifications will be sent on superblock creation and destruction.

4) All kernel mount point creation and destruction calls (rpc_get_mount() and
rpc_put_mount()) will be removed, i.e. this superblock will be created only
from user space.

5) Kernel pipes and dentries will be created or destroyed:
1. During per-net operations (only for static NFS stuff: the dns_resolve
cache, pnfs blocklayout and idmap pipes).
2. On notification events (all directories, files and pipes, in the proper
callbacks). Notification subscribers:
a. rpc clients (responsible for creating client dentries and gss pipes),
b. nfs clients (responsible for nfs idmap pipes),
c. nfs dns_resolve cache,
d. pnfs blocklayout pipes.

6) PipeFS dentries creation logic:
a) Creators of directories and files will try to create them as usual. If
that fails, it is not a problem: the dentries will be created on the PipeFS
mount notification call.
b) Creators of pipes will create a new rpc_pipe structure (all the pipe stuff
from the rpc inode) and then try to create the pipe dentries. If that fails
then, again, it is not a problem: the dentries will be created on the PipeFS
mount notification call.


Almost all of this (except 5.2.b, which I forgot about during rebasing and
resending) is done and ready to send.

> What decides which clients are visible in which network namespaces?
>

Client dentries will be created in the proper superblock from the beginning,
i.e. rpc client transports have a struct net reference. NFS clients will have
such a reference too. Struct net will have a reference to the pipefs
superblock.
Currently, all we need for dentry creation is the parent dentry. Some creators
(like GSS pipes) take the parent dentry from an associated struct (like
rpc_clnt). For the others, the parent dentry can be found by a simple
d_lookup() starting from sb->root (reminder: sb can be taken from net).
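
The "try now, otherwise create on mount notification" logic summarized above
can be modelled with a small userspace sketch. None of the names below come
from the patch set; the structs and functions are hypothetical stand-ins, and
a boolean flag takes the place of a real dentry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* Hypothetical stand-ins for the superblock, an RPC client and struct net. */
struct sb_sketch {
	int id;
};

struct client_sketch {
	bool has_dentry;	/* true once the client directory exists */
};

struct net_sketch {
	struct sb_sketch *pipefs_sb;	/* NULL while PipeFS is not mounted */
	struct client_sketch *clients[MAX_CLIENTS];	/* subscribers */
	size_t nr_clients;
};

/* Creation attempt: a missing superblock is not treated as an error. */
static void client_try_create_dentry(struct net_sketch *net,
				     struct client_sketch *clnt)
{
	if (net->pipefs_sb)
		clnt->has_dentry = true;
}

/* Client creation: subscribe first, then try to create dentries as usual. */
static int client_register(struct net_sketch *net, struct client_sketch *clnt)
{
	assert(net && clnt);
	if (net->nr_clients == MAX_CLIENTS)
		return -1;
	clnt->has_dentry = false;
	net->clients[net->nr_clients++] = clnt;
	client_try_create_dentry(net, clnt);
	return 0;
}

/* Mount notification: create the dentries that were deferred earlier. */
static void pipefs_mount_event(struct net_sketch *net, struct sb_sketch *sb)
{
	net->pipefs_sb = sb;
	for (size_t i = 0; i < net->nr_clients; i++)
		client_try_create_dentry(net, net->clients[i]);
}

/* Umount notification: the dentries go away together with the superblock. */
static void pipefs_umount_event(struct net_sketch *net)
{
	for (size_t i = 0; i < net->nr_clients; i++)
		net->clients[i]->has_dentry = false;
	net->pipefs_sb = NULL;
}
```

In this model a client registered before the mount works without a dentry and
gets one when the mount event arrives, while a client registered afterwards
gets its dentry immediately.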


-- 
Best regards,
Stanislav Kinsbursky


* cache forver in 3.2.0-rc2-00400-g866d43c ?
From: Arkadiusz Miśkiewicz @ 2011-11-23 18:10 UTC
  To: netdev


Hello,

I'm using my notebook on two different networks, suspending to RAM and
resuming in between.

Network A (192.168.1.0/24) has a server with IP 87.204.99.133 on the same LAN.
Now when I suspend, move to a totally different network B (different provider,
different LAN, 192.168.0.0/24) and resume, I'm unable to connect to
87.204.99.133.

It looks like the network stack thinks that 87...133 is still directly
reachable on eth1, and I'm unable to make it forget that.

[root@t400 ~]# ip ne flush dev eth1; ip r flush table cache
[root@t400 ~]# ip r show table cache to 87.204.99.133
[root@t400 ~]# ping -c 1 87.204.99.133
PING 87.204.99.133 (87.204.99.133) 56(84) bytes of data.
From 192.168.0.5: icmp_seq=1 Destination Host Unreachable

--- 87.204.99.133 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

zsh: exit 1     ping -c 1 87.204.99.133
[root@t400 ~]# ip r show table cache to 87.204.99.133
87.204.99.133 dev eth1  src 192.168.0.5
    cache <redirected>  ipid 0x2838 rtt 17ms rttvar 12ms cwnd 10
87.204.99.133 from 192.168.0.5 dev eth1
    cache <redirected>  ipid 0x2838 rtt 17ms rttvar 12ms cwnd 10
[root@t400 ~]# ip ne show to 87.204.99.133
87.204.99.133 dev eth1  FAILED

tcpdump in meantime sees this:
19:06:26.907153 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 87.204.99.133 tell 192.168.0.5, length 28
19:06:27.908379 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 87.204.99.133 tell 192.168.0.5, length 28
19:06:28.907084 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 87.204.99.133 tell 192.168.0.5, length 28
19:06:29.907145 IP (tos 0xc0, ttl 64, id 34465, offset 0, flags [none], proto ICMP (1), length 112)
    192.168.0.5 > 192.168.0.5: ICMP host 87.204.99.133 unreachable, length 92
        IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.5 > 87.204.99.133: ICMP echo request, id 17590, seq 1, length 64

Any ideas?

Thanks,
-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


* Re: cache forver in 3.2.0-rc2-00400-g866d43c ?
From: Eric Dumazet @ 2011-11-23 18:22 UTC
  To: Arkadiusz Miśkiewicz; +Cc: netdev
In-Reply-To: <201111231910.37190.a.miskiewicz@gmail.com>

On Wednesday, 23 November 2011 at 19:10 +0100, Arkadiusz Miśkiewicz
wrote:
> Hello,
> 
> I'm using my notebook on two different networks, suspending to RAM and
> resuming in between.
> 
> Network A (192.168.1.0/24) has a server with IP 87.204.99.133 on the same LAN.
> Now when I suspend, move to a totally different network B (different provider,
> different LAN, 192.168.0.0/24) and resume, I'm unable to connect to
> 87.204.99.133.
> 
> It looks like the network stack thinks that 87...133 is still directly
> reachable on eth1, and I'm unable to make it forget that.
> 
> [root@t400 ~]# ip ne flush dev eth1; ip r flush table cache
> [root@t400 ~]# ip r show table cache to 87.204.99.133
> [root@t400 ~]# ping -c 1 87.204.99.133
> PING 87.204.99.133 (87.204.99.133) 56(84) bytes of data.
> From 192.168.0.5: icmp_seq=1 Destination Host Unreachable
> 
> --- 87.204.99.133 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
> 
> zsh: exit 1     ping -c 1 87.204.99.133
> [root@t400 ~]# ip r show table cache to 87.204.99.133
> 87.204.99.133 dev eth1  src 192.168.0.5
>     cache <redirected>  ipid 0x2838 rtt 17ms rttvar 12ms cwnd 10
> 87.204.99.133 from 192.168.0.5 dev eth1
>     cache <redirected>  ipid 0x2838 rtt 17ms rttvar 12ms cwnd 10
> [root@t400 ~]# ip ne show to 87.204.99.133
> 87.204.99.133 dev eth1  FAILED
> 
> tcpdump in meantime sees this:
> 19:06:26.907153 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> 87.204.99.133 tell 192.168.0.5, length 28
> 19:06:27.908379 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> 87.204.99.133 tell 192.168.0.5, length 28
> 19:06:28.907084 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> 87.204.99.133 tell 192.168.0.5, length 28
> 19:06:29.907145 IP (tos 0xc0, ttl 64, id 34465, offset 0, flags [none], proto 
> ICMP (1), length 112)
>     192.168.0.5 > 192.168.0.5: ICMP host 87.204.99.133 unreachable, length 92
>         IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), 
> length 84)
>     192.168.0.5 > 87.204.99.133: ICMP echo request, id 17590, seq 1, length 64
> 
> Any ideas?
> 
> Thanks,

Hmm, could you check whether you have this fix in your tree?

http://git2.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9cc20b268a5a14f5e57b8ad405a83513ab0d78dc


* Re: cache forver in 3.2.0-rc2-00400-g866d43c ?
From: Arkadiusz Miśkiewicz @ 2011-11-23 18:31 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1322072541.2775.1.camel@edumazet-laptop>

On Wednesday 23 of November 2011, Eric Dumazet wrote:
> On Wednesday, 23 November 2011 at 19:10 +0100, Arkadiusz Miśkiewicz
> wrote:
> > Hello,
> > 
> > I'm using  my notebook in two different networks, suspending and resuming
> > from ram between.
> > 
> > Network A (192.168.1.0/24) has server with IP 87.204.99.133 in the same
> > lan. Now when I suspend, go to totally different network B (different
> > provider, different lan, 192.168.0.0/24) and resume then I'm unable to
> > connect to 87.204.99.133.
> > 
> > It looks like the network stack thinks that 87...133 is still directly
> > reachable on eth1, and I'm unable to make it forget that.
> > 
> > [root@t400 ~]# ip ne flush dev eth1; ip r flush table cache
> > [root@t400 ~]# ip r show table cache to 87.204.99.133
> > [root@t400 ~]# ping -c 1 87.204.99.133
> > PING 87.204.99.133 (87.204.99.133) 56(84) bytes of data.
> > From 192.168.0.5: icmp_seq=1 Destination Host Unreachable
> > 
> > --- 87.204.99.133 ping statistics ---
> > 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
> > 
> > zsh: exit 1     ping -c 1 87.204.99.133
> > [root@t400 ~]# ip r show table cache to 87.204.99.133
> > 87.204.99.133 dev eth1  src 192.168.0.5
> >     cache <redirected>  ipid 0x2838 rtt 17ms rttvar 12ms cwnd 10
> > 87.204.99.133 from 192.168.0.5 dev eth1
> >     cache <redirected>  ipid 0x2838 rtt 17ms rttvar 12ms cwnd 10
> > [root@t400 ~]# ip ne show to 87.204.99.133
> > 87.204.99.133 dev eth1  FAILED
> > 
> > tcpdump in meantime sees this:
> > 19:06:26.907153 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 87.204.99.133 tell 192.168.0.5, length 28
> > 19:06:27.908379 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 87.204.99.133 tell 192.168.0.5, length 28
> > 19:06:28.907084 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 87.204.99.133 tell 192.168.0.5, length 28
> > 19:06:29.907145 IP (tos 0xc0, ttl 64, id 34465, offset 0, flags [none], proto ICMP (1), length 112)
> >     192.168.0.5 > 192.168.0.5: ICMP host 87.204.99.133 unreachable, length 92
> >         IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
> >     192.168.0.5 > 87.204.99.133: ICMP echo request, id 17590, seq 1, length 64
> > 
> > Any ideas?
> > 
> > Thanks,
> 
> Hmm, could you check you have this fix in your tree ?
> 
> http://git2.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=9cc20b268a5a14f5e57b8ad405a83513ab0d78dc

My tree (00400-g866d43c) is after 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82
(which is the merge of "ipv4: fix redirect handling"), so I have it.

(I'm using the plain Linus git repo.)
-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
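
For readers following the trace quoted above: the ARP requests are
suspicious because the stack only ARPs for destinations it believes
are on-link.  A minimal sketch of that on-link test follows.  The
helper names ipv4() and on_link() are made up for illustration and are
not kernel functions.  It shows that 87.204.99.133 is not inside
192.168.0.0/24, so it should be reached through a gateway.

```c
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Return 1 if dst lies inside net/prefix_len (prefix_len must be 0..32). */
int on_link(uint32_t dst, uint32_t net, unsigned int prefix_len)
{
	/* A zero-length prefix matches everything; avoid shifting by 32. */
	uint32_t mask = prefix_len ? ~(uint32_t)0 << (32 - prefix_len) : 0;

	return (dst & mask) == (net & mask);
}
```

With network B's prefix, on_link(ipv4(87,204,99,133), ipv4(192,168,0,0), 24)
is false.  That matches the report that the stale <redirected> cache
entry, not the routing table, is what makes the host ARP on eth1.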

^ permalink raw reply

* Re: slub: use irqsafe_cpu_cmpxchg for put_cpu_partial
From: Christian Kujau @ 2011-11-23 18:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Benjamin Herrenschmidt, Eric Dumazet,
	Markus Trippelsdorf, Alex,Shi, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Matt Mackall, netdev@vger.kernel.org,
	Tejun Heo, David Rientjes
In-Reply-To: <alpine.DEB.2.00.1111230907330.16139@router.home>

On Wed, 23 Nov 2011 at 09:14, Christoph Lameter wrote:
> I think he only tested the patch that he showed us.

Yes, that's the (only) one I tested so far. I did some overnight testing 
(rsync'ing to the external disk again) for 6hrs and ran "slabinfo" every 
30s during the run: http://nerdbynature.de/bits/3.2.0-rc1/oops/slabinfo-1.txt.xz

The machine is still up & running. So for me, your patch fixes it!

  Tested-by: Christian Kujau <lists@nerdbynature.de>

Thanks!
Christian.

> Subject: slub: use irqsafe_cpu_cmpxchg for put_cpu_partial
> 
> The cmpxchg must be irq safe. The fallback for this_cpu_cmpxchg only
> disables preemption which results in per cpu partial page operation
> potentially failing on non x86 platforms.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  mm/slub.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2011-11-23 09:10:48.000000000 -0600
> +++ linux-2.6/mm/slub.c	2011-11-23 09:10:57.000000000 -0600
> @@ -1969,7 +1969,7 @@ int put_cpu_partial(struct kmem_cache *s
>  		page->pobjects = pobjects;
>  		page->next = oldpage;
> 
> -	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
> +	} while (irqsafe_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  	stat(s, CPU_PARTIAL_FREE);
>  	return pobjects;
>  }
> 
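
The retry loop in the quoted patch can be modelled in userspace with
C11 atomics.  This is only a sketch under assumptions: struct
page_stub, partial_head and put_partial() are hypothetical stand-ins,
and a userspace atomic compare-and-exchange has no notion of
interrupts, so it illustrates the loop shape rather than the irq-safety
issue itself.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on the per-cpu partial list. */
struct page_stub {
	struct page_stub *next;
	int objects;	/* free objects on this page */
	int pobjects;	/* running total for the list starting here */
};

/* List head; static storage, so it starts out as NULL. */
_Atomic(struct page_stub *) partial_head;

/* Push page onto the list, retrying if the head changed underneath us. */
int put_partial(struct page_stub *page)
{
	struct page_stub *oldpage = atomic_load(&partial_head);
	int pobjects;

	do {
		pobjects = (oldpage ? oldpage->pobjects : 0) + page->objects;
		page->pobjects = pobjects;
		page->next = oldpage;
		/* On failure, oldpage is refreshed with the current head. */
	} while (!atomic_compare_exchange_weak(&partial_head, &oldpage, page));

	return pobjects;
}
```

If the exchange is not atomic with respect to whatever else can update
the head (an interrupt handler in the kernel case), the comparison can
pass against a head that has already changed, which is the failure the
patch description refers to.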

-- 
BOFH excuse #43:

boss forgot system password


^ permalink raw reply

* Re: cache forver in 3.2.0-rc2-00400-g866d43c ?
From: Eric Dumazet @ 2011-11-23 18:37 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: netdev
In-Reply-To: <201111231931.08596.a.miskiewicz@gmail.com>

On Wednesday, 23 November 2011 at 19:31 +0100, Arkadiusz Miśkiewicz wrote:

> Mine 00400-g866d43c was after 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82 (which 
> is merge of ipv4: fix redirect handling), so I have it.
> 
> (I'm using pure linus git repo)

OK thanks for this information, I am working on a patch.

^ permalink raw reply

* [PATCH net-next] netxen: write IP address to firmware when using bonding
From: Andy Gospodarek @ 2011-11-23 18:43 UTC (permalink / raw)
  To: netdev; +Cc: Sony Chacko, Rajesh Borundia

The following patch was added to enable NX3031 devices to properly
aggregate frames for LRO:

	commit 6598b169b856793f8f9b80a3f3c5a48f5eaf40e3
	Author: Dhananjay Phadke <dhananjay@netxen.com>
	Date:   Sun Jul 26 20:07:37 2009 +0000

	    netxen: enable ip addr hashing

This patch is a followup to that fix as it allows LRO aggregation on
bonded devices that contain an NX3031 device.  This was tested on an
older distro and modified slightly to the latest upstream.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
---
 .../net/ethernet/qlogic/netxen/netxen_nic_main.c   |  119 +++++++++++++++-----
 1 files changed, 92 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 7dd9a4b..64eb618 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -3038,18 +3038,45 @@ recheck:
 		goto recheck;
 	}
 
-	if (!is_netxen_netdev(dev))
-		goto done;
+	/* If this is a bonding device, look for netxen-based slaves*/
+	if (dev->priv_flags & IFF_BONDING) {
+		struct net_device *slave;
 
-	adapter = netdev_priv(dev);
+		rcu_read_lock();
+		for_each_netdev_rcu(&init_net, slave) {
+			/* check to see if the device is in the bond */
+			if (slave->master == dev) {
 
-	if (!adapter)
-		goto done;
+				if (!is_netxen_netdev(slave))
+					continue;
 
-	if (adapter->is_up != NETXEN_ADAPTER_UP_MAGIC)
-		goto done;
+				adapter = netdev_priv(slave);
+
+				if (!adapter)
+					continue;
+
+				if (adapter->is_up != NETXEN_ADAPTER_UP_MAGIC)
+					continue;
+
+				netxen_config_indev_addr(adapter, orig_dev, event);
+			}
+		}
+		rcu_read_unlock();
+
+	} else {
+		if (!is_netxen_netdev(dev))
+			goto done;
+
+		adapter = netdev_priv(dev);
 
-	netxen_config_indev_addr(adapter, orig_dev, event);
+		if (!adapter)
+			goto done;
+
+		if (adapter->is_up != NETXEN_ADAPTER_UP_MAGIC)
+			goto done;
+
+		netxen_config_indev_addr(adapter, orig_dev, event);
+	}
 done:
 	return NOTIFY_DONE;
 }
@@ -3074,30 +3101,68 @@ recheck:
 		goto recheck;
 	}
 
-	if (!is_netxen_netdev(dev))
-		goto done;
+	/* If this is a bonding device, look for netxen-based slaves*/
+	if (dev->priv_flags & IFF_BONDING) {
+		struct net_device *slave;
 
-	adapter = netdev_priv(dev);
+		rcu_read_lock();
+		for_each_netdev_rcu(&init_net, slave) {
+			/* check to see if the device is in the bond */
+			if (slave->master == dev) {
 
-	if (!adapter || !netxen_destip_supported(adapter))
-		goto done;
+				if (!is_netxen_netdev(slave))
+					continue;
 
-	if (adapter->is_up != NETXEN_ADAPTER_UP_MAGIC)
-		goto done;
+				adapter = netdev_priv(slave);
 
-	switch (event) {
-	case NETDEV_UP:
-		netxen_config_ipaddr(adapter, ifa->ifa_address, NX_IP_UP);
-		netxen_list_config_vlan_ip(adapter, ifa, NX_IP_UP);
-		break;
-	case NETDEV_DOWN:
-		netxen_config_ipaddr(adapter, ifa->ifa_address, NX_IP_DOWN);
-		netxen_list_config_vlan_ip(adapter, ifa, NX_IP_DOWN);
-		break;
-	default:
-		break;
-	}
+				if (!adapter || !netxen_destip_supported(adapter))
+					continue;
+
+				if (adapter->is_up != NETXEN_ADAPTER_UP_MAGIC)
+					continue;
+
+				switch (event) {
+				case NETDEV_UP:
+					netxen_config_ipaddr(adapter, ifa->ifa_address, NX_IP_UP);
+					netxen_list_config_vlan_ip(adapter, ifa, NX_IP_UP);
+					break;
+				case NETDEV_DOWN:
+					netxen_config_ipaddr(adapter, ifa->ifa_address, NX_IP_DOWN);
+					netxen_list_config_vlan_ip(adapter, ifa, NX_IP_DOWN);
+					break;
+				default:
+					break;
+				}
+			}
+		}
+		rcu_read_unlock();
+
+	} else {
+
+		if (!is_netxen_netdev(dev))
+			goto done;
+
+		adapter = netdev_priv(dev);
+
+		if (!adapter || !netxen_destip_supported(adapter))
+			goto done;
 
+		if (adapter->is_up != NETXEN_ADAPTER_UP_MAGIC)
+			goto done;
+
+		switch (event) {
+		case NETDEV_UP:
+			netxen_config_ipaddr(adapter, ifa->ifa_address, NX_IP_UP);
+			netxen_list_config_vlan_ip(adapter, ifa, NX_IP_UP);
+			break;
+		case NETDEV_DOWN:
+			netxen_config_ipaddr(adapter, ifa->ifa_address, NX_IP_DOWN);
+			netxen_list_config_vlan_ip(adapter, ifa, NX_IP_DOWN);
+			break;
+		default:
+			break;
+		}
+	}
 done:
 	return NOTIFY_DONE;
 }
-- 
1.7.4.4
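
The control flow added by the patch can be sketched in plain C.
Everything here is a simplified, hypothetical model: struct dev_stub
and config_addr() are made up for illustration, the flags stand in for
IFF_BONDING and is_netxen_netdev(), and no RCU locking or real device
list is involved.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct net_device. */
struct dev_stub {
	struct dev_stub *master;	/* bond this device is enslaved to, or NULL */
	int is_bond;			/* models dev->priv_flags & IFF_BONDING */
	int is_netxen;			/* models is_netxen_netdev() */
	int configured;			/* set when the address would be written */
};

/* Returns the number of adapters that were configured. */
int config_addr(struct dev_stub *dev, struct dev_stub *all, size_t n)
{
	int count = 0;
	size_t i;

	if (!dev->is_bond) {
		/* Non-bonded case: behave as before the patch. */
		if (dev->is_netxen) {
			dev->configured = 1;
			count = 1;
		}
		return count;
	}

	/* Bonded case: visit every device and pick out netxen slaves. */
	for (i = 0; i < n; i++) {
		if (all[i].master != dev || !all[i].is_netxen)
			continue;
		all[i].configured = 1;
		count++;
	}
	return count;
}
```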

^ permalink raw reply related

* Re: [patch] netrom: check that user string is terminated
From: Ralf Baechle @ 2011-11-23 19:12 UTC (permalink / raw)
  To: walter harms
  Cc: Dan Carpenter, David S. Miller, linux-hams, netdev,
	kernel-janitors
In-Reply-To: <4ECCAD38.9090309@bfs.de>

On Wed, Nov 23, 2011 at 09:22:16AM +0100, walter harms wrote:

> I am not sure that it does what you intend.
> mnemonic is an array and a malicious user may fill it up to the last char,
> causing strlen to go beyond it. Perhaps this may help:

Correct, it makes things worse.  I'm going to reply in detail later tonight;
I have to bail out now.

> >  		if ((dev = nr_ax25_dev_get(nr_route.device)) == NULL)
> >  			return -EINVAL;
> while you are here:
> 
> 	dev = nr_ax25_dev_get(nr_route.device);
> 	if ( dev == NULL )
> 		return -EINVAL;
> 
> >  		if (nr_route.ndigis < 0 || nr_route.ndigis > AX25_MAX_DIGIS) {
> 
> I guess "nr_route.ndigis >= AX25_MAX_DIGIS" is intended?

No, values of 0 .. AX25_MAX_DIGIS are permitted with zero meaning no
digipeater at all.

The actual bug in this line, if you want to call it that, is cosmetic:
nr_route.ndigis is an unsigned int, so nr_route.ndigis < 0 will never be
true.  There are other simplifications possible to the error checking
here.  I've whipped up a cleanup patch for this part of the code.

  Ralf
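
A minimal sketch of the range check described above, outside the
kernel.  MAX_DIGIS and ndigis_valid() are illustrative names; the
constant stands in for AX25_MAX_DIGIS and its value here is an
assumption.

```c
/* Illustrative limit; stands in for AX25_MAX_DIGIS from the kernel headers. */
#define MAX_DIGIS 8

/*
 * ndigis is unsigned, so a "< 0" test can never fire and only the
 * upper bound needs checking.  Zero is valid and means no digipeater.
 */
int ndigis_valid(unsigned int ndigis)
{
	return ndigis <= MAX_DIGIS;
}
```

A negative value passed in from user space shows up as a very large
unsigned number, so the single upper-bound comparison rejects it too.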

^ permalink raw reply

* header_ops and virtual devices
From: Arvid Brodin @ 2011-11-23 19:26 UTC (permalink / raw)
  To: netdev

Which device's header_ops->create() is called when several devices are
involved in the egress path of a frame? For example, with VLAN configured?
I'm having a hard time following the code...

-- 
Arvid Brodin
Enea Services Stockholm AB

^ permalink raw reply

* Re: cache forver in 3.2.0-rc2-00400-g866d43c ?
From: Eric Dumazet @ 2011-11-23 19:44 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: netdev
In-Reply-To: <1322073450.2775.2.camel@edumazet-laptop>

On Wednesday, 23 November 2011 at 19:37 +0100, Eric Dumazet wrote:
> On Wednesday, 23 November 2011 at 19:31 +0100, Arkadiusz Miśkiewicz wrote:
> 
> > Mine 00400-g866d43c was after 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82 (which 
> > is merge of ipv4: fix redirect handling), so I have it.
> > 
> > (I'm using pure linus git repo)
> 
> OK thanks for this information, I am working on a patch.
> 
> 

Please test the following patch, thanks !

 include/net/inetpeer.h |    1 +
 net/ipv4/route.c       |   10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/net/inetpeer.h b/include/net/inetpeer.h
index 78c83e6..e9ff3fc 100644
--- a/include/net/inetpeer.h
+++ b/include/net/inetpeer.h
@@ -35,6 +35,7 @@ struct inet_peer {
 
 	u32			metrics[RTAX_MAX];
 	u32			rate_tokens;	/* rate limiting for ICMP */
+	int			redirect_genid;
 	unsigned long		rate_last;
 	unsigned long		pmtu_expires;
 	u32			pmtu_orig;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 0c74da8..be8643da 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -131,6 +131,7 @@ static int ip_rt_mtu_expires __read_mostly	= 10 * 60 * HZ;
 static int ip_rt_min_pmtu __read_mostly		= 512 + 20 + 20;
 static int ip_rt_min_advmss __read_mostly	= 256;
 static int rt_chain_length_max __read_mostly	= 20;
+static int redirect_genid;
 
 /*
  *	Interface to generic destination cache.
@@ -837,6 +838,7 @@ static void rt_cache_invalidate(struct net *net)
 
 	get_random_bytes(&shuffle, sizeof(shuffle));
 	atomic_add(shuffle + 1U, &net->ipv4.rt_genid);
+	redirect_genid++;
 }
 
 /*
@@ -1391,8 +1393,10 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 
 				peer = rt->peer;
 				if (peer) {
-					if (peer->redirect_learned.a4 != new_gw) {
+					if (peer->redirect_learned.a4 != new_gw ||
+					    peer->redirect_genid != redirect_genid) {
 						peer->redirect_learned.a4 = new_gw;
+						peer->redirect_genid = redirect_genid;
 						atomic_inc(&__rt_peer_genid);
 					}
 					check_peer_redir(&rt->dst, peer);
@@ -1701,6 +1705,8 @@ static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
 		if (peer) {
 			check_peer_pmtu(dst, peer);
 
+			if (peer->redirect_genid != redirect_genid)
+				peer->redirect_learned.a4 = 0;
 			if (peer->redirect_learned.a4 &&
 			    peer->redirect_learned.a4 != rt->rt_gateway) {
 				if (check_peer_redir(dst, peer))
@@ -1852,6 +1858,8 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 		dst_init_metrics(&rt->dst, peer->metrics, false);
 
 		check_peer_pmtu(&rt->dst, peer);
+		if (peer->redirect_genid != redirect_genid)
+			peer->redirect_learned.a4 = 0;
 		if (peer->redirect_learned.a4 &&
 		    peer->redirect_learned.a4 != rt->rt_gateway) {
 			rt->rt_gateway = peer->redirect_learned.a4;
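
The generation-counter idea used by the patch above can be modelled in
a few lines of userspace C.  This is a sketch only: struct peer_stub
and the three helpers are hypothetical names, and the real code lives
in struct inet_peer and net/ipv4/route.c.

```c
#include <stdint.h>

/* Bumped on every cache flush, like redirect_genid in the patch. */
int redirect_gen;

/* Hypothetical stand-in for the redirect fields of struct inet_peer. */
struct peer_stub {
	uint32_t learned_gw;	/* 0 means no redirect learned */
	int gen;		/* generation in which learned_gw was stored */
};

void cache_flush(void)
{
	redirect_gen++;
}

void learn_redirect(struct peer_stub *p, uint32_t gw)
{
	p->learned_gw = gw;
	p->gen = redirect_gen;
}

/* Return the learned gateway, dropping it if it predates the last flush. */
uint32_t current_redirect(struct peer_stub *p)
{
	if (p->gen != redirect_gen)
		p->learned_gw = 0;
	return p->learned_gw;
}
```

The point of the design is that a flush does not have to walk every
peer: stale entries are detected lazily, the next time they are read.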

^ permalink raw reply related

* RAW netfilter - "advanced netfilter setting" or not?
From: Linus Torvalds @ 2011-11-23 19:45 UTC (permalink / raw)
  To: David Miller, Pablo Neira Ayuso, Patrick McHardy; +Cc: netfilter-devel, netdev

So I'm the one who long ago asked for some of the more esoteric
netfilter configuration questions to be hidden behind some "advanced"
question, and thus the reason why a lot of them are behind that
NETFILTER_ADVANCED Kconfig setting.

However, I'm now trying OpenSUSE on one of my laptops, and it looks
like the RAW filter is used by the default OS iptables setup. The fact
that it is hidden behind NETFILTER_ADVANCED now means that I either
have to enable the advanced netfilter Kconfig questions, or we should
just remove the "depends on NETFILTER_ADVANCED" for the RAW case (or,
rather - caseS - since there's a separate raw filter for ipv4 and
ipv6, which sounds odd in itself, but that's another issue entirely)

My gut feel is that if it's one of the filters that a major distro
depends on by default, it should no longer be hidden. But honestly, I
didn't look at *why* OpenSUSE uses that filter. Maybe it's just doing
something really odd and crazy.

Comments?

                           Linus

^ permalink raw reply

* Re: [patch] netrom: check that user string is terminated
From: Dan Carpenter @ 2011-11-23 19:54 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: walter harms, David S. Miller, linux-hams, netdev,
	kernel-janitors
In-Reply-To: <20111123191249.GB7260@linux-mips.org>

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

On Wed, Nov 23, 2011 at 07:12:49PM +0000, Ralf Baechle wrote:
> On Wed, Nov 23, 2011 at 09:22:16AM +0100, walter harms wrote:
> 
> > I am not sure that it does what you intend.
> > mnemonic is an array and a malicious user may fill it up to the last char,
> > causing strlen to go beyond it. Perhaps this may help:
> 
> Correct, it makes things worse.  I'm going to reply in detail later tonight;
> I have to bail out now.
> 

Ok.  I said in a different thread that I was going to redo these
using strnlen() but I'll wait to read your comments.

regards,
dan carpenter
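
A bounded termination check of the kind discussed in this thread might
look like the sketch below.  It uses memchr() from standard C rather
than strnlen(); the two are equivalent for this purpose, since the
buffer is terminated exactly when strnlen(buf, size) < size.  The
function name is_terminated() is made up for illustration and the
7-byte buffers in the usage note are only an example size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if buf contains a NUL within its first size bytes.
 * Unlike strlen(), this never reads past the end of the array.
 */
int is_terminated(const char *buf, size_t size)
{
	return memchr(buf, '\0', size) != NULL;
}
```

For example, a 7-byte field holding "NODE" passes the check, while the
same field filled with seven non-NUL characters fails it and should be
rejected before any string function touches it.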


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: RAW netfilter - "advanced netfilter setting" or not?
From: Patrick McHardy @ 2011-11-23 19:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, Pablo Neira Ayuso, netfilter-devel, netdev
In-Reply-To: <CA+55aFxtSgLppJQ8EbQEovdXsCNxQ1gT31FyP2cpyuvoBLajZw@mail.gmail.com>

On 23.11.2011 20:45, Linus Torvalds wrote:
> So I'm the one who long ago asked for some of the more esoteric
> netfilter configuration questions to be hidden behind some "advanced"
> question, and thus the reason why a lot of them are behind that
> NETFILTER_ADVANCED Kconfig setting.
> 
> However, I'm now trying OpenSUSE on one of my laptops, and it looks
> like the RAW filter is used by the default OS iptables setup. The fact
> that it is hidden behind NETFILTER_ADVANCED now means that I either
> have to enable the advanced netfilter Kconfig questions, or we should
> just remove the "depends on NETFILTER_ADVANCED" for the RAW case (or,
> rather - caseS - since there's a separate raw filter for ipv4 and
> ipv6, which sounds odd in itself, but that's another issue entirely)
> 
> My gut feel is that if it's one of the filters that a major distro
> depends on by default, it should no longer be hidden.

Agreed, the main point was to enable everything used by major
distributions by default (default m if NETFILTER_ADVANCED=n)
and hide everything else.

> But honestly, I
> didn't look at *why* OpenSUSE uses that filter. Maybe it's just doing
> something really odd and crazy.

Most likely they're using NOTRACK to avoid connection tracking for
some traffic. Could you post the output of "iptables -t raw -vxnL"?

^ permalink raw reply

* Re: RAW netfilter - "advanced netfilter setting" or not?
From: Linus Torvalds @ 2011-11-23 20:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, Pablo Neira Ayuso, netfilter-devel, netdev
In-Reply-To: <4ECD5069.4090106@trash.net>

On Wed, Nov 23, 2011 at 11:58 AM, Patrick McHardy <kaber@trash.net> wrote:
>
> Most likely they're using NOTRACK to avoid connection tracking for
> some traffic. Could you post the output of "iptables -t raw -vxnL"?

Hmm. That's actually empty for me. I only went by some error messages
during bootup. Or maybe I should boot the distro kernel to see that
there isn't something else I'm missing that makes the user setup
unhappy.

                     Linus

^ permalink raw reply

* Re: Finding a hidden bound TCP socket
From: G. D. Fuego @ 2011-11-23 20:27 UTC (permalink / raw)
  To: richard -rw- weinberger; +Cc: linux-kernel, netdev
In-Reply-To: <CAFLxGvynabpEcX2RwiLxatjOEzizj95jAVY6E4D9ymDpn8s9UA@mail.gmail.com>

Any comments?  The behavior seems broken.  At the very least it's very
inconsistent with other Unixes.

On Tue, Nov 15, 2011 at 3:23 PM, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> On Tue, Nov 15, 2011 at 8:21 PM, G. D. Fuego <gdfuego@gmail.com> wrote:
>> Hello,
>>
>> I have a question about an odd linux networking behavior, that could
>> potentially be a local networking DoS.  I'm curious if anyone is
>> familiar with this behavior.
>>
>> Essentially I was assisting someone with tracking down a hidden tcp
>> connection.  Attempts to bind to a particular port were failing,
>> saying the socket was in use, even though netstat was not reporting
>> any sort of connection.  They were initially suspecting a root kit,
>> but after a bit of digging, I came across this page:
>>
>> http://dcid.me/2007/06/hidden-ports-on-linux/
>>
>> From the page:
>>
>> "Here is the idea. If you get this simple C program, it will attempt
>> to bind every TCP port from 1025 to 1050, but it will not listen on
>> them. After it is done, if you do a netstat (or fuser or lsof) nothing
>> will be shown. However, if you try to use the port, you will get an
>> error saying that it is already in use."
>>
>> I tested it out and confirmed that connections opened by their test
>> program do in fact cause the port to be unavailable for use, and they
>> are not reported in netstat, lsof, ss, or any other networking tools
>> that I tried.  I'm unable to find any references to the ports being
>> in use anywhere I've looked within /proc.  Are you aware of another
>> way to figure out which process may be bound to the port?  In our
>> case, we figured out via trial and error which software was likely
>> triggering this behavior.
>>
>> It seems to me that this could be a potentially interesting local
>> networking DoS.  By binding to all ephemeral ports in this way, you'd
>> prevent the local system from being able to establish any tcp
>> connections, and it would be a pain to figure out which process was
>> causing the problem.
>>
>> My lame attempts to exploit this failed due to a max file descriptor
>> limit, but I'm told this could be doable by forking more processes for
>> doing the binding.
>>
>> Is this behavior known/expected?
>
> CC'ing netdev
>
> --
> Thanks,
> //richard
>
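
The behavior described in the quoted report can be reproduced with a
short program along these lines.  This is a sketch, not the test
program from the linked page: second_bind_errno() is a made-up helper
that binds one TCP socket without calling listen() and then tries to
bind a second socket to the same port.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the errno of the second bind(), 0 if it unexpectedly
 * succeeds, or -1 if the setup itself fails.
 */
int second_bind_errno(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int a, b, err = -1;

	a = socket(AF_INET, SOCK_STREAM, 0);
	b = socket(AF_INET, SOCK_STREAM, 0);
	if (a < 0 || b < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	/* Bind only; no listen() and no connect() on socket a. */
	if (bind(a, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(a, (struct sockaddr *)&addr, &len) < 0)
		goto out;

	/* addr now holds the port that socket a occupies. */
	err = bind(b, (struct sockaddr *)&addr, sizeof(addr)) < 0 ? errno : 0;
out:
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	return err;
}
```

On Linux the second bind() is expected to fail with EADDRINUSE even
though, as the report says, the first socket is neither listening nor
connected.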

^ permalink raw reply

* [ANNOUNCE] iproute2 v3.1.0
From: Stephen Hemminger @ 2011-11-23 20:29 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3474 bytes --]

This version of the iproute2 utilities is intended for use with the 3.1.0 or
later kernel, but should be backward compatible with older releases.
It includes many bug fixes for error cases, support for new kernel features,
and manual page enhancements.

Source: no longer at linux-foundation
  http://kernel.org/pub/linux/kernel/utils/networking/iproute2/iproute2-3.1.0.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

This version is signed with my GPG public key

Report problems (or enhancements) to the netdev@vger.kernel.org mailing list.

Changelog since last version v2.6.39

Andreas Henriksson (2):
      iproute2: Fix building xt module against xtables version 6
      iproute2: Remove "monitor" from "ip route help" output

Bin Li (1):
      ip: fix typo in ip link manual page

Christoph Biedl (1):
      ip: fix display of prefix cache info

Dan McGee (6):
      iproute2: Remove ChangeLog
      Make iproute2 configure script more flexible
      iptuntap: avoid double open
      genl: remove unused code
      ensure uptime is initialized if /proc/uptime cannot be opened
      xt: only unset fields if m is non NULL

David Ward (1):
      xfrm: Update documentation

Eric W. Biederman (3):
      iproute2: Add processless network namespace support
      iproute2: Auto-detect the presence of setns in libc
      iproute2: Fail "ip netns add" on existing network namespaces.

Florian Westphal (3):
      tc: man: add man page for choke scheduler
      tc: man: update sfq man page
      tc: filter: fix default 'protocol all' on little-endian platforms

Gilles Espinasse (1):
      iproute2: fix minor typo in comments

Jiri Benc (1):
      iproute2: fix changing of ip6ip6 tunnel parameters

Michal Soltys (2):
      HFSC (7) & (8) documentation + assorted changes
      HFS manpage changes

Mike Frysinger (1):
      tc: fix parallel build file with lex/yacc

Petr Sabata (4):
      iproute2: Remove unreachable code
      iproute2: ss - fix missing parameters
      iproute2: lnstat - fix typos
      iproute2: arpd - fix usage and manpage options

Petr Šabata (1):
      Display closed UDP sockets on 'ss -ul'

Sridhar Samudrala (1):
      iproute2: Fix usage and man page for 'ip link'

Stephen Hemminger (14):
      Update .gitignore
      Update kernel headers to 3.0
      Fix test for EOF on continuation line
      Remove redundant limits.h
      Add QFQ scheduler
      Add LLDP to ethernet type table
      Update headers to 3.0.4
      Add ntable to ip man page
      ip: fix man page warnings
      v3.0.0
      Update to 3.1-rc9 kernel headers
      ip: fix exit codes
      cleanup ematch yacc files
      v3.1.0

Thomas Jarosch (12):
      Fix pipe I/O stream descriptor leak in init_service_resolver()
      Fix file descriptor leak on error in rtnl_hash_initialize()
      Fix wrong comparison in cmp_print_eopt()
      Fix wrong sanity check in choke_parse_opt()
      Fix memory leak of lname variable in get_target_name()
      Add missing closedir() call in do_show()
      Fix file descriptor leak on error in iproute_flush_cache()
      Fix file descriptor leak on error in read_viftable()
      Fix file descriptor leak on error in read_mroute_list()
      Fix file descriptor leak in do_tunnels_list()
      Fix file descriptor leak on error in read_igmp()
      Fix unterminated readlink() buffer usage


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: Finding a hidden bound TCP socket
From: Eric Dumazet @ 2011-11-23 20:32 UTC (permalink / raw)
  To: G. D. Fuego; +Cc: richard -rw- weinberger, linux-kernel, netdev
In-Reply-To: <CALmbKGVUuZs+wigpkr0WPfn2pYP=tQh5-MsYWeRxbug=HskVQQ@mail.gmail.com>

On Wednesday, 23 November 2011 at 15:27 -0500, G. D. Fuego wrote:
> Any comments?  The behavior seems broken.  At the very least it's very
> inconsistent with other Unixes.

Feel free to send a patch ;)

A user/process can consume all ports anyway, the 65000 port range is so
small...

^ permalink raw reply

