Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-2.6 PATCH] ixgbevf: Fix link speed display
From: David Miller @ 2010-04-27 16:57 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, gregory.v.rose
In-Reply-To: <20100427103834.23338.22213.stgit@localhost.localdomain>

Not appropriate this late in the -RC series.

I don't see this in the regression list, and reported link speed being
incorrect is not a catastropic crash and/or failure.

^ permalink raw reply

* Re: [net-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: David Miller @ 2010-04-27 16:56 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <20100427112513.24196.70511.stgit@localhost.localdomain>

This is also not appropriate for net-2.6, it doesn't fix a
regression in the regression list and it doesn't fix a catastropic
crash or failure.

You really have to be kidding me if you thing a patch like this
is fine this late in the -RC series.

You also aren't even numbering your patches, which is quite a
transgression when a submission of a set of patches all to the same
driver and/or files.

^ permalink raw reply

* Re: [net-2.6 PATCH] ixgbe: Properly display 1 gig downshift warning for backplane
From: David Miller @ 2010-04-27 16:54 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, anjali.singhai
In-Reply-To: <20100427103724.23338.53872.stgit@localhost.localdomain>


Not fixing a regression, not fixing a catastropic error, just adding
a warning message.

Therefore not appropriate for net-2.6

^ permalink raw reply

* Re: [PATCH] bnx2x: add support for receive hashing
From: David Miller @ 2010-04-27 16:51 UTC (permalink / raw)
  To: bmb; +Cc: therbert, eric.dumazet, netdev, rick.jones2
In-Reply-To: <4BD6E887.3000804@athenacr.com>

From: Brian Bloniarz <bmb@athenacr.com>
Date: Tue, 27 Apr 2010 09:37:11 -0400

> David Miller wrote:
>> How damn hard is it to add two 16-bit ports to the hash regardless of
>> protocol?
>>   
> Come to think of it, for UDP the hash must ignore
> the srcport and srcaddr, because a single bound
> socket is going to wildcard both those fields.

For load distribution we don't care if the local socket is wildcard
bounded on source.

It's going to be fully specified in the packet, and that's enough.

Sure, for full RFS some amends might be necessary in this area, but
for RPS and adapter based hw steering, using all of the ports is
entirely sufficient.

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree (net tree related)
From: David Miller @ 2010-04-27 16:34 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, yoshfuji
In-Reply-To: <20100427152516.580ee620.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 27 Apr 2010 15:25:16 +1000

> After merging the bkl-ioctl tree, today's linux-next build (powerpc
> ppc44x_defconfig) failed like this:
> 
> net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
> net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'
> 
> Introduced by commit 08b202b6726459626c73ecfa08fcdc8c3efc76c2 ("bridge
> br_multicast: IPv6 MLD support") from the net tree.
> 
> csum_ipv6_magic is declared in net/ip6_checksum.h ...

Bummer, powerpc is one of the few platforms that doesn't get the header
file implicitly so you always trip over this whereas we never see it in
x86 and sparc64 builds :-)

I'll fix this, thanks!

^ permalink raw reply

* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Eric Dumazet @ 2010-04-27 16:33 UTC (permalink / raw)
  To: paulmck
  Cc: Eric W. Biederman, Miles Lane, Vivek Goyal, Eric Paris,
	Lai Jiangshan, Ingo Molnar, Peter Zijlstra, LKML, nauman, netdev,
	Jens Axboe, Gui Jianfeng, Li Zefan, Johannes Berg
In-Reply-To: <20100427162201.GA5826@linux.vnet.ibm.com>

Le mardi 27 avril 2010 à 09:22 -0700, Paul E. McKenney a écrit :
> On Mon, Apr 26, 2010 at 09:27:44PM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 26, 2010 at 11:35:10AM -0700, Eric W. Biederman wrote:
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> > > 
> > > > Eric Dumazet traced these down to a commit from Eric Biederman.
> > > >
> > > > If I don't hear from Eric Biederman in a few days, I will attempt a
> > > > patch, but it would be more likely to be correct coming from someone
> > > > with a better understanding of the code.  ;-)
> > > 
> > > I already replied.
> > > 
> > > http://lkml.org/lkml/2010/4/21/420
> > 
> > You did indeed!!!  This experience is giving me an even better appreciation
> > of the maintainers' ability to keep all their patches straight!
> > 
> > I will put together something based on your suggestion.
> 
> How about the following?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 85fa42bd568ab99c375f018761ae6345249942cd
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Apr 26 21:40:05 2010 -0700
> 
>     net: suppress RCU lockdep false positive in twsk_net()
>     
>     Calls to twsk_net() are in some cases protected by reference counting
>     as an alternative to RCU protection.  Cases covered by reference counts
>     include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
>     inet_twdr_twcal_tick(), and tcp_timewait_state_process().  RCU is used
>     by inet_twsk_purge().  Locking is used by established_get_first()
>     and established_get_next().  Finally, __inet_twsk_hashdance() is an
>     initialization case.
>     
>     It appears to be non-trivial to locate the appropriate locks and
>     reference counts from within twsk_net(), so used rcu_dereference_raw().
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
> index 79f67ea..a066fdd 100644
> --- a/include/net/inet_timewait_sock.h
> +++ b/include/net/inet_timewait_sock.h
> @@ -224,7 +224,9 @@ static inline
>  struct net *twsk_net(const struct inet_timewait_sock *twsk)
>  {
>  #ifdef CONFIG_NET_NS
> -	return rcu_dereference(twsk->tw_net);
> +	return rcu_dereference_raw(twsk->tw_net); /* protected by locking, */
> +						  /* reference counting, */
> +						  /* initialization, or RCU. */
>  #else
>  	return &init_net;
>  #endif
> --

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: [PATCH 0/3] [RFC] ptp: IEEE 1588 clock support
From: Wolfgang Grandegger @ 2010-04-27 16:20 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <4BD6E837.2040505@grandegger.com>

Wolfgang Grandegger wrote:
> Hi Richard,
> 
> Richard Cochran wrote:
>> BTW,
>>
>> To show how the API works from the user space, I also patches For the
>> popular ptpd program to that project.
>>
>>    https://sourceforge.net/tracker/?func=detail&aid=2992845&group_id=139814&atid=744634
> 
> Cool stuff, I'm giving it a try on my MPC8313 PTP setup.

Do you have also a patch adding support for hardware timestamping to ptpd?

Thanks,

Wolfgang.

^ permalink raw reply

* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Paul E. McKenney @ 2010-04-27 16:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, eric.dumazet, netdev, Jens Axboe,
	Gui Jianfeng, Li Zefan, Johannes Berg
In-Reply-To: <20100427042744.GD2510@linux.vnet.ibm.com>

On Mon, Apr 26, 2010 at 09:27:44PM -0700, Paul E. McKenney wrote:
> On Mon, Apr 26, 2010 at 11:35:10AM -0700, Eric W. Biederman wrote:
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> > 
> > > Eric Dumazet traced these down to a commit from Eric Biederman.
> > >
> > > If I don't hear from Eric Biederman in a few days, I will attempt a
> > > patch, but it would be more likely to be correct coming from someone
> > > with a better understanding of the code.  ;-)
> > 
> > I already replied.
> > 
> > http://lkml.org/lkml/2010/4/21/420
> 
> You did indeed!!!  This experience is giving me an even better appreciation
> of the maintainers' ability to keep all their patches straight!
> 
> I will put together something based on your suggestion.

How about the following?

							Thanx, Paul

------------------------------------------------------------------------

commit 85fa42bd568ab99c375f018761ae6345249942cd
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Apr 26 21:40:05 2010 -0700

    net: suppress RCU lockdep false positive in twsk_net()
    
    Calls to twsk_net() are in some cases protected by reference counting
    as an alternative to RCU protection.  Cases covered by reference counts
    include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
    inet_twdr_twcal_tick(), and tcp_timewait_state_process().  RCU is used
    by inet_twsk_purge().  Locking is used by established_get_first()
    and established_get_next().  Finally, __inet_twsk_hashdance() is an
    initialization case.
    
    It appears to be non-trivial to locate the appropriate locks and
    reference counts from within twsk_net(), so used rcu_dereference_raw().
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 79f67ea..a066fdd 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -224,7 +224,9 @@ static inline
 struct net *twsk_net(const struct inet_timewait_sock *twsk)
 {
 #ifdef CONFIG_NET_NS
-	return rcu_dereference(twsk->tw_net);
+	return rcu_dereference_raw(twsk->tw_net); /* protected by locking, */
+						  /* reference counting, */
+						  /* initialization, or RCU. */
 #else
 	return &init_net;
 #endif

^ permalink raw reply related

* Re: [PATCH net-next-2.6] net: fix a lockdep rcu warning in __sk_dst_set()
From: Paul E. McKenney @ 2010-04-27 16:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1272350443.4861.9.camel@edumazet-laptop>

On Tue, Apr 27, 2010 at 08:40:43AM +0200, Eric Dumazet wrote:
> __sk_dst_set() might be called while no state can be integrated in a
> rcu_dereference_check() condition.
> 
> So use rcu_dereference_raw() to shutup lockdep warnings (if
> CONFIG_PROVE_RCU is set)

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 86a8ca1..94dbdbc 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1236,8 +1236,11 @@ __sk_dst_set(struct sock *sk, struct dst_entry *dst)
>  	struct dst_entry *old_dst;
> 
>  	sk_tx_queue_clear(sk);
> -	old_dst = rcu_dereference_check(sk->sk_dst_cache,
> -					lockdep_is_held(&sk->sk_dst_lock));
> +	/*
> +	 * This can be called while sk is owned by the caller only,
> +	 * with no state that can be checked in a rcu_dereference_check() cond
> +	 */
> +	old_dst = rcu_dereference_raw(sk->sk_dst_cache);
>  	rcu_assign_pointer(sk->sk_dst_cache, dst);
>  	dst_release(old_dst);
>  }
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC: linux-next 1/2] irq: Add CPU mask affinity hint callback framework
From: Peter P Waskiewicz Jr @ 2010-04-27 16:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: davem@davemloft.net, arjan@linux.jf.intel.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <alpine.LFD.2.00.1004271425140.2951@localhost.localdomain>

On Tue, 27 Apr 2010, Thomas Gleixner wrote:

> On Sun, 18 Apr 2010, Peter P Waskiewicz Jr wrote:
>> +/**
>> + * struct irqaffinityhint - per interrupt affinity helper
>> + * @callback:	device driver callback function
>> + * @dev:	reference for the affected device
>> + * @irq:	interrupt number
>> + */
>> +struct irqaffinityhint {
>> +	irq_affinity_hint_t callback;
>> +	void *dev;
>> +	int irq;
>> +};
>
> Why do you need that extra data structure ? The device and the irq
> number are known, so all you need is the callback itself. So no need
> for allocating memory ....

When I register the function callback with the interrupt layer, I need to 
know what device structures to reference back in the driver.  In other 
words, if I call into an underlying driver with just an interrupt number, 
then I have no way at getting at the dev structures (netdevice for me, 
plus my private adapter structures), unless I declare them globally 
(yuck).

I had a different approach before this one where I assumed the device from 
the irq handler callback was safe to use for the device in this new 
callback.  I didn't feel really great about that, since it's an implicit 
assumption that could cause things to go sideways really quickly.

Let me know what you think either way.  I'm certainly willing to make a 
change, I just don't know at this point what's the safest approach from 
what I currently have.

>
>> +static ssize_t irq_affinity_hint_proc_write(struct file *file,
>> +		const char __user *buffer, size_t count, loff_t *pos)
>> +{
>> +	/* affinity_hint is read-only from proc */
>> +	return -EOPNOTSUPP;
>> +}
>> +
>
> Why do you want a write function when the file is read only ?

It's leftover paranoia.  I put it in early on, then changed the mode 
later.  I can remove this function.  I'll re-send something once we agree 
on how the code in your first comment should look.

Thanks Thomas!

-PJ

^ permalink raw reply

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Herbert Xu @ 2010-04-27 15:55 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: David Miller, yoshfuji, netdev, shemminger
In-Reply-To: <20100427155034.GA11157@midget.suse.cz>

On Tue, Apr 27, 2010 at 05:50:34PM +0200, Jiri Bohac wrote:
>
>   I think that what Herbert is doing is only going to enforce that
>   the ip6_del_rt() is never going to fail, so the dst_free()
>   won't ever be called anyway, right?

Yes I think you're right.  But let's fix the bigger problem first,
and then we can audit this and possibly turn it into a WARN_ON.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Jiri Bohac @ 2010-04-27 15:50 UTC (permalink / raw)
  To: David Miller; +Cc: jbohac, herbert, yoshfuji, netdev, shemminger
In-Reply-To: <20100422.185400.71096585.davem@davemloft.net>

On Thu, Apr 22, 2010 at 06:54:00PM -0700, David Miller wrote:
> From: Jiri Bohac <jbohac@suse.cz>
> Date: Thu, 22 Apr 2010 17:49:08 +0200
> 
> > I still don't see why __ipv6_ifa_notify() needs to call
> > dst_free(). Shouldn't that be dst_release() instead, to drop the
> > reference obtained by dst_hold(&ifp->rt->u.dst)?
> 
> It likely wants to do both.
> 
> Just doing dst_release() doesn't mark the 'dst' object as obsolete,
> and therefore it won't get force garbage collected.

Sure. So If I understand it correctly, there are two problems:

- the reference taken by dst_hold() just above the ip6_del_rt()
  is never dropped if ip6_del_rt() fails; so shouldn't the code
  be like this?:

	dst_hold(&ifp->rt->u.dst);
	if (ip6_del_rt(ifp->rt)) {
		dst_release(&ifp->rt->u.dst);
		dst_free(&ifp->rt->u.dst);
	}

- if ip6_del_rt() fails because it races with something else
  deleting the address, dst_free() will be called twice. This is
  what Herbert is fixing with additional locking. However -- even
  when he fixes that -- how can ip6_del_rt() fail with the
  ifp->rt still needing a dst_free()?
  AFAICS, it can fail by:
  	- __ip6_del_rt() finding that (rt ==
	  net->ipv6.ip6_null_entry); we don't want to call
	  dst_free() on net->ipv6.ip6_null_entry, do we?

	- fib6_del() returning -ENOENT for multiple reasons;
	  but doesn't that mean that something else has removed
	  the route already and called dst_free on it?

  In either case, the dst_free() looks like not being needed. And it only
  does no harm in most cases, because these events are rare and
  it usually finds (obsolete > 2) and does nothing.

  I think that what Herbert is doing is only going to enforce that
  the ip6_del_rt() is never going to fail, so the dst_free()
  won't ever be called anyway, right?

Thanks,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

^ permalink raw reply

* Re: [PATCH] bnx2x: add support for receive hashing
From: Brian Bloniarz @ 2010-04-27 15:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, therbert, netdev, rick.jones2
In-Reply-To: <1272378659.2295.193.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le mardi 27 avril 2010 à 09:37 -0400, Brian Bloniarz a écrit :
>> David Miller wrote:
>>> How damn hard is it to add two 16-bit ports to the hash regardless of
>>> protocol?
>>>   
>> Come to think of it, for UDP the hash must ignore
>> the srcport and srcaddr, because a single bound
>> socket is going to wildcard both those fields.
>>
> 
> For your application maybe ;)
> 
> Here, I have thousand of RTP flows to big mediagateways, so the
> (srcaddr, dstaddr) is shared by all these flows.
> 
> Tom Herbert also wants a threaded DNS server.
> 
> For UDP, we could have a bitmap (system level ?) to say if a particular
> destination port wants multi-cpu (RPS) spreading or not, even if the NIC
> decided to use a single queue to submit frames to the host (ie mask the
> src_addr and src_port). In this case, RFS on non conected UDP sockets
> could be activated as well.
> 
> Or just a global sysctl to be able to mask the src_addr and/or src_port
> in our software rxhash.

This is all good to know. Here are 3 other alternatives/suggestions:

1) Leave things as they are, if it really hurts that badly
for our workload we can just disable RPS entirely.
2) Add a global sysctl to disable RPS for datagram-based protocols.
3) Pluggable SW rxhashes :)

I haven't benchmarked anything for our workloads yet, so
I can't say for sure it's even a big issue.

Has anyone benchmarked RPS + single-threaded DNS servers? It'd
be a pessimization in that case, right? Even the multi-threaded
DNS server wouldn't be workable unless there was something like
the soreuseport patch you & Tom had been been discussing.

^ permalink raw reply

* Re: linux-next: manual merge of the tty tree with the net tree
From: Greg KH @ 2010-04-27 15:14 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-next, linux-kernel, Sjur Braendeland, David Miller, netdev,
	Pavan Savoy
In-Reply-To: <20100427135706.259fa9c9.sfr@canb.auug.org.au>

On Tue, Apr 27, 2010 at 01:57:06PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the tty tree got a conflict in
> include/linux/tty.h between commit
> 9b27105b4a44c54bf91ecd7d0315034ae75684f7 ("net-caif-driver: add CAIF
> serial driver (ldisc)") from the net tree and commit
> f3150d54db0e0caa1871f66b2424f463a34a7e7b ("serial: TTY: new ldiscs for
> staging") from the tty tree.
> 
> I fixed it up (they both changed NR_LDISCS - I used the higher number) and
> can carry the fix as necessary.

Thanks, the higher number should be what is needed here.

greg k-h

^ permalink raw reply

* Re: linux-next: manual merge of the staging-next tree with the net tree
From: Greg KH @ 2010-04-27 15:13 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-next, linux-kernel, Jiri Pirko, David Miller, netdev,
	John Sheehan
In-Reply-To: <20100427142655.49673ff1.sfr@canb.auug.org.au>

On Tue, Apr 27, 2010 at 02:26:55PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the staging-next tree got a conflict in
> drivers/staging/arlan/arlan-main.c between commit
> 22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
> list_head") from the net tree and commit
> 8db2022b08e232bb237b2162f03ff5f6e7c0c35e ("staging: arlan: fix errors
> reported by checkpatch.pl tool") from the staging-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary.

That looks fine, thanks.

greg k-h

^ permalink raw reply

* Re: [patch v2] sctp: cleanup: remove duplicate assignment
From: Vlad Yasevich @ 2010-04-27 14:32 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sridhar Samudrala, David S. Miller, Wei Yongjun, Chris Dischino,
	linux-sctp, netdev, kernel-janitors
In-Reply-To: <20100424171805.GO29093@bicker>



Dan Carpenter wrote:
> This assignment isn't needed because we did it earlier already.
> 
> Also another reason to delete the assignment is because it triggers a
> Smatch warning about checking for NULL pointers after a dereference.
> 
> Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Thanks.  I'll take this one.

-vlad

> ---
> Thanks Vlad.  I came so close to seeing that myself if only I had openned
> my eyes a tiny bit more.  :P
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 17cb400..33aed1c 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -419,10 +419,17 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
>  	if (!retval)
>  		goto nomem_chunk;
>  
> -	/* Per the advice in RFC 2960 6.4, send this reply to
> -	 * the source of the INIT packet.
> +	/* RFC 2960 6.4 Multi-homed SCTP Endpoints
> +	 *
> +	 * An endpoint SHOULD transmit reply chunks (e.g., SACK,
> +	 * HEARTBEAT ACK, * etc.) to the same destination transport
> +	 * address from which it received the DATA or control chunk
> +	 * to which it is replying.
> +	 *
> +	 * [INIT ACK back to where the INIT came from.]
>  	 */
>  	retval->transport = chunk->transport;
> +
>  	retval->subh.init_hdr =
>  		sctp_addto_chunk(retval, sizeof(initack), &initack);
>  	retval->param_hdr.v = sctp_addto_chunk(retval, addrs_len, addrs.v);
> @@ -461,18 +468,6 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
>  	/* We need to remove the const qualifier at this point.  */
>  	retval->asoc = (struct sctp_association *) asoc;
>  
> -	/* RFC 2960 6.4 Multi-homed SCTP Endpoints
> -	 *
> -	 * An endpoint SHOULD transmit reply chunks (e.g., SACK,
> -	 * HEARTBEAT ACK, * etc.) to the same destination transport
> -	 * address from which it received the DATA or control chunk
> -	 * to which it is replying.
> -	 *
> -	 * [INIT ACK back to where the INIT came from.]
> -	 */
> -	if (chunk)
> -		retval->transport = chunk->transport;
> -
>  nomem_chunk:
>  	kfree(cookie);
>  nomem_cookie:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH] bnx2x: add support for receive hashing
From: Eric Dumazet @ 2010-04-27 14:30 UTC (permalink / raw)
  To: Brian Bloniarz; +Cc: David Miller, therbert, netdev, rick.jones2
In-Reply-To: <4BD6E887.3000804@athenacr.com>

Le mardi 27 avril 2010 à 09:37 -0400, Brian Bloniarz a écrit :
> David Miller wrote:
> > How damn hard is it to add two 16-bit ports to the hash regardless of
> > protocol?
> >   
> Come to think of it, for UDP the hash must ignore
> the srcport and srcaddr, because a single bound
> socket is going to wildcard both those fields.
> 

For your application maybe ;)

Here, I have thousand of RTP flows to big mediagateways, so the
(srcaddr, dstaddr) is shared by all these flows.

Tom Herbert also wants a threaded DNS server.

For UDP, we could have a bitmap (system level ?) to say if a particular
destination port wants multi-cpu (RPS) spreading or not, even if the NIC
decided to use a single queue to submit frames to the host (ie mask the
src_addr and src_port). In this case, RFS on non conected UDP sockets
could be activated as well.

Or just a global sysctl to be able to mask the src_addr and/or src_port
in our software rxhash.

> So the best we can hope for is for the hash to
> include destport and destaddr? From looking at
> my BCM5709's multiq behavior, I think broadcom's
> hash includes the destaddr but not the destport.

Toepliz hash for UDP is a hash(src_addr, dst_addr)

Both src port and dst port are ignored.

^ permalink raw reply

* Re: [net-2.6 PATCH] e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata
From: Matthew Garrett @ 2010-04-27 13:43 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, netdev, gospo, Bruce Allan
In-Reply-To: <20100427133232.25490.92973.stgit@localhost.localdomain>

On Tue, Apr 27, 2010 at 06:33:04AM -0700, Jeff Kirsher wrote:
> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> Prompted by a previous patch submitted by Matthew Garret <mjg@redhat.com>,
> further digging into errata documentation reveals the current enabling or
> disabling of ASPM L0s and L1 states for certain parts supported by this
> driver are incorrect.  82571 and 82572 should always disable L1.  For
> standard frames, 82573/82574/82583 can enable L1 but L0s must be disabled,
> and for jumbo frames 82573/82574 must disable L1.  This allows for some
> parts to enable L1 in certain configurations leading to better power
> savings.
> 
> Also according to the same errata, Early Receive (ERT) should be disabled
> on 82573 when using jumbo frames.

Looks good. Thanks for digging into this!

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* Re: [PATCH 0/3] [RFC] ptp: IEEE 1588 clock support
From: Wolfgang Grandegger @ 2010-04-27 13:35 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <20100427102025.GA6471@riccoc20.at.omicron.at>

Hi Richard,

Richard Cochran wrote:
> BTW,
> 
> To show how the API works from the user space, I also patches For the
> popular ptpd program to that project.
> 
>    https://sourceforge.net/tracker/?func=detail&aid=2992845&group_id=139814&atid=744634

Cool stuff, I'm giving it a try on my MPC8313 PTP setup.

Thanks.

Wolfgang.

^ permalink raw reply

* Re: [PATCH] bnx2x: add support for receive hashing
From: Brian Bloniarz @ 2010-04-27 13:37 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev, rick.jones2
In-Reply-To: <20100426.110432.104061817.davem@davemloft.net>

David Miller wrote:
> How damn hard is it to add two 16-bit ports to the hash regardless of
> protocol?
>   
Come to think of it, for UDP the hash must ignore
the srcport and srcaddr, because a single bound
socket is going to wildcard both those fields.

So the best we can hope for is for the hash to
include destport and destaddr? From looking at
my BCM5709's multiq behavior, I think broadcom's
hash includes the destaddr but not the destport.


^ permalink raw reply

* [net-2.6 PATCH] e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata
From: Jeff Kirsher @ 2010-04-27 13:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Matthew Garret, Bruce Allan, Jeff Kirsher

From: Bruce Allan <bruce.w.allan@intel.com>

Prompted by a previous patch submitted by Matthew Garret <mjg@redhat.com>,
further digging into errata documentation reveals the current enabling or
disabling of ASPM L0s and L1 states for certain parts supported by this
driver are incorrect.  82571 and 82572 should always disable L1.  For
standard frames, 82573/82574/82583 can enable L1 but L0s must be disabled,
and for jumbo frames 82573/82574 must disable L1.  This allows for some
parts to enable L1 in certain configurations leading to better power
savings.

Also according to the same errata, Early Receive (ERT) should be disabled
on 82573 when using jumbo frames.

Cc: Matthew Garret <mjg@redhat.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/82571.c  |   20 ++++++------
 drivers/net/e1000e/e1000.h  |    5 ++-
 drivers/net/e1000e/netdev.c |   70 ++++++++++++++++++++++++++-----------------
 3 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index 712ccc6..9015555 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -336,7 +336,6 @@ static s32 e1000_get_variants_82571(struct e1000_adapter *adapter)
 	struct e1000_hw *hw = &adapter->hw;
 	static int global_quad_port_a; /* global port a indication */
 	struct pci_dev *pdev = adapter->pdev;
-	u16 eeprom_data = 0;
 	int is_port_b = er32(STATUS) & E1000_STATUS_FUNC_1;
 	s32 rc;
 
@@ -387,16 +386,15 @@ static s32 e1000_get_variants_82571(struct e1000_adapter *adapter)
 		if (pdev->device == E1000_DEV_ID_82571EB_SERDES_QUAD)
 			adapter->flags &= ~FLAG_HAS_WOL;
 		break;
-
 	case e1000_82573:
+	case e1000_82574:
+	case e1000_82583:
+		/* Disable ASPM L0s due to hardware errata */
+		e1000e_disable_aspm(adapter->pdev, PCIE_LINK_STATE_L0S);
+
 		if (pdev->device == E1000_DEV_ID_82573L) {
-			if (e1000_read_nvm(&adapter->hw, NVM_INIT_3GIO_3, 1,
-				       &eeprom_data) < 0)
-				break;
-			if (!(eeprom_data & NVM_WORD1A_ASPM_MASK)) {
-				adapter->flags |= FLAG_HAS_JUMBO_FRAMES;
-				adapter->max_hw_frame_size = DEFAULT_JUMBO;
-			}
+			adapter->flags |= FLAG_HAS_JUMBO_FRAMES;
+			adapter->max_hw_frame_size = DEFAULT_JUMBO;
 		}
 		break;
 	default:
@@ -1792,6 +1790,7 @@ struct e1000_info e1000_82571_info = {
 				  | FLAG_RESET_OVERWRITES_LAA /* errata */
 				  | FLAG_TARC_SPEED_MODE_BIT /* errata */
 				  | FLAG_APME_CHECK_PORT_B,
+	.flags2			= FLAG2_DISABLE_ASPM_L1, /* errata 13 */
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
@@ -1809,6 +1808,7 @@ struct e1000_info e1000_82572_info = {
 				  | FLAG_RX_CSUM_ENABLED
 				  | FLAG_HAS_CTRLEXT_ON_LOAD
 				  | FLAG_TARC_SPEED_MODE_BIT, /* errata */
+	.flags2			= FLAG2_DISABLE_ASPM_L1, /* errata 13 */
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
@@ -1820,13 +1820,11 @@ struct e1000_info e1000_82572_info = {
 struct e1000_info e1000_82573_info = {
 	.mac			= e1000_82573,
 	.flags			= FLAG_HAS_HW_VLAN_FILTER
-				  | FLAG_HAS_JUMBO_FRAMES
 				  | FLAG_HAS_WOL
 				  | FLAG_APME_IN_CTRL3
 				  | FLAG_RX_CSUM_ENABLED
 				  | FLAG_HAS_SMART_POWER_DOWN
 				  | FLAG_HAS_AMT
-				  | FLAG_HAS_ERT
 				  | FLAG_HAS_SWSM_ON_LOAD,
 	.pba			= 20,
 	.max_hw_frame_size	= ETH_FRAME_LEN + ETH_FCS_LEN,
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 118bdf4..ee32b9b 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -37,6 +37,7 @@
 #include <linux/io.h>
 #include <linux/netdevice.h>
 #include <linux/pci.h>
+#include <linux/pci-aspm.h>
 
 #include "hw.h"
 
@@ -374,7 +375,7 @@ struct e1000_adapter {
 struct e1000_info {
 	enum e1000_mac_type	mac;
 	unsigned int		flags;
-	unsigned int            flags2;
+	unsigned int		flags2;
 	u32			pba;
 	u32			max_hw_frame_size;
 	s32			(*get_variants)(struct e1000_adapter *);
@@ -421,6 +422,7 @@ struct e1000_info {
 #define FLAG2_CRC_STRIPPING               (1 << 0)
 #define FLAG2_HAS_PHY_WAKEUP              (1 << 1)
 #define FLAG2_IS_DISCARDING               (1 << 2)
+#define FLAG2_DISABLE_ASPM_L1             (1 << 3)
 
 #define E1000_RX_DESC_PS(R, i)	    \
 	(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
@@ -461,6 +463,7 @@ extern void e1000e_update_stats(struct e1000_adapter *adapter);
 extern bool e1000e_has_link(struct e1000_adapter *adapter);
 extern void e1000e_set_interrupt_capability(struct e1000_adapter *adapter);
 extern void e1000e_reset_interrupt_capability(struct e1000_adapter *adapter);
+extern void e1000e_disable_aspm(struct pci_dev *pdev, u16 state);
 
 extern unsigned int copybreak;
 
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 73d43c5..fb8fc7d 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4283,6 +4283,14 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 		return -EINVAL;
 	}
 
+	/* 82573 Errata 17 */
+	if (((adapter->hw.mac.type == e1000_82573) ||
+	     (adapter->hw.mac.type == e1000_82574)) &&
+	    (max_frame > ETH_FRAME_LEN + ETH_FCS_LEN)) {
+		adapter->flags2 |= FLAG2_DISABLE_ASPM_L1;
+		e1000e_disable_aspm(adapter->pdev, PCIE_LINK_STATE_L1);
+	}
+
 	while (test_and_set_bit(__E1000_RESETTING, &adapter->state))
 		msleep(1);
 	/* e1000e_down -> e1000e_reset dependent on max_frame_size & mtu */
@@ -4605,29 +4613,39 @@ static void e1000_complete_shutdown(struct pci_dev *pdev, bool sleep,
 	}
 }
 
-static void e1000e_disable_l1aspm(struct pci_dev *pdev)
+#ifdef CONFIG_PCIEASPM
+static void __e1000e_disable_aspm(struct pci_dev *pdev, u16 state)
+{
+	pci_disable_link_state(pdev, state);
+}
+#else
+static void __e1000e_disable_aspm(struct pci_dev *pdev, u16 state)
 {
 	int pos;
-	u16 val;
+	u16 reg16;
 
 	/*
-	 * 82573 workaround - disable L1 ASPM on mobile chipsets
-	 *
-	 * L1 ASPM on various mobile (ich7) chipsets do not behave properly
-	 * resulting in lost data or garbage information on the pci-e link
-	 * level. This could result in (false) bad EEPROM checksum errors,
-	 * long ping times (up to 2s) or even a system freeze/hang.
-	 *
-	 * Unfortunately this feature saves about 1W power consumption when
-	 * active.
+	 * Both device and parent should have the same ASPM setting.
+	 * Disable ASPM in downstream component first and then upstream.
 	 */
-	pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
-	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &val);
-	if (val & 0x2) {
-		dev_warn(&pdev->dev, "Disabling L1 ASPM\n");
-		val &= ~0x2;
-		pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, val);
-	}
+	pos = pci_pcie_cap(pdev);
+	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
+	reg16 &= ~state;
+	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
+
+	pos = pci_pcie_cap(pdev->bus->self);
+	pci_read_config_word(pdev->bus->self, pos + PCI_EXP_LNKCTL, &reg16);
+	reg16 &= ~state;
+	pci_write_config_word(pdev->bus->self, pos + PCI_EXP_LNKCTL, reg16);
+}
+#endif
+void e1000e_disable_aspm(struct pci_dev *pdev, u16 state)
+{
+	dev_info(&pdev->dev, "Disabling ASPM %s %s\n",
+		 (state & PCIE_LINK_STATE_L0S) ? "L0s" : "",
+		 (state & PCIE_LINK_STATE_L1) ? "L1" : "");
+
+	__e1000e_disable_aspm(pdev, state);
 }
 
 #ifdef CONFIG_PM
@@ -4653,7 +4671,8 @@ static int e1000_resume(struct pci_dev *pdev)
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
 	pci_save_state(pdev);
-	e1000e_disable_l1aspm(pdev);
+	if (adapter->flags2 & FLAG2_DISABLE_ASPM_L1)
+		e1000e_disable_aspm(pdev, PCIE_LINK_STATE_L1);
 
 	err = pci_enable_device_mem(pdev);
 	if (err) {
@@ -4795,7 +4814,8 @@ static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
 	int err;
 	pci_ers_result_t result;
 
-	e1000e_disable_l1aspm(pdev);
+	if (adapter->flags2 & FLAG2_DISABLE_ASPM_L1)
+		e1000e_disable_aspm(pdev, PCIE_LINK_STATE_L1);
 	err = pci_enable_device_mem(pdev);
 	if (err) {
 		dev_err(&pdev->dev,
@@ -4889,13 +4909,6 @@ static void e1000_eeprom_checks(struct e1000_adapter *adapter)
 		dev_warn(&adapter->pdev->dev,
 			 "Warning: detected DSPD enabled in EEPROM\n");
 	}
-
-	ret_val = e1000_read_nvm(hw, NVM_INIT_3GIO_3, 1, &buf);
-	if (!ret_val && (le16_to_cpu(buf) & (3 << 2))) {
-		/* ASPM enable */
-		dev_warn(&adapter->pdev->dev,
-			 "Warning: detected ASPM enabled in EEPROM\n");
-	}
 }
 
 static const struct net_device_ops e1000e_netdev_ops = {
@@ -4944,7 +4957,8 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
 	u16 eeprom_data = 0;
 	u16 eeprom_apme_mask = E1000_EEPROM_APME;
 
-	e1000e_disable_l1aspm(pdev);
+	if (ei->flags2 & FLAG2_DISABLE_ASPM_L1)
+		e1000e_disable_aspm(pdev, PCIE_LINK_STATE_L1);
 
 	err = pci_enable_device_mem(pdev);
 	if (err)


^ permalink raw reply related

* [PATCH 3/3] net: ipmr: add support for dumping routing tables over netlink
From: Patrick, McHardy, kaber @ 2010-04-27 13:26 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1272374785-3858-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>

The ipmr /proc interface (ip_mr_cache) can't be extended to dump routes
from any tables but the main table in a backwards compatible fashion since
the output format ends in a variable amount of output interfaces.

Introduce a new netlink interface to dump multicast routes from all tables,
similar to the netlink interface for regular routes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/ipmr.c |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 41e8fc0..eddfd12 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -128,8 +128,8 @@ static int ip_mr_forward(struct net *net, struct mr_table *mrt,
 			 int local);
 static int ipmr_cache_report(struct mr_table *mrt,
 			     struct sk_buff *pkt, vifi_t vifi, int assert);
-static int ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
-			    struct mfc_cache *c, struct rtmsg *rtm);
+static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
+			      struct mfc_cache *c, struct rtmsg *rtm);
 static void ipmr_expire_process(unsigned long arg);
 
 #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -831,7 +831,7 @@ static void ipmr_cache_resolve(struct net *net, struct mr_table *mrt,
 		if (ip_hdr(skb)->version == 0) {
 			struct nlmsghdr *nlh = (struct nlmsghdr *)skb_pull(skb, sizeof(struct iphdr));
 
-			if (ipmr_fill_mroute(mrt, skb, c, NLMSG_DATA(nlh)) > 0) {
+			if (__ipmr_fill_mroute(mrt, skb, c, NLMSG_DATA(nlh)) > 0) {
 				nlh->nlmsg_len = (skb_tail_pointer(skb) -
 						  (u8 *)nlh);
 			} else {
@@ -1904,9 +1904,8 @@ drop:
 }
 #endif
 
-static int
-ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb, struct mfc_cache *c,
-		 struct rtmsg *rtm)
+static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
+			      struct mfc_cache *c, struct rtmsg *rtm)
 {
 	int ct;
 	struct rtnexthop *nhp;
@@ -1994,11 +1993,93 @@ int ipmr_get_route(struct net *net,
 
 	if (!nowait && (rtm->rtm_flags&RTM_F_NOTIFY))
 		cache->mfc_flags |= MFC_NOTIFY;
-	err = ipmr_fill_mroute(mrt, skb, cache, rtm);
+	err = __ipmr_fill_mroute(mrt, skb, cache, rtm);
 	read_unlock(&mrt_lock);
 	return err;
 }
 
+static int ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
+			    u32 pid, u32 seq, struct mfc_cache *c)
+{
+	struct nlmsghdr *nlh;
+	struct rtmsg *rtm;
+
+	nlh = nlmsg_put(skb, pid, seq, RTM_NEWROUTE, sizeof(*rtm), NLM_F_MULTI);
+	if (nlh == NULL)
+		return -EMSGSIZE;
+
+	rtm = nlmsg_data(nlh);
+	rtm->rtm_family   = RTNL_FAMILY_IPMR;
+	rtm->rtm_dst_len  = 32;
+	rtm->rtm_src_len  = 32;
+	rtm->rtm_tos      = 0;
+	rtm->rtm_table    = mrt->id;
+	NLA_PUT_U32(skb, RTA_TABLE, mrt->id);
+	rtm->rtm_type     = RTN_MULTICAST;
+	rtm->rtm_scope    = RT_SCOPE_UNIVERSE;
+	rtm->rtm_protocol = RTPROT_UNSPEC;
+	rtm->rtm_flags    = 0;
+
+	NLA_PUT_BE32(skb, RTA_SRC, c->mfc_origin);
+	NLA_PUT_BE32(skb, RTA_DST, c->mfc_mcastgrp);
+
+	if (__ipmr_fill_mroute(mrt, skb, c, rtm) < 0)
+		goto nla_put_failure;
+
+	return nlmsg_end(skb, nlh);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct mr_table *mrt;
+	struct mfc_cache *mfc;
+	unsigned int t = 0, s_t;
+	unsigned int h = 0, s_h;
+	unsigned int e = 0, s_e;
+
+	s_t = cb->args[0];
+	s_h = cb->args[1];
+	s_e = cb->args[2];
+
+	read_lock(&mrt_lock);
+	ipmr_for_each_table(mrt, net) {
+		if (t < s_t)
+			goto next_table;
+		if (t > s_t)
+			s_h = 0;
+		for (h = s_h; h < MFC_LINES; h++) {
+			list_for_each_entry(mfc, &mrt->mfc_cache_array[h], list) {
+				if (e < s_e)
+					goto next_entry;
+				if (ipmr_fill_mroute(mrt, skb,
+						     NETLINK_CB(cb->skb).pid,
+						     cb->nlh->nlmsg_seq,
+						     mfc) < 0)
+					goto done;
+next_entry:
+				e++;
+			}
+			e = s_e = 0;
+		}
+		s_h = 0;
+next_table:
+		t++;
+	}
+done:
+	read_unlock(&mrt_lock);
+
+	cb->args[2] = e;
+	cb->args[1] = h;
+	cb->args[0] = t;
+
+	return skb->len;
+}
+
 #ifdef CONFIG_PROC_FS
 /*
  *	The /proc interfaces to multicast routing /proc/ip_mr_cache /proc/ip_mr_vif
@@ -2355,6 +2436,7 @@ int __init ip_mr_init(void)
 		goto add_proto_fail;
 	}
 #endif
+	rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE, NULL, ipmr_rtm_dumproute);
 	return 0;
 
 #ifdef CONFIG_IP_PIMSM_V2
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 1/3] net: fib_rules: mark arguments to fib_rules_register const and __net_initdata
From: Patrick, McHardy, kaber @ 2010-04-27 13:26 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1272374785-3858-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>

fib_rules_register() duplicates the template passed to it without modification,
mark the argument as const. Additionally the templates are only needed when
instantiating a new namespace, so mark them as __net_initdata, which means
they can be discarded when CONFIG_NET_NS=n.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/net/fib_rules.h |    2 +-
 net/core/fib_rules.c    |    2 +-
 net/decnet/dn_rules.c   |    2 +-
 net/ipv4/fib_rules.c    |    2 +-
 net/ipv4/ipmr.c         |    2 +-
 net/ipv6/fib6_rules.c   |    2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 52bd9e6..e8923bc 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -104,7 +104,7 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
 	return frh->table;
 }
 
-extern struct fib_rules_ops *fib_rules_register(struct fib_rules_ops *, struct net *);
+extern struct fib_rules_ops *fib_rules_register(const struct fib_rules_ops *, struct net *);
 extern void fib_rules_unregister(struct fib_rules_ops *);
 extern void                     fib_rules_cleanup_ops(struct fib_rules_ops *);
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 1bc6659..42e84e0 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -122,7 +122,7 @@ errout:
 }
 
 struct fib_rules_ops *
-fib_rules_register(struct fib_rules_ops *tmpl, struct net *net)
+fib_rules_register(const struct fib_rules_ops *tmpl, struct net *net)
 {
 	struct fib_rules_ops *ops;
 	int err;
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index af28dcc..1226bca 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -216,7 +216,7 @@ static void dn_fib_rule_flush_cache(struct fib_rules_ops *ops)
 	dn_rt_cache_flush(-1);
 }
 
-static struct fib_rules_ops dn_fib_rules_ops_template = {
+static const struct fib_rules_ops __net_initdata dn_fib_rules_ops_template = {
 	.family		= FIB_RULES_DECNET,
 	.rule_size	= sizeof(struct dn_fib_rule),
 	.addr_size	= sizeof(u16),
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 3ec84fe..8ab62a5 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -245,7 +245,7 @@ static void fib4_rule_flush_cache(struct fib_rules_ops *ops)
 	rt_cache_flush(ops->fro_net, -1);
 }
 
-static struct fib_rules_ops fib4_rules_ops_template = {
+static const struct fib_rules_ops __net_initdata fib4_rules_ops_template = {
 	.family		= FIB_RULES_IPV4,
 	.rule_size	= sizeof(struct fib4_rule),
 	.addr_size	= sizeof(u32),
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index a2df501..7d3e382 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -216,7 +216,7 @@ static int ipmr_rule_fill(struct fib_rule *rule, struct sk_buff *skb,
 	return 0;
 }
 
-static struct fib_rules_ops ipmr_rules_ops_template = {
+static const struct fib_rules_ops __net_initdata ipmr_rules_ops_template = {
 	.family		= FIB_RULES_IPMR,
 	.rule_size	= sizeof(struct ipmr_rule),
 	.addr_size	= sizeof(u32),
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 8124f16..35f6949 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -237,7 +237,7 @@ static size_t fib6_rule_nlmsg_payload(struct fib_rule *rule)
 	       + nla_total_size(16); /* src */
 }
 
-static struct fib_rules_ops fib6_rules_ops_template = {
+static const struct fib_rules_ops __net_initdata fib6_rules_ops_template = {
 	.family			= FIB_RULES_IPV6,
 	.rule_size		= sizeof(struct fib6_rule),
 	.addr_size		= sizeof(struct in6_addr),
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 2/3] net: rtnetlink: decouple rtnetlink address families from real address families
From: Patrick, McHardy, kaber @ 2010-04-27 13:26 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1272374785-3858-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>

Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.

This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.

This partialy undoes the patch to introduce independant families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.

Signed-off-by: Patrick McHardy <kaber@trash.net>
---
 include/linux/fib_rules.h |    8 --------
 include/linux/rtnetlink.h |    6 ++++++
 net/core/rtnetlink.c      |   14 +++++++-------
 net/decnet/dn_rules.c     |    2 +-
 net/ipv4/fib_rules.c      |    2 +-
 net/ipv4/ipmr.c           |    2 +-
 net/ipv6/fib6_rules.c     |    2 +-
 7 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/include/linux/fib_rules.h b/include/linux/fib_rules.h
index 04a3976..51da65b 100644
--- a/include/linux/fib_rules.h
+++ b/include/linux/fib_rules.h
@@ -15,14 +15,6 @@
 /* try to find source address in routing lookups */
 #define FIB_RULE_FIND_SADDR	0x00010000
 
-/* fib_rules families. values up to 127 are reserved for real address
- * families, values above 128 may be used arbitrarily.
- */
-#define FIB_RULES_IPV4		AF_INET
-#define FIB_RULES_IPV6		AF_INET6
-#define FIB_RULES_DECNET	AF_DECnet
-#define FIB_RULES_IPMR		128
-
 struct fib_rule_hdr {
 	__u8		family;
 	__u8		dst_len;
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index d1c7c90..5a42c36 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -7,6 +7,12 @@
 #include <linux/if_addr.h>
 #include <linux/neighbour.h>
 
+/* rtnetlink families. Values up to 127 are reserved for real address
+ * families, values above 128 may be used arbitrarily.
+ */
+#define RTNL_FAMILY_IPMR		128
+#define RTNL_FAMILY_MAX			128
+
 /****
  *		Routing/neighbour discovery messages.
  ****/
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 78c8598..fd781b6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -98,7 +98,7 @@ int lockdep_rtnl_is_held(void)
 EXPORT_SYMBOL(lockdep_rtnl_is_held);
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
-static struct rtnl_link *rtnl_msg_handlers[NPROTO];
+static struct rtnl_link *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];
 
 static inline int rtm_msgindex(int msgtype)
 {
@@ -118,7 +118,7 @@ static rtnl_doit_func rtnl_get_doit(int protocol, int msgindex)
 {
 	struct rtnl_link *tab;
 
-	if (protocol < NPROTO)
+	if (protocol <= RTNL_FAMILY_MAX)
 		tab = rtnl_msg_handlers[protocol];
 	else
 		tab = NULL;
@@ -133,7 +133,7 @@ static rtnl_dumpit_func rtnl_get_dumpit(int protocol, int msgindex)
 {
 	struct rtnl_link *tab;
 
-	if (protocol < NPROTO)
+	if (protocol <= RTNL_FAMILY_MAX)
 		tab = rtnl_msg_handlers[protocol];
 	else
 		tab = NULL;
@@ -167,7 +167,7 @@ int __rtnl_register(int protocol, int msgtype,
 	struct rtnl_link *tab;
 	int msgindex;
 
-	BUG_ON(protocol < 0 || protocol >= NPROTO);
+	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 	msgindex = rtm_msgindex(msgtype);
 
 	tab = rtnl_msg_handlers[protocol];
@@ -219,7 +219,7 @@ int rtnl_unregister(int protocol, int msgtype)
 {
 	int msgindex;
 
-	BUG_ON(protocol < 0 || protocol >= NPROTO);
+	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 	msgindex = rtm_msgindex(msgtype);
 
 	if (rtnl_msg_handlers[protocol] == NULL)
@@ -241,7 +241,7 @@ EXPORT_SYMBOL_GPL(rtnl_unregister);
  */
 void rtnl_unregister_all(int protocol)
 {
-	BUG_ON(protocol < 0 || protocol >= NPROTO);
+	BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 
 	kfree(rtnl_msg_handlers[protocol]);
 	rtnl_msg_handlers[protocol] = NULL;
@@ -1384,7 +1384,7 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 
 	if (s_idx == 0)
 		s_idx = 1;
-	for (idx = 1; idx < NPROTO; idx++) {
+	for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
 		int type = cb->nlh->nlmsg_type-RTM_BASE;
 		if (idx < s_idx || idx == PF_PACKET)
 			continue;
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 1226bca..48fdf10 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -217,7 +217,7 @@ static void dn_fib_rule_flush_cache(struct fib_rules_ops *ops)
 }
 
 static const struct fib_rules_ops __net_initdata dn_fib_rules_ops_template = {
-	.family		= FIB_RULES_DECNET,
+	.family		= AF_DECnet,
 	.rule_size	= sizeof(struct dn_fib_rule),
 	.addr_size	= sizeof(u16),
 	.action		= dn_fib_rule_action,
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 8ab62a5..76daeb5 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -246,7 +246,7 @@ static void fib4_rule_flush_cache(struct fib_rules_ops *ops)
 }
 
 static const struct fib_rules_ops __net_initdata fib4_rules_ops_template = {
-	.family		= FIB_RULES_IPV4,
+	.family		= AF_INET,
 	.rule_size	= sizeof(struct fib4_rule),
 	.addr_size	= sizeof(u32),
 	.action		= fib4_rule_action,
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 7d3e382..41e8fc0 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -217,7 +217,7 @@ static int ipmr_rule_fill(struct fib_rule *rule, struct sk_buff *skb,
 }
 
 static const struct fib_rules_ops __net_initdata ipmr_rules_ops_template = {
-	.family		= FIB_RULES_IPMR,
+	.family		= RTNL_FAMILY_IPMR,
 	.rule_size	= sizeof(struct ipmr_rule),
 	.addr_size	= sizeof(u32),
 	.action		= ipmr_rule_action,
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 35f6949..8e44f8f 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -238,7 +238,7 @@ static size_t fib6_rule_nlmsg_payload(struct fib_rule *rule)
 }
 
 static const struct fib_rules_ops __net_initdata fib6_rules_ops_template = {
-	.family			= FIB_RULES_IPV6,
+	.family			= AF_INET6,
 	.rule_size		= sizeof(struct fib6_rule),
 	.addr_size		= sizeof(struct in6_addr),
 	.action			= fib6_rule_action,
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 0/4] net: ipmr netlink interface for route dumping
From: Patrick, McHardy, kaber @ 2010-04-27 13:26 UTC (permalink / raw)
  To: davem; +Cc: netdev

These patches add support to ipmr to dump routes of all tables over
netlink. The /proc interface contains an unterminated list of oifs
at the end of a line, therefore it can't be extended to print the
routing table number without breaking backwards compatibility.

These patches fully decouple rtnetlink families from real address
families and add a new family RTNL_FAMILY_IPMR for ipmr as well as
a RTM_GETROUTE handler.

The first patch additionally marks the fib_rules_ops templates as
const __net_initdata. This is not directly related to the netlink
interface, but was already queued in my tree.

Please apply or pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6.git master

Thanks!

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox