Netdev List


* [PATCH 0/3] sh_eth: probe/remove code cleanup/improvement
From: Sergei Shtylyov @ 2013-03-21 20:36 UTC (permalink / raw)
  To: netdev; +Cc: nobuhiro.iwamatsu.yj

Hello.

   Here is a series of 3 patches that clean up and improve the driver's
probe/remove code paths.

[1/3] sh_eth: use PIR_* bits
[2/3] sh_eth: kill unneeded typecast in sh_eth_drv_probe()
[3/3] sh_eth: use managed device API

   All patches are against David Miller's recent net-next.git, into which he
has just helpfully merged my 2 recent fixes, which form the base of these cleanups.

WBR, Sergei

^ permalink raw reply

* Re: regression: tethering fails in 3.5 with iwlwifi
From: Johannes Berg @ 2013-03-21 19:34 UTC (permalink / raw)
  To: artem.bityutskiy, patrik.flykt
  Cc: Eric Dumazet, linux-wireless, netdev, stable
In-Reply-To: <1347892887.7112.9.camel@sauron.fi.intel.com>

On Mon, 2012-09-17 at 17:41 +0300, Artem Bityutskiy wrote:

> OK, finally I got it. After 3 days of hardcore intelligent bisecting
> I've found out that tethering in 3.5 works for me if I revert these 2
> patches:
>
>     56138f5 iwlwifi: dont pull too much payload in skb head

I got back to this for a customer running 3.5, and after many failed
attempts realized that you have to have iptables for this problem to
actually happen. I reverse-bisected that the *fix* is
6caab7b0544e83e6c160b5e80f5a4a7dd69545c7, in 3.7.

Is there still any stable kernel 3.5/3.6 (or possibly before, though for
iwlwifi before doesn't matter) that this should be applied to?

johannes

^ permalink raw reply

* Re: [PATCH] ath9k : Fix ieee80211 work while going to suspend
From: Luis R. Rodriguez @ 2013-03-21 19:33 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: John W. Linville, Parag Warudkar, Jouni Malinen,
	Vasanthakumar Thiagarajan, linux-wireless, ath9k-devel, netdev,
	LKML, senthilb
In-Reply-To: <20130321114219.GB1459@redhat.com>

On Thu, Mar 21, 2013 at 12:42:20PM +0100, Stanislaw Gruszka wrote:
> On Mon, Mar 18, 2013 at 02:03:08PM -0700, Luis R. Rodriguez wrote:
> > > > --- a/drivers/net/wireless/ath/ath9k/link.c
> > > > +++ b/drivers/net/wireless/ath/ath9k/link.c
> > > > @@ -158,7 +158,8 @@ void ath_start_rx_poll(struct ath_softc *sc, u8 nbeacon)
> > > >  {
> > > >   if (!AR_SREV_9300(sc->sc_ah))
> > > >   return;
> > > > -
> > > > + if (sc->suspending)
> > > > + return;
> > 
> > Thanks for the patch! Please note the style issue here: you should
> > use a tab. Other than that, let's review what happened.
> > 
> > > >   if (!test_bit(SC_OP_PRIM_STA_VIF, &sc->sc_flags))
> > > >   return;
> > 
> > Note that what this will do is later call mod_timer() for
> > rx_poll_timer. The right thing to do then, which would be
> > equivalent to your patch, is to modify ath_start_rx_poll()
> > to instead use the new API mod_timer_pending(), added in v2.6.30
> > via commit 74019224. This would not re-arm the timer if it was
> > previously removed.
> [snip]
> > -	mod_timer(&sc->rx_poll_timer, jiffies + msecs_to_jiffies
> > -		  (nbeacon * sc->cur_beacon_conf.beacon_interval));
> > +	mod_timer_pending(&sc->rx_poll_timer, jiffies + msecs_to_jiffies
> > +			  (nbeacon * sc->cur_beacon_conf.beacon_interval));
> 
> But doesn't this prevent the timer from running in case it was not
> running, but we want to start it?

No, you're right, this would never have it run, sorry about that.
But let's look at this a little closer. The issue is at suspend
time: ath9k is trying to schedule work while going to suspend.
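
To make the difference between the two timer calls concrete, here is a small
user-space model. It is only a sketch: struct toy_timer and the toy_* helpers
are made-up names, not the kernel API, and they only mimic the documented
behaviour that mod_timer() always (re)arms a timer while mod_timer_pending()
leaves an inactive timer inactive.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct timer_list; not the kernel type. */
struct toy_timer {
	bool pending;          /* true while the timer is armed */
	unsigned long expires; /* expiry time in arbitrary ticks */
};

/* Models mod_timer(): always (re)arms; returns 1 if it was pending. */
int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->pending = true;
	t->expires = expires;
	return was_pending;
}

/* Models mod_timer_pending(): only touches an already armed timer. */
int toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
	if (!t->pending)
		return 0; /* an inactive timer stays inactive */
	t->expires = expires;
	return 1;
}

/* Models del_timer(): disarms; returns 1 if it was pending. */
int toy_del_timer(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

void toy_timer_demo(void)
{
	struct toy_timer t = { false, 0 };

	/* A timer that was never started cannot be started this way. */
	assert(toy_mod_timer_pending(&t, 10) == 0);
	assert(!t.pending);

	/* mod_timer() starts it; afterwards the pending variant works. */
	assert(toy_mod_timer(&t, 10) == 0);
	assert(toy_mod_timer_pending(&t, 20) == 1);
	assert(t.expires == 20);
}
```

The first assertion in toy_timer_demo() is the case Stanislaw pointed out: if
the only caller uses the pending variant, nothing ever arms the timer.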

So when does this work get called?

Given the trace, this is hit when the timer rx_poll_timer runs,
which in turn calls ath_rx_poll() to schedule the hw_check_work work.
The rx_poll_timer, however, is set not only at the end of the
routine that hw_check_work sets off but also at other entry
points (ath_start_rx_poll() callers). Once ath_start_rx_poll()
gets called, though, we can go on looping as follows:

work			timer			work
hw_check_work	-->	rx_poll_timer	-->	hw_check_work

At suspend time we do this though:

ath_cancel_work(sc);                                                    
del_timer_sync(&sc->rx_poll_timer);  

So perhaps what we need is:

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 6e66f9c..42bb9ea 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -2151,9 +2152,9 @@ static int ath9k_suspend(struct ieee80211_hw *hw,
 
 	mutex_lock(&sc->mutex);
 
+	del_timer_sync(&sc->rx_poll_timer);
 	ath_cancel_work(sc);
 	ath_stop_ani(sc);
-	del_timer_sync(&sc->rx_poll_timer);
 
 	if (test_bit(SC_OP_INVALID, &sc->sc_flags)) {
 		ath_dbg(common, ANY, "Device not present\n");


The fewer changes needed to fix the issue for stable, the easier
it is to backport.

Technically, though, we'd then need all of these changes as well:

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 6e66f9c..42bb9ea 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -752,8 +752,8 @@ static void ath9k_stop(struct ieee80211_hw *hw)
 
 	mutex_lock(&sc->mutex);
 
-	ath_cancel_work(sc);
 	del_timer_sync(&sc->rx_poll_timer);
+	ath_cancel_work(sc);
 
 	if (test_bit(SC_OP_INVALID, &sc->sc_flags)) {
 		ath_dbg(common, ANY, "Device not present\n");
@@ -1145,6 +1145,7 @@ static int ath9k_config(struct ieee80211_hw *hw, u32 changed)
 	if (changed & IEEE80211_CONF_CHANGE_IDLE) {
 		sc->ps_idle = !!(conf->flags & IEEE80211_CONF_IDLE);
 		if (sc->ps_idle) {
+			del_timer_sync(&sc->rx_poll_timer);
 			ath_cancel_work(sc);
 			ath9k_stop_btcoex(sc);
 		} else {
@@ -2151,9 +2152,9 @@ static int ath9k_suspend(struct ieee80211_hw *hw,
 
 	mutex_lock(&sc->mutex);
 
+	del_timer_sync(&sc->rx_poll_timer);
 	ath_cancel_work(sc);
 	ath_stop_ani(sc);
-	del_timer_sync(&sc->rx_poll_timer);
 
 	if (test_bit(SC_OP_INVALID, &sc->sc_flags)) {
 		ath_dbg(common, ANY, "Device not present\n");

But then we have a chicken-and-egg problem: the work item could
fire off the timer, so it would seem good to prevent adding new
work when suspending.

> > Looking at this makes me think we should review all usage of
> > mod_timer all over our 802.11 drivers, and mac80211, cfg80211 as
> > well.
> 
> In mac80211 we use the local->suspended and local->quiesce booleans to
> prevent rescheduling of timers when going to suspend, for example.
> Works use ifmgd->associated to prevent rescheduling when we are
> disassociating.
> 
> I think on ath9k some boolean variable should also be used, not only
> for rx_poll_timer but also for other works, i.e. tx_complete_work.
> It is possible to use the SC_OP_INVALID flag, since mac80211 calls
> ath9k_stop on suspend and ath9k_start on resume.

Indeed; however, ieee80211_queue_work() already does a suspend check for
us. It just complains, as many drivers, including mac80211, were setting
up work incorrectly. The warning was put in place to help us find the
issues. Using SC_OP_INVALID seems fair, but we could also add a routine
ieee80211_queue_work_safe() that silently fails if we are quiescing or
suspended and not resuming. I can see that leading to very sloppy
driver writing and everyone abusing it, though.
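
The flag-based guard discussed here can be modeled in user space as follows.
This is a sketch under assumptions: struct toy_softc, toy_rx_poll() and
toy_stop() are hypothetical names, the bool stands in for the SC_OP_INVALID
bit, and it assumes the flag is already set by the time a late timer fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the driver state; all names here are hypothetical. */
struct toy_softc {
	bool invalid;    /* stands in for SC_OP_INVALID in sc_flags */
	int work_queued; /* counts how often the work was queued */
};

/* Models the timer callback: queue work only while the device is valid. */
void toy_rx_poll(struct toy_softc *sc)
{
	if (!sc->invalid)
		sc->work_queued++;
}

/* Models stop/suspend marking the device invalid. */
void toy_stop(struct toy_softc *sc)
{
	sc->invalid = true;
}

void toy_guard_demo(void)
{
	struct toy_softc sc = { false, 0 };

	toy_rx_poll(&sc); /* normal operation: work is queued */
	assert(sc.work_queued == 1);

	toy_stop(&sc);
	toy_rx_poll(&sc); /* late timer: nothing new is queued */
	assert(sc.work_queued == 1);
}
```

The point of the model is only that the check lives in the timer callback, so
a timer that slips past del_timer_sync() ordering still cannot queue new work.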

OK how about this for stable for now:

diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index 39c84ec..7fdac6c 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -170,7 +170,8 @@ void ath_rx_poll(unsigned long data)
 {
 	struct ath_softc *sc = (struct ath_softc *)data;
 
-	ieee80211_queue_work(sc->hw, &sc->hw_check_work);
+	if (!test_bit(SC_OP_INVALID, &sc->sc_flags))
+		ieee80211_queue_work(sc->hw, &sc->hw_check_work);
 }
 
 /*

  Luis

^ permalink raw reply related

* [PATCH net-next] ptp: increase the maximum number of clocks
From: Jiri Benc @ 2013-03-21 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Richard Cochran

As network adapters supporting PTP are becoming more common, machines with
many NICs suddenly have many PHCs, too. Although the PHCs are not used in
such cases, they produce error messages like this:

igb 0000:07:00.0: ptp_clock_register failed

Currently, the maximum number of devices accepted by ptp_clock_register
is 8 which is pretty low. We could silence the error messages but this would
hurt in case somebody wants to use one of the interfaces to actually run
PTP, as the /dev/ptp%d device for the desired interface may or may not be
available after each boot.

Let's increase the maximum to 128. This shouldn't be a problem, as for the
char devices, the whole major number is reserved anyway, and a 128-bit bitmap
isn't a big deal, either.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/ptp/ptp_clock.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 79f4bce..381d557 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -32,7 +32,7 @@
 #include "ptp_private.h"

 #define PTP_MAX_ALARMS 4
-#define PTP_MAX_CLOCKS 8
+#define PTP_MAX_CLOCKS 128
 #define PTP_PPS_DEFAULTS (PPS_CAPTUREASSERT | PPS_OFFSETASSERT)
 #define PTP_PPS_EVENT PPS_CAPTUREASSERT
 #define PTP_PPS_MODE (PTP_PPS_DEFAULTS | PPS_CANWAIT | PPS_TSFMT_TSPEC)
-- 
1.7.6.5
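
For readers who want to see why a 128-bit bitmap is cheap, here is a user-space
sketch of index allocation from such a bitmap. It is not the ptp_clock.c code:
struct toy_clock_map and the toy_* functions are made-up names, and the linear
scan is an assumption chosen for clarity.

```c
#include <assert.h>
#include <limits.h>

#define TOY_MAX_CLOCKS 128
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MAP_LONGS \
	((TOY_MAX_CLOCKS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* 128 bits of state: two longs on a 64-bit machine, four on 32-bit. */
struct toy_clock_map {
	unsigned long bits[TOY_MAP_LONGS];
};

/* Returns the lowest free index and marks it used, or -1 when full. */
int toy_clock_index_get(struct toy_clock_map *map)
{
	int i;

	for (i = 0; i < TOY_MAX_CLOCKS; i++) {
		unsigned long *word = &map->bits[i / TOY_BITS_PER_LONG];
		unsigned long bit = 1UL << (i % TOY_BITS_PER_LONG);

		if (!(*word & bit)) {
			*word |= bit;
			return i;
		}
	}
	return -1;
}

/* Marks an index as free again. */
void toy_clock_index_put(struct toy_clock_map *map, int index)
{
	map->bits[index / TOY_BITS_PER_LONG] &=
		~(1UL << (index % TOY_BITS_PER_LONG));
}

void toy_clock_demo(void)
{
	struct toy_clock_map map = { { 0 } };
	int first = toy_clock_index_get(&map);
	int second = toy_clock_index_get(&map);

	assert(first == 0 && second == 1);
	toy_clock_index_put(&map, first);
	assert(toy_clock_index_get(&map) == 0); /* freed slot is reused */
}
```

With the limit at 8, the ninth toy_clock_index_get() would return -1, which
corresponds to the "ptp_clock_register failed" message quoted above.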

^ permalink raw reply related

* Re: [PATCH net-next 0/2] Get rid of attribute parsing in rtnetlink_rcv_msg()
From: David Miller @ 2013-03-21 19:19 UTC (permalink / raw)
  To: tgraf; +Cc: netdev
In-Reply-To: <cover.1363885020.git.tgraf@suug.ch>

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 21 Mar 2013 18:45:27 +0100

> We have been wanting to get rid of rta_buf for a while. It is error
> prone as it requires subsystems, especially modules, that register a
> rtnetlink message type to also modify rtnetlink.c and thus require
> the recompilation of the kernel. DECnet is the only user left so after
> converting it we can safely remove it.
> 
> Thomas Graf (2):
>   decnet: Parse netlink attributes on our own
>   rtnetlink: Remove passing of attributes into rtnl_doit functions

Looks great.

I'll apply this once at least some basic sanity test has been done
on the decnet bits.

Thanks.

^ permalink raw reply

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
From: Wei Liu @ 2013-03-21 19:08 UTC (permalink / raw)
  To: James Harper
  Cc: Wei Liu, ian.campbell@citrix.com, konrad.wilk@oracle.com,
	netdev@vger.kernel.org, xen-devel@lists.xen.org,
	annie.li@oracle.com
In-Reply-To: <6035A0D088A63A46850C3988ED045A4B387EC8DC@BITCOM1.int.sbss.com.au>


On Mon, Mar 18, 2013 at 1:27 PM, James Harper <james.harper@bendigoit.com.au> wrote:
>
> Actually it turns out GPLPV just stops counting at 20. If I keep counting
> I can sometimes see over 1000 buffers per GSO packet under Windows using
> "iperf -


Do you think it is necessary to increase MAX_SKB_SLOTS_DEFAULT to 21?


Wei.

^ permalink raw reply

* Re: [PATCH net-next] gro: relax ID check in inet_gro_receive()
From: David Miller @ 2013-03-21 18:56 UTC (permalink / raw)
  To: dkravkov; +Cc: eric.dumazet, netdev, dmitry, eilong, pshelar, hkchu, maze
In-Reply-To: <CAM8tLiMbSQZZW0NBPXGgdHRSJ8UqH+q=QaR+2KOqsf-gt6RHGg@mail.gmail.com>

From: Dmitry Kravkov <dkravkov@gmail.com>
Date: Thu, 21 Mar 2013 20:11:37 +0200

>> Well, I suspect this hardware is not capable of doing the proper ID
>> manipulation twice. (inner and outer header)
> 
> This is correct: ID only for one of the headers can be handled with
> current FW/HW, for other DF is set.

DF does not matter.

Regardless of DF, we must set the ID field correctly.

It is abundantly clear that the current GRE tunnel segmentation
is not generating packets according to our well documented
rules, in that we must be able to precisely create exactly
the original packet stream from the segmented frame.

Someone needs to send me patches to revert the bnx2x GRE segmentation
support, and any software implementation in our tree that has the same
bug.

If someone doesn't do it, I will revert all of this code myself.  You
simply will have to cope with not having this optimization until
your hardware can do it properly and according to our well established
rules for segmentation offloads.

Thanks.
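
As an illustration of the ID requirement only, the property that segments of
one super-packet carry consecutive IP IDs can be written as a small check.
This is a sketch, not the inet_gro_receive() code: toy_ids_are_sequential() is
a made-up helper, and it assumes the IDs are already in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if ids[0..n-1] increase by exactly one per segment, modulo 2^16. */
bool toy_ids_are_sequential(const uint16_t *ids, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (ids[i] != (uint16_t)(ids[0] + i))
			return false;
	}
	return true;
}

void toy_id_demo(void)
{
	/* Incrementing IDs, including the 16-bit wraparound. */
	const uint16_t good[] = { 65534, 65535, 0, 1 };
	/* Every segment reusing the same ID, as with a fixed inner ID. */
	const uint16_t fixed[] = { 7, 7, 7 };

	assert(toy_ids_are_sequential(good, 4));
	assert(!toy_ids_are_sequential(fixed, 3));
}
```

A stream that fails this check cannot be turned back into the exact original
packets from the aggregated frame, because the per-packet IDs are lost.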

^ permalink raw reply

* Re: [PATCH] net: reduce net_rx_action() latency to 2 HZ
From: Steven Rostedt @ 2013-03-21 18:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul Gortmaker, David Miller, netdev, stable, Willy Tarreau,
	Tom Herbert
In-Reply-To: <1363887794.4431.29.camel@edumazet-glaptop>

On Thu, 2013-03-21 at 10:43 -0700, Eric Dumazet wrote:
> On Thu, 2013-03-21 at 13:25 -0400, Paul Gortmaker wrote:
> 
> > That is also reasonably portable back to 2.6.34.  And it is more
> > interesting too -- it will be interesting in a preempt_rt context
> > too, once RT moves ahead off the current 3.6 baseline, which still
> > has the old count-limit of 10 vs the new 2ms time limit.
> > 
> > RT (3.4 and 3.6 based) currently has this patch from Steven:
> > http://git.kernel.org/cgit/linux/kernel/git/paulg/3.6-rt-patches.git/tree/net-tx-action-avoid-livelock-on-rt.patch
> 
> Interesting, as Google has an internal patch removing this trylock() as
> well.
> 
> I think I should upstream it eventually ;)

Yes please :-)

-- Steve


> 
> commit 2f0a3f573b531dc57c268fd809dc65169edae369
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Dec 13 09:18:01 2012 -0800
> 
>     net-dev_xmit_hold_queues: fix a busy loop in net_tx_action
>     
>     Under load, net_tx_action() fails to acquire qdisc lock
>     and reschedules qdisc in a never ending loop.
>     
>     The spin_trylock() has almost no chance to complete because
>     of ticket spinlock and xmit_hold_queue holding the lock for long
>     period of times.
>     
> 

^ permalink raw reply

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
From: Wei Liu @ 2013-03-21 18:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, netdev@vger.kernel.org, xen-devel@lists.xen.org,
	konrad.wilk@oracle.com, annie.li@oracle.com
In-Reply-To: <1363608462.30193.34.camel@zakaz.uk.xensource.com>

On Mon, 2013-03-18 at 12:07 +0000, Ian Campbell wrote:

> >  	/* Skip first skb fragment if it is on same page as header fragment. */
> >  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
> >  
> > -	for (i = start; i < shinfo->nr_frags; i++, txp++) {
> > -		struct page *page;
> > -		pending_ring_idx_t index;
> > +	/* Coalesce tx requests, at this point the packet passed in
> > +	 * should be <= 64K. Any packets larger than 64K has been
> > +	 * dropped / caused fatal error early on.
> 
> Whereabouts is this? Since the size field is u16 how do we even detect
> this case. Since (at least prior to your other fix in this series) it
> would have overflowed when the guest constructed the request.
> 

This is done in netbk_count_requests(). I will fix the comment here.

> 
> > @@ -1025,6 +1108,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
> >  	struct gnttab_copy *gop = *gopp;
> >  	u16 pending_idx = *((u16 *)skb->data);
> >  	struct skb_shared_info *shinfo = skb_shinfo(skb);
> > +	struct pending_tx_info *tx_info;
> >  	int nr_frags = shinfo->nr_frags;
> >  	int i, err, start;
> >  
> > @@ -1037,12 +1121,17 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
> >  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
> >  
> >  	for (i = start; i < nr_frags; i++) {
> > -		int j, newerr;
> > +		int j, newerr = 0, n;
> >  
> >  		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
> > +		tx_info = &netbk->pending_tx_info[pending_idx];
> >  
> >  		/* Check error status: if okay then remember grant handle. */
> > -		newerr = (++gop)->status;
> > +		for (n = 0; n < tx_info->nr_tx_req; n++) {
> struct pending_tx_info is used in some arrays which can have a fair few
> elements so if there are ways to reduce the size that is worth
> considering I think.
> 
> So rather than storing both nr_tx_req and start_idx can we just store
> start_idx and loop while start_idx != 0 (where the first one has
> start_idx == zero)?
> 
> This might fall out more naturally if you were to instead store next_idx
> in each pending tx with a suitable terminator at the end? Or could be
> last_idx if it is convenient to count that way round, you don't need to
> respond in-order.
> 

Done shrinking this structure.


Wei.
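
For reference, the next_idx idea suggested above can be sketched like this.
It is a user-space model under assumptions: struct toy_pending_tx, the
TOY_IDX_END terminator and toy_chain_length() are hypothetical, and this is
not necessarily the layout the next version of the patch ends up with.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PENDING 8
#define TOY_IDX_END UINT16_MAX /* hypothetical terminator value */

/* One field per slot instead of a start index plus a request count. */
struct toy_pending_tx {
	uint16_t next_idx; /* next slot of the same packet, or TOY_IDX_END */
};

/* Counts the slots chained from head by following next_idx. */
int toy_chain_length(const struct toy_pending_tx *tbl, uint16_t head)
{
	int n = 0;
	uint16_t idx;

	for (idx = head; idx != TOY_IDX_END; idx = tbl[idx].next_idx)
		n++;
	return n;
}

void toy_chain_demo(void)
{
	struct toy_pending_tx tbl[TOY_MAX_PENDING];

	/* One packet spread over slots 2 -> 5 -> 1, in that order. */
	tbl[2].next_idx = 5;
	tbl[5].next_idx = 1;
	tbl[1].next_idx = TOY_IDX_END;

	assert(toy_chain_length(tbl, 2) == 3);
	assert(toy_chain_length(tbl, 1) == 1);
}
```

The chain does not have to be contiguous or ordered, which matches the remark
that responses need not be sent in order.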

^ permalink raw reply

* Re: [PATCH net-next] net: fix psock_fanout on sparc64
From: David Miller @ 2013-03-21 18:31 UTC (permalink / raw)
  To: willemb; +Cc: netdev
In-Reply-To: <1363889403-7712-1-git-send-email-willemb@google.com>

From: Willem de Bruijn <willemb@google.com>
Date: Thu, 21 Mar 2013 14:10:03 -0400

> The packetsocket fanout test uses a packet ring. Use TPACKET_V2
> instead of TPACKET_V1 to work around a known 32/64 bit issue in
> the older ring that manifests on sparc64.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied, thanks.

^ permalink raw reply

* [PATCH net-next] net: fix psock_fanout on sparc64
From: Willem de Bruijn @ 2013-03-21 18:10 UTC (permalink / raw)
  To: davem, netdev; +Cc: Willem de Bruijn
In-Reply-To: <CA+FuTSewLdQZbWPLrdbtG2xtaWSdrPDO3yz6boXqgStYhh3fcg@mail.gmail.com>

The packetsocket fanout test uses a packet ring. Use TPACKET_V2
instead of TPACKET_V1 to work around a known 32/64 bit issue in
the older ring that manifests on sparc64.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/psock_fanout.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index 226e5e3..59bd636 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -182,7 +182,13 @@ static char *sock_fanout_open_ring(int fd)
 		.tp_frame_nr   = RING_NUM_FRAMES,
 	};
 	char *ring;
+	int val = TPACKET_V2;
 
+	if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, (void *) &val,
+		       sizeof(val))) {
+		perror("packetsock ring setsockopt version");
+		exit(1);
+	}
 	if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req,
 		       sizeof(req))) {
 		perror("packetsock ring setsockopt");
@@ -201,7 +207,7 @@ static char *sock_fanout_open_ring(int fd)
 
 static int sock_fanout_read_ring(int fd, void *ring)
 {
-	struct tpacket_hdr *header = ring;
+	struct tpacket2_hdr *header = ring;
 	int count = 0;
 
 	while (header->tp_status & TP_STATUS_USER && count < RING_NUM_FRAMES) {
-- 
1.8.1.3
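
A note on the 32/64 bit issue, as a sketch: as far as I know the V1 ring header
starts with an unsigned long status word while the V2 header uses a fixed
32-bit one, so a 32-bit process and a 64-bit kernel disagree on the V1 layout.
The toy structs below only mirror the first two fields and are not the real
struct tpacket_hdr / struct tpacket2_hdr definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy V1-style layout: the status width depends on the ABI. */
struct toy_v1_hdr {
	unsigned long tp_status;
	unsigned int tp_len;
};

/* Toy V2-style layout: fixed-width fields, same on 32 and 64 bit. */
struct toy_v2_hdr {
	uint32_t tp_status;
	uint32_t tp_len;
};

/* Offset of the field that follows the status word. */
size_t toy_v1_len_offset(void)
{
	return offsetof(struct toy_v1_hdr, tp_len);
}

size_t toy_v2_len_offset(void)
{
	return offsetof(struct toy_v2_hdr, tp_len);
}

void toy_layout_demo(void)
{
	/* V1 moves with the size of long: 4 on ILP32, 8 on LP64. */
	assert(toy_v1_len_offset() == sizeof(unsigned long));
	/* V2 is the same everywhere. */
	assert(toy_v2_len_offset() == 4);
}
```

Compiling this once with -m32 and once with -m64 shows the V1 offset changing
while the V2 offset stays put, which is why the test switches ring versions.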

^ permalink raw reply related

* Re: FW: [PATCH ethtool] ethtool: Add string to display support for KR PHY
From: Ben Hutchings @ 2013-03-21 18:15 UTC (permalink / raw)
  To: Vasundhara, Volam; +Cc: netdev@vger.kernel.org
In-Reply-To: <7C0C8AF2F9AEF149AE37E5CA3E2101D3216EDBB7@CMEXMB2.ad.emulex.com>

On Wed, 2013-03-13 at 09:29 +0000, Vasundhara, Volam wrote:
> Please apply.

Sorry for the delay; I've applied this now.

Ben.

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Vasundhara, Volam
> Sent: Tuesday, February 12, 2013 3:16 PM
> To: bhutchings@solarflare.com
> Cc: netdev@vger.kernel.org; Vasundhara, Volam
> Subject: [PATCH ethtool] ethtool: Add string to display support for KR PHY
> 
> 
> Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
> ---
>  ethtool.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 55bc082..205f574 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -512,6 +512,7 @@ dump_link_caps(const char *prefix, const char *an_prefix, u32 mask,
>  		{ 0, ADVERTISED_2500baseX_Full,     "2500baseX/Full" },
>  		{ 0, ADVERTISED_10000baseT_Full,    "10000baseT/Full" },
>  		{ 0, ADVERTISED_10000baseKX4_Full,  "10000baseKX4/Full" },
> +		{ 0, ADVERTISED_10000baseKR_Full,   "10000baseKR/Full" },
>  		{ 0, ADVERTISED_20000baseMLD2_Full, "20000baseMLD2/Full" },
>  		{ 0, ADVERTISED_40000baseKR4_Full,  "40000baseKR4/Full" },
>  		{ 0, ADVERTISED_40000baseCR4_Full,  "40000baseCR4/Full" },
> --
> 1.7.10.4

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] gro: relax ID check in inet_gro_receive()
From: Dmitry Kravkov @ 2013-03-21 18:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev@vger.kernel.org, Dmitry Kravkov,
	Eilon Greenstein, pshelar, hkchu, maze
In-Reply-To: <1363882091.4431.20.camel@edumazet-glaptop>

Resending for netdev in plaintext.


On Thu, Mar 21, 2013 at 6:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On Thu, 2013-03-21 at 11:46 -0400, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 20 Mar 2013 21:52:33 -0700
> >
> > > GRE TSO support doesn't increment the ID in the inner IP header.
> >
> > Is this a fundamental limitation of doing TSO over GRE or
> > were the Broadcom folks just being lazy with their firmware
> > implementation?
> >
>
> Well, I suspect this hardware is not capable of doing the proper ID
> manipulation twice. (inner and outer header)

This is correct: ID only for one of the headers can be handled with
current FW/HW, for other DF is set.

> Still TSO support permits a single GRE flow going from 3Gbps to 9Gbps on
> our hosts. So even if the inner IP id is 'broken', we are going to use
> TSO.
>
> Note we are limited by the receiver, as the receiver has to perform the
> tcp checksum in software (bnx2x doesn't support CHECKSUM_COMPLETE yet)
>
> Hopefully next firmware or NIC will do the right thing.
>
> > I really don't want to apply this patch, because ipv4 frames
> > even with DF set should have an incrementing ID field, in
> > order to accommodate various header compression schemes.
> >
> > We go out of our way to do this for normal unencapsulated TCP stream
> > packets, rather than set the ID field to zero (which we did for some
> > time until the compression issue was pointed out to us).
>
> I understand your concern, but this check in GRO brings nothing at all.
>
> Once we receive frames with 'bad IPv4 ID', should we accept them or drop
> them ?
>
> TCP stack doesn't care at receive (obviously as this ID is not a concern
> for the transport layer), so GRO should do the same, as GRO is a best
> effort to reduce cpu load.
>
> I fully understand the 'tos' check because of proper ECN support, but
> the ttl check or id check are totally useless and time consuming.
>
> GRO aggregation should work roughly the same as TCP coalescing, and we
> don't care about IP ID or ttl in TCP stack.
>

^ permalink raw reply

* Re: [PATCH net-next 1/2] decnet: Parse netlink attributes on our own
From: Steven Whitehouse @ 2013-03-21 18:04 UTC (permalink / raw)
  To: Thomas Graf; +Cc: davem, netdev, linux-decnet-user
In-Reply-To: <5b888618a6aebfebf496c91482794a606b3bb094.1363885020.git.tgraf@suug.ch>

Hi,

On Thu, 2013-03-21 at 18:45 +0100, Thomas Graf wrote:
> decnet is the only subsystem left that is relying on the global
> netlink attribute buffer rta_buf. It's horrible design and we
> want to get rid of it.
> 
> This converts all of decnet to do implicit attribute parsing. It
> also gets rid of the error prone struct dn_kern_rta.
> 
> Yes, the fib_magic() stuff is not pretty.
> 
> It's compile tested but I need someone with appropriate hardware
> to test the patch since I don't have access to it.
> 

You shouldn't need any special hardware to test this. A copy of iproute2
should be enough as you should be able to use that to create an
interface or two and a route between them, etc. Although DECnet routing
works in a different way to ip routing, the Linux implementation tries
to stick fairly closely to the ip way of doing things whenever it can in
order to share infrastructure. Now that ip has diverged a fair bit over
time that isn't quite as true as it was, but there shouldn't be anything
too surprising in there.

If you want to actually pass traffic, then you'll have to set the MAC
address of your ethernet card and use the tools that Chrissie wrote to
do that.

I took a quick look at the patches and I can't spot anything obviously
wrong, but then it is a long time since I last seriously looked at it,
so I could easily have missed something.

Steve.
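
For anyone reviewing without the netlink helpers in their head, here is a
user-space sketch of what finding one attribute in a buffer involves. It is a
model only: struct toy_attr mimics the usual length/type header with 4-byte
alignment, and toy_attr_put_u16() / toy_attr_get_u16() and the TOY_ATTR_*
numbers are made-up names, not the kernel's nla_* API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)
#define TOY_ATTR_DST 1     /* hypothetical attribute numbers */
#define TOY_ATTR_GATEWAY 5

struct toy_attr {
	uint16_t len;  /* header plus payload, excluding padding */
	uint16_t type;
};

/* Appends a 16-bit attribute at off; returns the next aligned offset. */
size_t toy_attr_put_u16(unsigned char *buf, size_t off,
			uint16_t type, uint16_t val)
{
	struct toy_attr hdr;

	hdr.len = (uint16_t)(sizeof(hdr) + sizeof(val));
	hdr.type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + TOY_ALIGN(hdr.len);
}

/* Finds the first attribute of the given type; 0 on success, -1 if not. */
int toy_attr_get_u16(const unsigned char *buf, size_t buflen,
		     uint16_t type, uint16_t *out)
{
	size_t off = 0;

	while (buflen - off >= sizeof(struct toy_attr)) {
		struct toy_attr hdr;
		size_t step;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
			return -1; /* malformed or truncated attribute */
		if (hdr.type == type) {
			if (hdr.len - sizeof(hdr) < sizeof(*out))
				return -1; /* payload too short */
			memcpy(out, buf + off + sizeof(hdr), sizeof(*out));
			return 0;
		}
		step = TOY_ALIGN(hdr.len);
		if (step > buflen - off)
			return -1; /* no room for another attribute */
		off += step;
	}
	return -1;
}

void toy_attr_demo(void)
{
	unsigned char buf[32] = { 0 };
	uint16_t gw = 0;
	size_t len;

	len = toy_attr_put_u16(buf, 0, TOY_ATTR_DST, 0x1234);
	len = toy_attr_put_u16(buf, len, TOY_ATTR_GATEWAY, 0xbeef);

	assert(toy_attr_get_u16(buf, len, TOY_ATTR_GATEWAY, &gw) == 0);
	assert(gw == 0xbeef);
	assert(toy_attr_get_u16(buf, len, 9, &gw) == -1);
}
```

The length check before reading the payload is the kind of validation the
old pointer-per-attribute struct left to each caller.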


> Cc: linux-decnet-user@lists.sourceforge.net
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  include/net/dn_fib.h  |  28 ++-----
>  net/decnet/dn_fib.c   | 211 +++++++++++++++++++++++++++-----------------------
>  net/decnet/dn_route.c |  27 ++++---
>  net/decnet/dn_table.c |  42 +++++-----
>  4 files changed, 160 insertions(+), 148 deletions(-)
> 
> diff --git a/include/net/dn_fib.h b/include/net/dn_fib.h
> index 1ee9d4b..74004af 100644
> --- a/include/net/dn_fib.h
> +++ b/include/net/dn_fib.h
> @@ -1,24 +1,9 @@
>  #ifndef _NET_DN_FIB_H
>  #define _NET_DN_FIB_H
>  
> -/* WARNING: The ordering of these elements must match ordering
> - *          of RTA_* rtnetlink attribute numbers.
> - */
> -struct dn_kern_rta {
> -        void            *rta_dst;
> -        void            *rta_src;
> -        int             *rta_iif;
> -        int             *rta_oif;
> -        void            *rta_gw;
> -        u32             *rta_priority;
> -        void            *rta_prefsrc;
> -        struct rtattr   *rta_mx;
> -        struct rtattr   *rta_mp;
> -        unsigned char   *rta_protoinfo;
> -        u32             *rta_flow;
> -        struct rta_cacheinfo *rta_ci;
> -	struct rta_session *rta_sess;
> -};
> +#include <linux/netlink.h>
> +
> +extern const struct nla_policy rtm_dn_policy[];
>  
>  struct dn_fib_res {
>  	struct fib_rule *r;
> @@ -93,10 +78,10 @@ struct dn_fib_table {
>  	u32 n;
>  
>  	int (*insert)(struct dn_fib_table *t, struct rtmsg *r, 
> -			struct dn_kern_rta *rta, struct nlmsghdr *n, 
> +			struct nlattr *attrs[], struct nlmsghdr *n,
>  			struct netlink_skb_parms *req);
>  	int (*delete)(struct dn_fib_table *t, struct rtmsg *r,
> -			struct dn_kern_rta *rta, struct nlmsghdr *n,
> +			struct nlattr *attrs[], struct nlmsghdr *n,
>  			struct netlink_skb_parms *req);
>  	int (*lookup)(struct dn_fib_table *t, const struct flowidn *fld,
>  			struct dn_fib_res *res);
> @@ -116,13 +101,12 @@ extern void dn_fib_cleanup(void);
>  extern int dn_fib_ioctl(struct socket *sock, unsigned int cmd, 
>  			unsigned long arg);
>  extern struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, 
> -				struct dn_kern_rta *rta, 
> +				struct nlattr *attrs[],
>  				const struct nlmsghdr *nlh, int *errp);
>  extern int dn_fib_semantic_match(int type, struct dn_fib_info *fi, 
>  			const struct flowidn *fld,
>  			struct dn_fib_res *res);
>  extern void dn_fib_release_info(struct dn_fib_info *fi);
> -extern __le16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type);
>  extern void dn_fib_flush(void);
>  extern void dn_fib_select_multipath(const struct flowidn *fld,
>  					struct dn_fib_res *res);
> diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
> index e36614e..42a8048 100644
> --- a/net/decnet/dn_fib.c
> +++ b/net/decnet/dn_fib.c
> @@ -145,22 +145,10 @@ static inline struct dn_fib_info *dn_fib_find_info(const struct dn_fib_info *nfi
>  	return NULL;
>  }
>  
> -__le16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type)
> +static int dn_fib_count_nhs(const struct nlattr *attr)
>  {
> -	while(RTA_OK(attr,attrlen)) {
> -		if (attr->rta_type == type)
> -			return *(__le16*)RTA_DATA(attr);
> -		attr = RTA_NEXT(attr, attrlen);
> -	}
> -
> -	return 0;
> -}
> -
> -static int dn_fib_count_nhs(struct rtattr *rta)
> -{
> -	int nhs = 0;
> -	struct rtnexthop *nhp = RTA_DATA(rta);
> -	int nhlen = RTA_PAYLOAD(rta);
> +	struct rtnexthop *nhp = nla_data(attr);
> +	int nhs = 0, nhlen = nla_len(attr);
>  
>  	while(nhlen >= (int)sizeof(struct rtnexthop)) {
>  		if ((nhlen -= nhp->rtnh_len) < 0)
> @@ -172,10 +160,11 @@ static int dn_fib_count_nhs(struct rtattr *rta)
>  	return nhs;
>  }
>  
> -static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct rtattr *rta, const struct rtmsg *r)
> +static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct nlattr *attr,
> +			  const struct rtmsg *r)
>  {
> -	struct rtnexthop *nhp = RTA_DATA(rta);
> -	int nhlen = RTA_PAYLOAD(rta);
> +	struct rtnexthop *nhp = nla_data(attr);
> +	int nhlen = nla_len(attr);
>  
>  	change_nexthops(fi) {
>  		int attrlen = nhlen - sizeof(struct rtnexthop);
> @@ -187,7 +176,10 @@ static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct rtattr *rta, cons
>  		nh->nh_weight = nhp->rtnh_hops + 1;
>  
>  		if (attrlen) {
> -			nh->nh_gw = dn_fib_get_attr16(RTNH_DATA(nhp), attrlen, RTA_GATEWAY);
> +			struct nlattr *gw_attr;
> +
> +			gw_attr = nla_find((struct nlattr *) (nhp + 1), attrlen, RTA_GATEWAY);
> +			nh->nh_gw = gw_attr ? nla_get_le16(gw_attr) : 0;
>  		}
>  		nhp = RTNH_NEXT(nhp);
>  	} endfor_nexthops(fi);
> @@ -268,7 +260,8 @@ out:
>  }
>  
> 
> -struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta *rta, const struct nlmsghdr *nlh, int *errp)
> +struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct nlattr *attrs[],
> +				       const struct nlmsghdr *nlh, int *errp)
>  {
>  	int err;
>  	struct dn_fib_info *fi = NULL;
> @@ -281,11 +274,9 @@ struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta
>  	if (dn_fib_props[r->rtm_type].scope > r->rtm_scope)
>  		goto err_inval;
>  
> -	if (rta->rta_mp) {
> -		nhs = dn_fib_count_nhs(rta->rta_mp);
> -		if (nhs == 0)
> -			goto err_inval;
> -	}
> +	if (attrs[RTA_MULTIPATH] &&
> +	    (nhs = dn_fib_count_nhs(attrs[RTA_MULTIPATH])) == 0)
> +		goto err_inval;
>  
>  	fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct dn_fib_nh), GFP_KERNEL);
>  	err = -ENOBUFS;
> @@ -295,53 +286,65 @@ struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta
>  	fi->fib_protocol = r->rtm_protocol;
>  	fi->fib_nhs = nhs;
>  	fi->fib_flags = r->rtm_flags;
> -	if (rta->rta_priority)
> -		fi->fib_priority = *rta->rta_priority;
> -	if (rta->rta_mx) {
> -		int attrlen = RTA_PAYLOAD(rta->rta_mx);
> -		struct rtattr *attr = RTA_DATA(rta->rta_mx);
>  
> -		while(RTA_OK(attr, attrlen)) {
> -			unsigned int flavour = attr->rta_type;
> +	if (attrs[RTA_PRIORITY])
> +		fi->fib_priority = nla_get_u32(attrs[RTA_PRIORITY]);
> +
> +	if (attrs[RTA_METRICS]) {
> +		struct nlattr *attr;
> +		int rem;
>  
> -			if (flavour) {
> -				if (flavour > RTAX_MAX)
> +		nla_for_each_nested(attr, attrs[RTA_METRICS], rem) {
> +			int type = nla_type(attr);
> +
> +			if (type) {
> +				if (type > RTAX_MAX || nla_len(attr) < 4)
>  					goto err_inval;
> -				fi->fib_metrics[flavour-1] = *(unsigned int *)RTA_DATA(attr);
> +
> +				fi->fib_metrics[type-1] = nla_get_u32(attr);
>  			}
> -			attr = RTA_NEXT(attr, attrlen);
>  		}
>  	}
> -	if (rta->rta_prefsrc)
> -		memcpy(&fi->fib_prefsrc, rta->rta_prefsrc, 2);
>  
> -	if (rta->rta_mp) {
> -		if ((err = dn_fib_get_nhs(fi, rta->rta_mp, r)) != 0)
> +	if (attrs[RTA_PREFSRC])
> +		fi->fib_prefsrc = nla_get_le16(attrs[RTA_PREFSRC]);
> +
> +	if (attrs[RTA_MULTIPATH]) {
> +		if ((err = dn_fib_get_nhs(fi, attrs[RTA_MULTIPATH], r)) != 0)
>  			goto failure;
> -		if (rta->rta_oif && fi->fib_nh->nh_oif != *rta->rta_oif)
> +
> +		if (attrs[RTA_OIF] &&
> +		    fi->fib_nh->nh_oif != nla_get_u32(attrs[RTA_OIF]))
>  			goto err_inval;
> -		if (rta->rta_gw && memcmp(&fi->fib_nh->nh_gw, rta->rta_gw, 2))
> +
> +		if (attrs[RTA_GATEWAY] &&
> +		    fi->fib_nh->nh_gw != nla_get_le16(attrs[RTA_GATEWAY]))
>  			goto err_inval;
>  	} else {
>  		struct dn_fib_nh *nh = fi->fib_nh;
> -		if (rta->rta_oif)
> -			nh->nh_oif = *rta->rta_oif;
> -		if (rta->rta_gw)
> -			memcpy(&nh->nh_gw, rta->rta_gw, 2);
> +
> +		if (attrs[RTA_OIF])
> +			nh->nh_oif = nla_get_u32(attrs[RTA_OIF]);
> +
> +		if (attrs[RTA_GATEWAY])
> +			nh->nh_gw = nla_get_le16(attrs[RTA_GATEWAY]);
> +
>  		nh->nh_flags = r->rtm_flags;
>  		nh->nh_weight = 1;
>  	}
>  
>  	if (r->rtm_type == RTN_NAT) {
> -		if (rta->rta_gw == NULL || nhs != 1 || rta->rta_oif)
> +		if (!attrs[RTA_GATEWAY] || nhs != 1 || attrs[RTA_OIF])
>  			goto err_inval;
> -		memcpy(&fi->fib_nh->nh_gw, rta->rta_gw, 2);
> +
> +		fi->fib_nh->nh_gw = nla_get_le16(attrs[RTA_GATEWAY]);
>  		goto link_it;
>  	}
>  
>  	if (dn_fib_props[r->rtm_type].error) {
> -		if (rta->rta_gw || rta->rta_oif || rta->rta_mp)
> +		if (attrs[RTA_GATEWAY] || attrs[RTA_OIF] || attrs[RTA_MULTIPATH])
>  			goto err_inval;
> +
>  		goto link_it;
>  	}
>  
> @@ -367,8 +370,8 @@ struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta
>  	}
>  
>  	if (fi->fib_prefsrc) {
> -		if (r->rtm_type != RTN_LOCAL || rta->rta_dst == NULL ||
> -		    memcmp(&fi->fib_prefsrc, rta->rta_dst, 2))
> +		if (r->rtm_type != RTN_LOCAL || !attrs[RTA_DST] ||
> +		    fi->fib_prefsrc != nla_get_le16(attrs[RTA_DST]))
>  			if (dnet_addr_type(fi->fib_prefsrc) != RTN_LOCAL)
>  				goto err_inval;
>  	}
> @@ -486,29 +489,24 @@ void dn_fib_select_multipath(const struct flowidn *fld, struct dn_fib_res *res)
>  	spin_unlock_bh(&dn_fib_multipath_lock);
>  }
>  
> +const struct nla_policy rtm_dn_policy[RTA_MAX + 1] = {
> +	[RTA_DST]		= { .type = NLA_U16 },
> +	[RTA_SRC]		= { .type = NLA_U16 },
> +	[RTA_IIF]		= { .type = NLA_U32 },
> +	[RTA_OIF]		= { .type = NLA_U32 },
> +	[RTA_GATEWAY]		= { .type = NLA_U16 },
> +	[RTA_PRIORITY]		= { .type = NLA_U32 },
> +	[RTA_PREFSRC]		= { .type = NLA_U16 },
> +	[RTA_METRICS]		= { .type = NLA_NESTED },
> +	[RTA_MULTIPATH]		= { .type = NLA_NESTED },
> +	[RTA_TABLE]		= { .type = NLA_U32 },
> +	[RTA_MARK]		= { .type = NLA_U32 },
> +};
>  
> -static int dn_fib_check_attr(struct rtmsg *r, struct rtattr **rta)
> -{
> -	int i;
> -
> -	for(i = 1; i <= RTA_MAX; i++) {
> -		struct rtattr *attr = rta[i-1];
> -		if (attr) {
> -			if (RTA_PAYLOAD(attr) < 4 && RTA_PAYLOAD(attr) != 2)
> -				return -EINVAL;
> -			if (i != RTA_MULTIPATH && i != RTA_METRICS &&
> -			    i != RTA_TABLE)
> -				rta[i-1] = (struct rtattr *)RTA_DATA(attr);
> -		}
> -	}
> -
> -	return 0;
> -}
> -
> -static inline u32 rtm_get_table(struct rtattr **rta, u8 table)
> +static inline u32 rtm_get_table(struct nlattr *attrs[], u8 table)
>  {
> -	if (rta[RTA_TABLE - 1])
> -		table = nla_get_u32((struct nlattr *) rta[RTA_TABLE - 1]);
> +	if (attrs[RTA_TABLE])
> +		table = nla_get_u32(attrs[RTA_TABLE]);
>  
>  	return table;
>  }
> @@ -517,8 +515,9 @@ static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
>  {
>  	struct net *net = sock_net(skb->sk);
>  	struct dn_fib_table *tb;
> -	struct rtattr **rta = arg;
> -	struct rtmsg *r = NLMSG_DATA(nlh);
> +	struct rtmsg *r = nlmsg_data(nlh);
> +	struct nlattr *attrs[RTA_MAX+1];
> +	int err;
>  
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
> @@ -526,22 +525,24 @@ static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
>  	if (!net_eq(net, &init_net))
>  		return -EINVAL;
>  
> -	if (dn_fib_check_attr(r, rta))
> -		return -EINVAL;
> +	err = nlmsg_parse(nlh, sizeof(*r), attrs, RTA_MAX, rtm_dn_policy);
> +	if (err < 0)
> +		return err;
>  
> -	tb = dn_fib_get_table(rtm_get_table(rta, r->rtm_table), 0);
> -	if (tb)
> -		return tb->delete(tb, r, (struct dn_kern_rta *)rta, nlh, &NETLINK_CB(skb));
> +	tb = dn_fib_get_table(rtm_get_table(attrs, r->rtm_table), 0);
> +	if (!tb)
> +		return -ESRCH;
>  
> -	return -ESRCH;
> +	return tb->delete(tb, r, attrs, nlh, &NETLINK_CB(skb));
>  }
>  
>  static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
>  {
>  	struct net *net = sock_net(skb->sk);
>  	struct dn_fib_table *tb;
> -	struct rtattr **rta = arg;
> -	struct rtmsg *r = NLMSG_DATA(nlh);
> +	struct rtmsg *r = nlmsg_data(nlh);
> +	struct nlattr *attrs[RTA_MAX+1];
> +	int err;
>  
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
> @@ -549,14 +550,15 @@ static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
>  	if (!net_eq(net, &init_net))
>  		return -EINVAL;
>  
> -	if (dn_fib_check_attr(r, rta))
> -		return -EINVAL;
> +	err = nlmsg_parse(nlh, sizeof(*r), attrs, RTA_MAX, rtm_dn_policy);
> +	if (err < 0)
> +		return err;
>  
> -	tb = dn_fib_get_table(rtm_get_table(rta, r->rtm_table), 1);
> -	if (tb)
> -		return tb->insert(tb, r, (struct dn_kern_rta *)rta, nlh, &NETLINK_CB(skb));
> +	tb = dn_fib_get_table(rtm_get_table(attrs, r->rtm_table), 1);
> +	if (!tb)
> +		return -ENOBUFS;
>  
> -	return -ENOBUFS;
> +	return tb->insert(tb, r, attrs, nlh, &NETLINK_CB(skb));
>  }
>  
>  static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifaddr *ifa)
> @@ -566,10 +568,31 @@ static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifad
>  		struct nlmsghdr nlh;
>  		struct rtmsg rtm;
>  	} req;
> -	struct dn_kern_rta rta;
> +	struct {
> +		struct nlattr hdr;
> +		__le16 dst;
> +	} dst_attr = {
> +		.dst = dst,
> +	};
> +	struct {
> +		struct nlattr hdr;
> +		__le16 prefsrc;
> +	} prefsrc_attr = {
> +		.prefsrc = ifa->ifa_local,
> +	};
> +	struct {
> +		struct nlattr hdr;
> +		u32 oif;
> +	} oif_attr = {
> +		.oif = ifa->ifa_dev->dev->ifindex,
> +	};
> +	struct nlattr *attrs[RTA_MAX+1] = {
> +		[RTA_DST] = (struct nlattr *) &dst_attr,
> +		[RTA_PREFSRC] = (struct nlattr * ) &prefsrc_attr,
> +		[RTA_OIF] = (struct nlattr *) &oif_attr,
> +	};
>  
>  	memset(&req.rtm, 0, sizeof(req.rtm));
> -	memset(&rta, 0, sizeof(rta));
>  
>  	if (type == RTN_UNICAST)
>  		tb = dn_fib_get_table(RT_MIN_TABLE, 1);
> @@ -591,14 +614,10 @@ static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifad
>  	req.rtm.rtm_scope = (type != RTN_LOCAL ? RT_SCOPE_LINK : RT_SCOPE_HOST);
>  	req.rtm.rtm_type = type;
>  
> -	rta.rta_dst = &dst;
> -	rta.rta_prefsrc = &ifa->ifa_local;
> -	rta.rta_oif = &ifa->ifa_dev->dev->ifindex;
> -
>  	if (cmd == RTM_NEWROUTE)
> -		tb->insert(tb, &req.rtm, &rta, &req.nlh, NULL);
> +		tb->insert(tb, &req.rtm, attrs, &req.nlh, NULL);
>  	else
> -		tb->delete(tb, &req.rtm, &rta, &req.nlh, NULL);
> +		tb->delete(tb, &req.rtm, attrs, &req.nlh, NULL);
>  }
>  
>  static void dn_fib_add_ifaddr(struct dn_ifaddr *ifa)
> diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
> index 5ac0e15..b4b3508 100644
> --- a/net/decnet/dn_route.c
> +++ b/net/decnet/dn_route.c
> @@ -1619,17 +1619,21 @@ errout:
>  static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
>  {
>  	struct net *net = sock_net(in_skb->sk);
> -	struct rtattr **rta = arg;
>  	struct rtmsg *rtm = nlmsg_data(nlh);
>  	struct dn_route *rt = NULL;
>  	struct dn_skb_cb *cb;
>  	int err;
>  	struct sk_buff *skb;
>  	struct flowidn fld;
> +	struct nlattr *tb[RTA_MAX+1];
>  
>  	if (!net_eq(net, &init_net))
>  		return -EINVAL;
>  
> +	err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_dn_policy);
> +	if (err < 0)
> +		return err;
> +
>  	memset(&fld, 0, sizeof(fld));
>  	fld.flowidn_proto = DNPROTO_NSP;
>  
> @@ -1639,12 +1643,14 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
>  	skb_reset_mac_header(skb);
>  	cb = DN_SKB_CB(skb);
>  
> -	if (rta[RTA_SRC-1])
> -		memcpy(&fld.saddr, RTA_DATA(rta[RTA_SRC-1]), 2);
> -	if (rta[RTA_DST-1])
> -		memcpy(&fld.daddr, RTA_DATA(rta[RTA_DST-1]), 2);
> -	if (rta[RTA_IIF-1])
> -		memcpy(&fld.flowidn_iif, RTA_DATA(rta[RTA_IIF-1]), sizeof(int));
> +	if (tb[RTA_SRC])
> +		fld.saddr = nla_get_le16(tb[RTA_SRC]);
> +
> +	if (tb[RTA_DST])
> +		fld.daddr = nla_get_le16(tb[RTA_DST]);
> +
> +	if (tb[RTA_IIF])
> +		fld.flowidn_iif = nla_get_u32(tb[RTA_IIF]);
>  
>  	if (fld.flowidn_iif) {
>  		struct net_device *dev;
> @@ -1669,10 +1675,9 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
>  		if (!err && -rt->dst.error)
>  			err = rt->dst.error;
>  	} else {
> -		int oif = 0;
> -		if (rta[RTA_OIF - 1])
> -			memcpy(&oif, RTA_DATA(rta[RTA_OIF - 1]), sizeof(int));
> -		fld.flowidn_oif = oif;
> +		if (tb[RTA_OIF])
> +			fld.flowidn_oif = nla_get_u32(tb[RTA_OIF]);
> +
>  		err = dn_route_output_key((struct dst_entry **)&rt, &fld, 0);
>  	}
>  
> diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
> index 6c2445b..fc42a0a 100644
> --- a/net/decnet/dn_table.c
> +++ b/net/decnet/dn_table.c
> @@ -224,26 +224,27 @@ static struct dn_zone *dn_new_zone(struct dn_hash *table, int z)
>  }
>  
> 
> -static int dn_fib_nh_match(struct rtmsg *r, struct nlmsghdr *nlh, struct dn_kern_rta *rta, struct dn_fib_info *fi)
> +static int dn_fib_nh_match(struct rtmsg *r, struct nlmsghdr *nlh, struct nlattr *attrs[], struct dn_fib_info *fi)
>  {
>  	struct rtnexthop *nhp;
>  	int nhlen;
>  
> -	if (rta->rta_priority && *rta->rta_priority != fi->fib_priority)
> +	if (attrs[RTA_PRIORITY] &&
> +	    nla_get_u32(attrs[RTA_PRIORITY]) != fi->fib_priority)
>  		return 1;
>  
> -	if (rta->rta_oif || rta->rta_gw) {
> -		if ((!rta->rta_oif || *rta->rta_oif == fi->fib_nh->nh_oif) &&
> -		    (!rta->rta_gw  || memcmp(rta->rta_gw, &fi->fib_nh->nh_gw, 2) == 0))
> +	if (attrs[RTA_OIF] || attrs[RTA_GATEWAY]) {
> +		if ((!attrs[RTA_OIF] || nla_get_u32(attrs[RTA_OIF]) == fi->fib_nh->nh_oif) &&
> +		    (!attrs[RTA_GATEWAY]  || nla_get_le16(attrs[RTA_GATEWAY]) != fi->fib_nh->nh_gw))
>  			return 0;
>  		return 1;
>  	}
>  
> -	if (rta->rta_mp == NULL)
> +	if (!attrs[RTA_MULTIPATH])
>  		return 0;
>  
> -	nhp = RTA_DATA(rta->rta_mp);
> -	nhlen = RTA_PAYLOAD(rta->rta_mp);
> +	nhp = nla_data(attrs[RTA_MULTIPATH]);
> +	nhlen = nla_len(attrs[RTA_MULTIPATH]);
>  
>  	for_nexthops(fi) {
>  		int attrlen = nhlen - sizeof(struct rtnexthop);
> @@ -254,7 +255,10 @@ static int dn_fib_nh_match(struct rtmsg *r, struct nlmsghdr *nlh, struct dn_kern
>  		if (nhp->rtnh_ifindex && nhp->rtnh_ifindex != nh->nh_oif)
>  			return 1;
>  		if (attrlen) {
> -			gw = dn_fib_get_attr16(RTNH_DATA(nhp), attrlen, RTA_GATEWAY);
> +			struct nlattr *gw_attr;
> +
> +			gw_attr = nla_find((struct nlattr *) (nhp + 1), attrlen, RTA_GATEWAY);
> +			gw = gw_attr ? nla_get_le16(gw_attr) : 0;
>  
>  			if (gw && gw != nh->nh_gw)
>  				return 1;
> @@ -517,7 +521,8 @@ out:
>  	return skb->len;
>  }
>  
> -static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct dn_kern_rta *rta, struct nlmsghdr *n, struct netlink_skb_parms *req)
> +static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct nlattr *attrs[],
> +			       struct nlmsghdr *n, struct netlink_skb_parms *req)
>  {
>  	struct dn_hash *table = (struct dn_hash *)tb->data;
>  	struct dn_fib_node *new_f, *f, **fp, **del_fp;
> @@ -536,15 +541,14 @@ static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct
>  		return -ENOBUFS;
>  
>  	dz_key_0(key);
> -	if (rta->rta_dst) {
> -		__le16 dst;
> -		memcpy(&dst, rta->rta_dst, 2);
> +	if (attrs[RTA_DST]) {
> +		__le16 dst = nla_get_le16(attrs[RTA_DST]);
>  		if (dst & ~DZ_MASK(dz))
>  			return -EINVAL;
>  		key = dz_key(dst, dz);
>  	}
>  
> -	if ((fi = dn_fib_create_info(r, rta, n, &err)) == NULL)
> +	if ((fi = dn_fib_create_info(r, attrs, n, &err)) == NULL)
>  		return err;
>  
>  	if (dz->dz_nent > (dz->dz_divisor << 2) &&
> @@ -654,7 +658,8 @@ out:
>  }
>  
> 
> -static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct dn_kern_rta *rta, struct nlmsghdr *n, struct netlink_skb_parms *req)
> +static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct nlattr *attrs[],
> +			       struct nlmsghdr *n, struct netlink_skb_parms *req)
>  {
>  	struct dn_hash *table = (struct dn_hash*)tb->data;
>  	struct dn_fib_node **fp, **del_fp, *f;
> @@ -671,9 +676,8 @@ static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct
>  		return -ESRCH;
>  
>  	dz_key_0(key);
> -	if (rta->rta_dst) {
> -		__le16 dst;
> -		memcpy(&dst, rta->rta_dst, 2);
> +	if (attrs[RTA_DST]) {
> +		__le16 dst = nla_get_le16(attrs[RTA_DST]);
>  		if (dst & ~DZ_MASK(dz))
>  			return -EINVAL;
>  		key = dz_key(dst, dz);
> @@ -703,7 +707,7 @@ static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct
>  				(r->rtm_scope == RT_SCOPE_NOWHERE || f->fn_scope == r->rtm_scope) &&
>  				(!r->rtm_protocol ||
>  					fi->fib_protocol == r->rtm_protocol) &&
> -				dn_fib_nh_match(r, n, rta, fi) == 0)
> +				dn_fib_nh_match(r, n, attrs, fi) == 0)
>  			del_fp = fp;
>  	}
>  
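
For readers following the conversion quoted above: the patch replaces hand-rolled rtattr walking with a parse step that validates each attribute against a policy and fills an attrs[] table indexed by type. A minimal userspace sketch of that idea follows. Everything named demo_* or DEMO_* is hypothetical and only mimics the shape of the kernel helpers; it is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
#define DEMO_ATTR_MAX   4
#define DEMO_ALIGN(len) (((size_t)(len) + 3u) & ~(size_t)3u)

struct demo_attr_hdr {		/* TLV header: total length, then type */
	uint16_t len;		/* header plus payload, without padding */
	uint16_t type;
};

/* Minimum payload length per type; 0 means no constraint. */
static const uint16_t demo_min_len[DEMO_ATTR_MAX + 1] = {
	[1] = 2,		/* e.g. a 16-bit address */
	[2] = 4,		/* e.g. a 32-bit interface index */
};

/*
 * Walk the TLV stream and record where each known attribute starts, so
 * table[type] is either NULL or points at that attribute's header.
 * Returns 0 on success, -1 if an attribute is truncated or too short.
 */
static int demo_parse(const unsigned char *buf, size_t buflen,
		      const unsigned char *table[DEMO_ATTR_MAX + 1])
{
	size_t off = 0;
	int i;

	for (i = 0; i <= DEMO_ATTR_MAX; i++)
		table[i] = NULL;

	while (buflen - off >= sizeof(struct demo_attr_hdr)) {
		struct demo_attr_hdr hdr;
		size_t step;

		/* Copy the header out; the buffer may not be aligned. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
			return -1;
		/* Unknown types are skipped, known ones are length-checked. */
		if (hdr.type > 0 && hdr.type <= DEMO_ATTR_MAX) {
			if (hdr.len - sizeof(hdr) < demo_min_len[hdr.type])
				return -1;
			table[hdr.type] = buf + off;
		}
		/* Attributes are padded to 4 bytes; clamp at the buffer end. */
		step = DEMO_ALIGN(hdr.len);
		off = step > buflen - off ? buflen : off + step;
	}
	return 0;
}

/* Read a 16-bit payload without assuming the buffer is aligned. */
static uint16_t demo_get_u16(const unsigned char *attr)
{
	uint16_t value;

	memcpy(&value, attr + sizeof(struct demo_attr_hdr), sizeof(value));
	return value;
}
```

Once the table is filled, callers only test table[type] for NULL and use a typed getter, which is the pattern the converted decnet code follows with attrs[RTA_*].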


* Re: [PATCH net-next] net: fix psock_fanout selftest hash collision
From: Willem de Bruijn @ 2013-03-21 18:01 UTC (permalink / raw)
  To: David Miller; +Cc: Daniel Borkmann, netdev
In-Reply-To: <20130321.135641.1239176040549971599.davem@davemloft.net>

On Thu, Mar 21, 2013 at 1:56 PM, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Thu, 21 Mar 2013 13:49:14 -0400 (EDT)
>
>> From: Willem de Bruijn <willemb@google.com>
>> Date: Thu, 21 Mar 2013 13:27:51 -0400
>>
>>> @@ -182,7 +182,13 @@ static char *sock_fanout_open_ring(int fd)
>>>                 .tp_frame_nr   = RING_NUM_FRAMES,
>>>         };
>>>         char *ring;
>>> +       int val = TPACKET_V2;
>>>
>>> +       if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, (void *) &val,
>>
>> This whole patch is whitespace damaged.

Sorry. I did not know that that happened with cut and paste into gmail.

> I hand applied this patch and it makes the test pass on sparc64.

Great. Thanks for testing.

> Please submit this formally, thanks.

Will follow up right away.


* Re: [PATCH net-next] net: fix psock_fanout selftest hash collision
From: Willem de Bruijn @ 2013-03-21 18:00 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, netdev
In-Reply-To: <514B47A9.3080304@redhat.com>

On Thu, Mar 21, 2013 at 1:47 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 03/21/2013 06:27 PM, Willem de Bruijn wrote:
>>
>> On Thu, Mar 21, 2013 at 2:31 AM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> On 03/21/2013 01:07 AM, Willem de Bruijn wrote:
>>>>
>>>> On Wed, Mar 20, 2013 at 1:59 PM, David Miller <davem@davemloft.net>
>>>> wrote:
>>>>>
>>>>>
>>>>> From: David Miller <davem@davemloft.net>
>>>>> Date: Wed, 20 Mar 2013 12:33:44 -0400 (EDT)
>>>>>
>>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>>> Date: Wed, 20 Mar 2013 02:42:44 -0400
>>>>>>
>>>>>>> Fix flaky results with PACKET_FANOUT_HASH depending on whether the
>>>>>>> two flows hash into the same packet socket or not.
>>>>>>>
>>>>>>> Also adds tests for PACKET_FANOUT_LB and PACKET_FANOUT_CPU and
>>>>>>> replaces the counting method with a packet ring.
>>>>>>>
>>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>>>>>
>>>>>>
>>>>>> Applied, thanks.  I'll retest on my sparc64 box later today.
>>>>>
>>>>>
>>>>> Unfortunately, it's still broken there:
>>>>
>>>>
>>>> This looks like a new problem. Now the counters all stay zero.
>>>>
>>>> I am looking into it. I have not been able to reproduce this on my
>>>> x86_64 so far, so just brought a sparc32 up in qemu. Had less luck
>>>> with sparc64, but impressive that it works at all. Come to think of
>>>> it, is this a 64-bit kernel with 32-bit userland? Perhaps that
>>>> affects packet ring memory layout.
>>>
>>>
>>>
>>> That can affect the ring buffer in case of TPACKET_V1, which is default
>>> if not specified otherwise. See Documentation/networking/packet_mmap.txt
>>> +514
>>
>>
>> Thanks, Daniel. In that case, the following should fix it.
>> Unfortunately, I don't have the hardware to verify, but it still
>> passes on my platforms. Let me know if you prefer it as a regular
>> patch instead of inline.
>
>
> I can only tell you about x86_64: [PASS],

Thanks for testing.

>  although two ERRORs:

That is due to a workaround in the test for finding connections whose
rxhashes map onto different packet sockets. The current approach tries
up to N (currently 5) times until it finds a pair of connections that
map onto different queues. These ERRORs are from previous tries that
failed. WARNING would be a better name.

But the whole workaround is not very good, as it only reduces flaky
failures and does not fix them. Worst case, if the hash is uniformly
random, then for two packet sockets there's a 1/2 chance on every try
that the two connections hit the same socket, so a 1/(2^N) chance that
the test gives up and fails. I haven't found a better solution yet to
generate a pair of sockets that always hash onto different packet sockets.
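
To put a number on that estimate, here is a small sketch of the arithmetic. It assumes, as the paragraph above does, that each try is independent and that the hash is uniform over the packet sockets; the function name is made up for illustration.

```c
#include <assert.h>

/*
 * Probability that every one of `tries` attempts collides, assuming each
 * attempt independently puts both flows on the same one of `nsockets`
 * sockets with probability 1/nsockets (the uniform-hash assumption).
 */
static double all_tries_collide(unsigned int nsockets, unsigned int tries)
{
	double p = 1.0;
	unsigned int i;

	assert(nsockets > 0);
	for (i = 0; i < tries; i++)
		p /= (double)nsockets;
	return p;
}
```

With two packet sockets and N = 5 this gives 1/32, so under those assumptions roughly 3% of runs would still give up and fail.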

>
> running psock_fanout test
> --------------------
> test: control single socket
> test: control multiple sockets
> test: datapath 0x0
> info: count=0,0, expect=0,0
> info: count=20,0, expect=15,5
> ERROR: incorrect queue lengths
> info: count=20,0, expect=20,5
>
> ERROR: incorrect queue lengths
> info: trying alternate ports (4)
> test: datapath 0x0
> [...]
> OK. All tests passed
> [PASS]


* Re: [PATCH net-next] net: fix psock_fanout selftest hash collision
From: David Miller @ 2013-03-21 17:56 UTC (permalink / raw)
  To: willemb; +Cc: dborkman, netdev
In-Reply-To: <20130321.134914.231367983776200354.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 21 Mar 2013 13:49:14 -0400 (EDT)

> From: Willem de Bruijn <willemb@google.com>
> Date: Thu, 21 Mar 2013 13:27:51 -0400
> 
>> @@ -182,7 +182,13 @@ static char *sock_fanout_open_ring(int fd)
>>                 .tp_frame_nr   = RING_NUM_FRAMES,
>>         };
>>         char *ring;
>> +       int val = TPACKET_V2;
>> 
>> +       if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, (void *) &val,
> 
> This whole patch is whitespace damaged.

I hand applied this patch and it makes the test pass on sparc64.

Please submit this formally, thanks.


* Re: [PATCH 1/3] net: Clean up SCM_CREDENTIALS code
From: Andy Lutomirski @ 2013-03-21 17:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <87k3p1z1iq.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On Wed, Mar 20, 2013 at 11:54 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>> I was curious whether the uids, gids, and pids passed around worked
>> correctly in the presence of multiple namespaces.  I gave up trying
>> to figure it out: there are two copies of the pid (one of which has
>> type u32, which is odd), a struct cred * (!), and a separate kuid
>> and kgid.  IOW, all of the relevant data is stored twice, and it's
>> unclear which copy is used when.
>>
>> I also wondered what prevented a SO_CREDENTIALS message from being
>> received when the credentials weren't filled out.  Answer: not very
>> much (and there have been serious security bugs here in the past).
>>
>> So just rewrite the thing to store a pid_t relative to the init pid
>> ns, a kuid, and a kgid, and to explicitly track whether the data is
>> filled out.
>>
>> I haven't played with the secid code.  I have no idea whether it has
>> similar problems.
>>
>> I haven't benchmarked this, but it should be a respectable speedup
>> in the cases where the credentials are in use.
>
> The basic principle of no longer passing the struct cred we can
> certainly do.
>
> I am less convinced about the struct pid, but arguably that is the
> proper approach.

I agree it's not pretty.  OTOH it's faster, simpler, and I don't see
any benefit of keeping an explicit struct pid reference.  With this
approach, the only way to attack the code is to get a pid to be reused
or to impersonate pid 0 and try to confuse something.  But the other
way has the same issue, just with a shorter race window.
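
To make the "explicitly track whether the data is filled out" idea from the quoted changelog concrete, here is a minimal userspace sketch. The struct and function names are hypothetical and are not the ones used in the patch; plain int and unsigned int stand in for pid_t, kuid_t and kgid_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record; field names are illustrative, not the kernel's. */
struct demo_creds {
	bool filled;		/* set only when the sender's ids were recorded */
	int pid;
	unsigned int uid;
	unsigned int gid;
};

/* Record the sender's identity and mark the record as usable. */
static void demo_creds_fill(struct demo_creds *c, int pid,
			    unsigned int uid, unsigned int gid)
{
	c->pid = pid;
	c->uid = uid;
	c->gid = gid;
	c->filled = true;
}

/*
 * Copy the credentials out for delivery. Returns false and leaves *out
 * untouched when nothing was ever recorded, so a receiver is never handed
 * zeroed (root-looking) ids by accident.
 */
static bool demo_creds_read(const struct demo_creds *c, struct demo_creds *out)
{
	if (!c->filled)
		return false;
	*out = *c;
	return true;
}
```

The point of the flag is that "no credentials" becomes a distinct state rather than a set of zero values that happen to look like pid 0 and uid 0.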

>
> A patch that proclaims that you didn't understand what the code was
> doing but you changed it anyway, suggests there are subtle bugs
> in there that you overlooked.

I'll improve the changelog text.  After following the code around, I
now understand what was going on.

>
> Certainly killing NETLINK_CB(skb).ssk is a bug.

As noted in my other email, I'll drop that patch entirely.

>
> I do think there is a lot of good stuff in here, and if you break this up
> into smaller, simpler patches and keep an eye on the speed of sending
> messages without credentials, I am pretty certain you can cook up
> something that is mergeable.

I'm not sure how to split it up more without making it messier.  Once
the data structure changes, most of the rest of the changes have to
come along.

In any case, I won't send out a new version until I get some comments
on the code (other than the ssk thing).

--Andy


* Re: [PATCH net-next] net: fix psock_fanout selftest hash collision
From: David Miller @ 2013-03-21 17:49 UTC (permalink / raw)
  To: willemb; +Cc: dborkman, netdev
In-Reply-To: <CA+FuTSfkxkbZ8N1Vr2NVmT-8Obd2sGAo2a_Wwv=U-4KPvnLPPA@mail.gmail.com>

From: Willem de Bruijn <willemb@google.com>
Date: Thu, 21 Mar 2013 13:27:51 -0400

> @@ -182,7 +182,13 @@ static char *sock_fanout_open_ring(int fd)
>                 .tp_frame_nr   = RING_NUM_FRAMES,
>         };
>         char *ring;
> +       int val = TPACKET_V2;
> 
> +       if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, (void *) &val,

This whole patch is whitespace damaged.


* Re: [PATCH net-next] net: fix psock_fanout selftest hash collision
From: Daniel Borkmann @ 2013-03-21 17:47 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: David Miller, netdev
In-Reply-To: <CA+FuTSfkxkbZ8N1Vr2NVmT-8Obd2sGAo2a_Wwv=U-4KPvnLPPA@mail.gmail.com>

On 03/21/2013 06:27 PM, Willem de Bruijn wrote:
> On Thu, Mar 21, 2013 at 2:31 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 03/21/2013 01:07 AM, Willem de Bruijn wrote:
>>> On Wed, Mar 20, 2013 at 1:59 PM, David Miller <davem@davemloft.net> wrote:
>>>>
>>>> From: David Miller <davem@davemloft.net>
>>>> Date: Wed, 20 Mar 2013 12:33:44 -0400 (EDT)
>>>>
>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>> Date: Wed, 20 Mar 2013 02:42:44 -0400
>>>>>
>>>>>> Fix flaky results with PACKET_FANOUT_HASH depending on whether the
>>>>>> two flows hash into the same packet socket or not.
>>>>>>
>>>>>> Also adds tests for PACKET_FANOUT_LB and PACKET_FANOUT_CPU and
>>>>>> replaces the counting method with a packet ring.
>>>>>>
>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>>>>
>>>>> Applied, thanks.  I'll retest on my sparc64 box later today.
>>>>
>>>> Unfortunately, it's still broken there:
>>>
>>> This looks like a new problem. Now the counters all stay zero.
>>>
>>> I am looking into it. I have not been able to reproduce this on my
>>> x86_64 so far, so just brought a sparc32 up in qemu. Had less luck
>>> with sparc64, but impressive that it works at all. Come to think of
>>> it, is this a 64-bit kernel with 32-bit userland? Perhaps that
>>> affects packet ring memory layout.
>>
>>
>> That can affect the ring buffer in case of TPACKET_V1, which is default
>> if not specified otherwise. See Documentation/networking/packet_mmap.txt
>> +514
>
> Thanks, Daniel. In that case, the following should fix it.
> Unfortunately, I don't have the hardware to verify, but it still
> passes on my platforms. Let me know if you prefer it as a regular
> patch instead of inline.

I can only tell you about x86_64: [PASS], although two ERRORs:

running psock_fanout test
--------------------
test: control single socket
test: control multiple sockets
test: datapath 0x0
info: count=0,0, expect=0,0
info: count=20,0, expect=15,5
ERROR: incorrect queue lengths
info: count=20,0, expect=20,5
ERROR: incorrect queue lengths
info: trying alternate ports (4)
test: datapath 0x0
[...]
OK. All tests passed
[PASS]
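
For reference, selecting the ring format explicitly rather than relying on the TPACKET_V1 default looks roughly like the sketch below. This is not the patch under discussion, whose inline copy is truncated in the quotes; the helper name is made up, and only the setsockopt() call with SOL_PACKET, PACKET_VERSION and TPACKET_V2 is standard Linux API.

```c
#include <linux/if_packet.h>
#include <sys/socket.h>

/*
 * Ask the kernel to use the TPACKET_V2 frame layout on packet socket fd.
 * Returns 0 on success or -1 with errno set, exactly as setsockopt() does.
 * Call this before the ring is set up with PACKET_RX_RING.
 */
static int request_tpacket_v2(int fd)
{
	int version = TPACKET_V2;

	return setsockopt(fd, SOL_PACKET, PACKET_VERSION,
			  &version, sizeof(version));
}
```

The version has to be chosen before the ring is created, because the frame header layout is fixed once the ring exists.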


* Re: [PATCH net-next] net: fix psock_fanout selftest hash collision
From: David Miller @ 2013-03-21 17:46 UTC (permalink / raw)
  To: willemb; +Cc: dborkman, netdev
In-Reply-To: <CA+FuTSfkxkbZ8N1Vr2NVmT-8Obd2sGAo2a_Wwv=U-4KPvnLPPA@mail.gmail.com>

From: Willem de Bruijn <willemb@google.com>
Date: Thu, 21 Mar 2013 13:27:51 -0400

> Unfortunately, I don't have the hardware to verify, but it still
> passes on my platforms.

I think you do, simply build your test program with "gcc -m32"
and run it on your x86_64 machine.

But in any event I'll test your change too.
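
A quick way to see why a -m32 build matters here: the V1 frame header starts with an unsigned long, so its size and field offsets follow the userland ABI, while the V2 header uses fixed-width fields only. The structs below are hypothetical stand-ins modeled loosely on the V1 and V2 headers, written out locally so the sketch does not depend on kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative stand-ins only. The V1-style header begins with an
 * unsigned long, so its layout changes between 32-bit and 64-bit builds.
 */
struct demo_v1_hdr {
	unsigned long status;
	unsigned int len;
	unsigned int snaplen;
	unsigned short mac;
	unsigned short net;
	unsigned int sec;
	unsigned int usec;
};

/* The V2-style header uses fixed-width fields, so it is ABI independent. */
struct demo_v2_hdr {
	uint32_t status;
	uint32_t len;
	uint32_t snaplen;
	uint16_t mac;
	uint16_t net;
	uint32_t sec;
	uint32_t nsec;
	uint16_t vlan_tci;
	uint16_t vlan_tpid;
	uint8_t padding[4];
};

static size_t demo_v1_hdr_size(void)
{
	return sizeof(struct demo_v1_hdr);
}

static size_t demo_v2_hdr_size(void)
{
	return sizeof(struct demo_v2_hdr);
}
```

Compiling this once normally and once with "gcc -m32" and comparing demo_v1_hdr_size() shows the mismatch a 32-bit program sees against a 64-bit kernel's idea of the V1 layout; demo_v2_hdr_size() stays the same in both builds.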


* [PATCH net-next 2/2] rtnetlink: Remove passing of attributes into rtnl_doit functions
From: Thomas Graf @ 2013-03-21 17:45 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1363885020.git.tgraf@suug.ch>

With decnet converted, we can finally get rid of rta_buf and the
computations around it. This also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.
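
As a rough illustration of the resulting calling convention (a userspace sketch with hypothetical demo_* names, not the kernel code): handlers receive only the message, and each one checks the minimum header length for itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct demo_msg {
	size_t payload_len;		/* bytes following the message header */
	const unsigned char *payload;
};

struct demo_route_hdr {			/* plays the role of a family header */
	unsigned char family;
	unsigned char dst_len;
	unsigned char table;
	unsigned char type;
};

/* Handlers receive only the message; no pre-parsed attribute array. */
typedef int (*demo_doit_func)(const struct demo_msg *msg);

static int demo_newroute(const struct demo_msg *msg)
{
	/* Each handler verifies the minimum header length itself. */
	if (msg->payload_len < sizeof(struct demo_route_hdr))
		return -EINVAL;
	return 0;
}

/* The dispatcher no longer needs per-family length or attribute tables. */
static int demo_dispatch(demo_doit_func doit, const struct demo_msg *msg)
{
	return doit ? doit(msg) : -EOPNOTSUPP;
}
```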

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/net/rtnetlink.h |  2 +-
 net/bridge/br_mdb.c     |  4 +--
 net/can/gw.c            |  5 ++-
 net/core/fib_rules.c    |  4 +--
 net/core/neighbour.c    |  6 ++--
 net/core/rtnetlink.c    | 82 ++++++-------------------------------------------
 net/dcb/dcbnl.c         |  2 +-
 net/decnet/dn_dev.c     |  4 +--
 net/decnet/dn_fib.c     |  4 +--
 net/decnet/dn_route.c   |  2 +-
 net/ipv4/devinet.c      |  7 ++---
 net/ipv4/fib_frontend.c |  4 +--
 net/ipv4/route.c        |  2 +-
 net/ipv6/addrconf.c     | 10 +++---
 net/ipv6/addrlabel.c    |  6 ++--
 net/ipv6/route.c        |  6 ++--
 net/phonet/pn_netlink.c |  4 +--
 net/sched/act_api.c     |  2 +-
 net/sched/cls_api.c     |  2 +-
 net/sched/sch_api.c     |  6 ++--
 20 files changed, 47 insertions(+), 117 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 5a15fab..7026648 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -4,7 +4,7 @@
 #include <linux/rtnetlink.h>
 #include <net/netlink.h>
 
-typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *, void *);
+typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *);
 typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *);
 typedef u16 (*rtnl_calcit_func)(struct sk_buff *, struct nlmsghdr *);
 
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index ee79f3f..19942e3 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -382,7 +382,7 @@ static int __br_mdb_add(struct net *net, struct net_bridge *br,
 	return ret;
 }
 
-static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct br_mdb_entry *entry;
@@ -458,7 +458,7 @@ unlock:
 	return err;
 }
 
-static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net_device *dev;
 	struct br_mdb_entry *entry;
diff --git a/net/can/gw.c b/net/can/gw.c
index 2d117dc..2dc619d 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -778,8 +778,7 @@ static int cgw_parse_attr(struct nlmsghdr *nlh, struct cf_mod *mod,
 	return 0;
 }
 
-static int cgw_create_job(struct sk_buff *skb,  struct nlmsghdr *nlh,
-			  void *arg)
+static int cgw_create_job(struct sk_buff *skb,  struct nlmsghdr *nlh)
 {
 	struct rtcanmsg *r;
 	struct cgw_job *gwj;
@@ -868,7 +867,7 @@ static void cgw_remove_all_jobs(void)
 	}
 }
 
-static int cgw_remove_job(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
+static int cgw_remove_job(struct sk_buff *skb,  struct nlmsghdr *nlh)
 {
 	struct cgw_job *gwj = NULL;
 	struct hlist_node *nx;
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 58a4ba2..d5a9f8e 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -266,7 +266,7 @@ errout:
 	return err;
 }
 
-static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
+static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct fib_rule_hdr *frh = nlmsg_data(nlh);
@@ -415,7 +415,7 @@ errout:
 	return err;
 }
 
-static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
+static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct fib_rule_hdr *frh = nlmsg_data(nlh);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 3863b8f..c72a646 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1613,7 +1613,7 @@ int neigh_table_clear(struct neigh_table *tbl)
 }
 EXPORT_SYMBOL(neigh_table_clear);
 
-static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ndmsg *ndm;
@@ -1677,7 +1677,7 @@ out:
 	return err;
 }
 
-static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ndmsg *ndm;
@@ -1955,7 +1955,7 @@ static const struct nla_policy nl_ntbl_parm_policy[NDTPA_MAX+1] = {
 	[NDTPA_LOCKTIME]		= { .type = NLA_U64 },
 };
 
-static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct neigh_table *tbl;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9a9b99e..751f124 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -515,32 +515,6 @@ out:
 	return err;
 }
 
-static const int rtm_min[RTM_NR_FAMILIES] =
-{
-	[RTM_FAM(RTM_NEWLINK)]      = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
-	[RTM_FAM(RTM_NEWADDR)]      = NLMSG_LENGTH(sizeof(struct ifaddrmsg)),
-	[RTM_FAM(RTM_NEWROUTE)]     = NLMSG_LENGTH(sizeof(struct rtmsg)),
-	[RTM_FAM(RTM_NEWRULE)]      = NLMSG_LENGTH(sizeof(struct fib_rule_hdr)),
-	[RTM_FAM(RTM_NEWQDISC)]     = NLMSG_LENGTH(sizeof(struct tcmsg)),
-	[RTM_FAM(RTM_NEWTCLASS)]    = NLMSG_LENGTH(sizeof(struct tcmsg)),
-	[RTM_FAM(RTM_NEWTFILTER)]   = NLMSG_LENGTH(sizeof(struct tcmsg)),
-	[RTM_FAM(RTM_NEWACTION)]    = NLMSG_LENGTH(sizeof(struct tcamsg)),
-	[RTM_FAM(RTM_GETMULTICAST)] = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
-	[RTM_FAM(RTM_GETANYCAST)]   = NLMSG_LENGTH(sizeof(struct rtgenmsg)),
-};
-
-static const int rta_max[RTM_NR_FAMILIES] =
-{
-	[RTM_FAM(RTM_NEWLINK)]      = IFLA_MAX,
-	[RTM_FAM(RTM_NEWADDR)]      = IFA_MAX,
-	[RTM_FAM(RTM_NEWROUTE)]     = RTA_MAX,
-	[RTM_FAM(RTM_NEWRULE)]      = FRA_MAX,
-	[RTM_FAM(RTM_NEWQDISC)]     = TCA_MAX,
-	[RTM_FAM(RTM_NEWTCLASS)]    = TCA_MAX,
-	[RTM_FAM(RTM_NEWTFILTER)]   = TCA_MAX,
-	[RTM_FAM(RTM_NEWACTION)]    = TCAA_MAX,
-};
-
 int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, unsigned int group, int echo)
 {
 	struct sock *rtnl = net->rtnl;
@@ -1537,7 +1511,7 @@ errout:
 	return err;
 }
 
-static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
@@ -1578,7 +1552,7 @@ errout:
 	return err;
 }
 
-static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	const struct rtnl_link_ops *ops;
@@ -1709,7 +1683,7 @@ static int rtnl_group_changelink(struct net *net, int group,
 	return 0;
 }
 
-static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	const struct rtnl_link_ops *ops;
@@ -1864,7 +1838,7 @@ out:
 	}
 }
 
-static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
+static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
@@ -2081,7 +2055,7 @@ int ndo_dflt_fdb_add(struct ndmsg *ndm,
 }
 EXPORT_SYMBOL(ndo_dflt_fdb_add);
 
-static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ndmsg *ndm;
@@ -2179,7 +2153,7 @@ int ndo_dflt_fdb_del(struct ndmsg *ndm,
 }
 EXPORT_SYMBOL(ndo_dflt_fdb_del);
 
-static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ndmsg *ndm;
@@ -2478,8 +2452,7 @@ errout:
 	return err;
 }
 
-static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
-			       void *arg)
+static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
@@ -2549,8 +2522,7 @@ out:
 	return err;
 }
 
-static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
-			       void *arg)
+static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
@@ -2620,10 +2592,6 @@ out:
 	return err;
 }
 
-/* Protected by RTNL sempahore.  */
-static struct rtattr **rta_buf;
-static int rtattr_max;
-
 /* Process one rtnetlink message. */
 
 static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
@@ -2631,7 +2599,6 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct net *net = sock_net(skb->sk);
 	rtnl_doit_func doit;
 	int sz_idx, kind;
-	int min_len;
 	int family;
 	int type;
 	int err;
@@ -2679,32 +2646,11 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return err;
 	}
 
-	memset(rta_buf, 0, (rtattr_max * sizeof(struct rtattr *)));
-
-	min_len = rtm_min[sz_idx];
-	if (nlh->nlmsg_len < min_len)
-		return -EINVAL;
-
-	if (nlh->nlmsg_len > min_len) {
-		int attrlen = nlh->nlmsg_len - NLMSG_ALIGN(min_len);
-		struct rtattr *attr = (void *)nlh + NLMSG_ALIGN(min_len);
-
-		while (RTA_OK(attr, attrlen)) {
-			unsigned int flavor = attr->rta_type & NLA_TYPE_MASK;
-			if (flavor) {
-				if (flavor > rta_max[sz_idx])
-					return -EINVAL;
-				rta_buf[flavor-1] = attr;
-			}
-			attr = RTA_NEXT(attr, attrlen);
-		}
-	}
-
 	doit = rtnl_get_doit(family, type);
 	if (doit == NULL)
 		return -EOPNOTSUPP;
 
-	return doit(skb, nlh, (void *)&rta_buf[0]);
+	return doit(skb, nlh);
 }
 
 static void rtnetlink_rcv(struct sk_buff *skb)
@@ -2774,16 +2720,6 @@ static struct pernet_operations rtnetlink_net_ops = {
 
 void __init rtnetlink_init(void)
 {
-	int i;
-
-	rtattr_max = 0;
-	for (i = 0; i < ARRAY_SIZE(rta_max); i++)
-		if (rta_max[i] > rtattr_max)
-			rtattr_max = rta_max[i];
-	rta_buf = kmalloc(rtattr_max * sizeof(struct rtattr *), GFP_KERNEL);
-	if (!rta_buf)
-		panic("rtnetlink_init: cannot allocate rta_buf\n");
-
 	if (register_pernet_subsys(&rtnetlink_net_ops))
 		panic("rtnetlink_init: cannot initialize rtnetlink\n");
 
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 21291f1..40d5829 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1658,7 +1658,7 @@ static const struct reply_func reply_funcs[DCB_CMD_MAX+1] = {
 	[DCB_CMD_CEE_GET]	= { RTM_GETDCB, dcbnl_cee_get },
 };
 
-static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct net_device *netdev;
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index c8da116..7d91970 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -563,7 +563,7 @@ static const struct nla_policy dn_ifa_policy[IFA_MAX+1] = {
 				    .len = IFNAMSIZ - 1 },
 };
 
-static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[IFA_MAX+1];
@@ -607,7 +607,7 @@ errout:
 	return err;
 }
 
-static int dn_nl_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int dn_nl_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[IFA_MAX+1];
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 42a8048..f093059 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -511,7 +511,7 @@ static inline u32 rtm_get_table(struct nlattr *attrs[], u8 table)
 	return table;
 }
 
-static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct dn_fib_table *tb;
@@ -536,7 +536,7 @@ static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
 	return tb->delete(tb, r, attrs, nlh, &NETLINK_CB(skb));
 }
 
-static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct dn_fib_table *tb;
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index b4b3508..5904429 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1616,7 +1616,7 @@ errout:
 /*
  * This is called by both endnodes and routers now.
  */
-static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
+static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct rtmsg *rtm = nlmsg_data(nlh);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index af57bba..20a9f92 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -536,7 +536,7 @@ struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix,
 	return NULL;
 }
 
-static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[IFA_MAX+1];
@@ -775,7 +775,7 @@ static struct in_ifaddr *find_matching_ifa(struct in_ifaddr *ifa)
 	return NULL;
 }
 
-static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct in_ifaddr *ifa;
@@ -1730,8 +1730,7 @@ static const struct nla_policy devconf_ipv4_policy[NETCONFA_MAX+1] = {
 };
 
 static int inet_netconf_get_devconf(struct sk_buff *in_skb,
-				    struct nlmsghdr *nlh,
-				    void *arg)
+				    struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct nlattr *tb[NETCONFA_MAX+1];
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index eb4bb12..0e74398 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -604,7 +604,7 @@ errout:
 	return err;
 }
 
-static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct fib_config cfg;
@@ -626,7 +626,7 @@ errout:
 	return err;
 }
 
-static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct fib_config cfg;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6e28514..550781a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2311,7 +2311,7 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
-static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct rtmsg *rtm;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index fa36a67..15794fd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -544,8 +544,7 @@ static const struct nla_policy devconf_ipv6_policy[NETCONFA_MAX+1] = {
 };
 
 static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
-				     struct nlmsghdr *nlh,
-				     void *arg)
+				     struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct nlattr *tb[NETCONFA_MAX+1];
@@ -3578,7 +3577,7 @@ static const struct nla_policy ifa_ipv6_policy[IFA_MAX+1] = {
 };
 
 static int
-inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifaddrmsg *ifm;
@@ -3644,7 +3643,7 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags,
 }
 
 static int
-inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifaddrmsg *ifm;
@@ -3983,8 +3982,7 @@ static int inet6_dump_ifacaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	return inet6_dump_addr(skb, cb, type);
 }
 
-static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
-			     void *arg)
+static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct ifaddrmsg *ifm;
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index 6f226c8..f083a58 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -414,8 +414,7 @@ static const struct nla_policy ifal_policy[IFAL_MAX+1] = {
 	[IFAL_LABEL]		= { .len = sizeof(u32), },
 };
 
-static int ip6addrlbl_newdel(struct sk_buff *skb, struct nlmsghdr *nlh,
-			     void *arg)
+static int ip6addrlbl_newdel(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifaddrlblmsg *ifal;
@@ -530,8 +529,7 @@ static inline int ip6addrlbl_msgsize(void)
 		+ nla_total_size(4);	/* IFAL_LABEL */
 }
 
-static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr* nlh,
-			  void *arg)
+static int ip6addrlbl_get(struct sk_buff *in_skb, struct nlmsghdr* nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct ifaddrlblmsg *ifal;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e5fe004..ad0aa6b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2355,7 +2355,7 @@ beginning:
 	return last_err;
 }
 
-static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
+static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh)
 {
 	struct fib6_config cfg;
 	int err;
@@ -2370,7 +2370,7 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
 		return ip6_route_del(&cfg);
 }
 
-static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
+static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh)
 {
 	struct fib6_config cfg;
 	int err;
@@ -2562,7 +2562,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
 		     prefix, 0, NLM_F_MULTI);
 }
 
-static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void *arg)
+static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct nlattr *tb[RTA_MAX+1];
diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index 0193630..dc15f43 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -61,7 +61,7 @@ static const struct nla_policy ifa_phonet_policy[IFA_MAX+1] = {
 	[IFA_LOCAL] = { .type = NLA_U8 },
 };
 
-static int addr_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *attr)
+static int addr_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[IFA_MAX+1];
@@ -224,7 +224,7 @@ static const struct nla_policy rtm_phonet_policy[RTA_MAX+1] = {
 	[RTA_OIF] = { .type = NLA_U32 },
 };
 
-static int route_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *attr)
+static int route_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tb[RTA_MAX+1];
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 8579c4b..fd70728 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -982,7 +982,7 @@ done:
 	return ret;
 }
 
-static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
+static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_ACT_MAX + 1];
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 964f5e4..9a04b98 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -118,7 +118,7 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
 
 /* Add/change/delete/get a filter node */
 
-static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
+static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c297e2a..0bbce22 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -971,7 +971,7 @@ check_loop_fn(struct Qdisc *q, unsigned long cl, struct qdisc_walker *w)
  * Delete/get qdisc.
  */
 
-static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
+static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct tcmsg *tcm = nlmsg_data(n);
@@ -1038,7 +1038,7 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
  * Create/change qdisc.
  */
 
-static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
+static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct tcmsg *tcm;
@@ -1372,7 +1372,7 @@ done:
 
 
 
-static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
+static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct tcmsg *tcm = nlmsg_data(n);
-- 
1.7.11.7

* [PATCH net-next 1/2] decnet: Parse netlink attributes on our own
From: Thomas Graf @ 2013-03-21 17:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-decnet-user
In-Reply-To: <cover.1363885020.git.tgraf@suug.ch>

decnet is the only subsystem left that relies on the global
netlink attribute buffer rta_buf. It's a horrible design and we
want to get rid of it.

This converts all of decnet to parse its own attributes with
nlmsg_parse() against a new policy, rtm_dn_policy. It also gets
rid of the error-prone struct dn_kern_rta.

Yes, the fib_magic() stuff is not pretty.

It's compile-tested only; I need someone with appropriate hardware
to test the patch since I don't have access to any.

Cc: linux-decnet-user@lists.sourceforge.net
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
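For readers unfamiliar with the pattern, here is a minimal userspace
sketch of per-call attribute parsing into a caller-owned table, as
opposed to one global buffer shared by every handler. It is not kernel
code and is only meant to illustrate the idea: struct attr_hdr, the
ATTR_* numbers, min_payload[], parse_attrs() and attr_get_u32() are
made-up stand-ins for struct nlattr, RTA_*, the nla_policy table,
nlmsg_parse() and nla_get_u32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct nlattr: a 4-byte TLV header. */
struct attr_hdr {
	uint16_t len;	/* header plus payload, in bytes */
	uint16_t type;
};

/* Hypothetical attribute numbers, standing in for RTA_*. */
enum { ATTR_UNSPEC, ATTR_DST, ATTR_OIF, ATTR_MAX = ATTR_OIF };

#define ATTR_HDRLEN	((unsigned int)sizeof(struct attr_hdr))
#define ATTR_ALIGN(n)	(((n) + 3u) & ~3u)

/* Minimum payload per type; plays the role of a policy table. */
static const uint16_t min_payload[ATTR_MAX + 1] = {
	[ATTR_DST] = 2,
	[ATTR_OIF] = 4,
};

/*
 * Walk a TLV buffer and record where each known attribute starts in
 * the caller-provided table tb[0..ATTR_MAX]. Unknown types are skipped.
 * Returns 0 on success, -1 on a truncated or too-short attribute.
 */
static int parse_attrs(const unsigned char *buf, unsigned int len,
		       const unsigned char *tb[ATTR_MAX + 1])
{
	unsigned int i, step;

	for (i = 0; i <= ATTR_MAX; i++)
		tb[i] = NULL;

	while (len >= ATTR_HDRLEN) {
		struct attr_hdr hdr;

		memcpy(&hdr, buf, sizeof(hdr));
		if (hdr.len < ATTR_HDRLEN || hdr.len > len)
			return -1;

		if (hdr.type >= 1 && hdr.type <= ATTR_MAX) {
			if (hdr.len - ATTR_HDRLEN < min_payload[hdr.type])
				return -1;
			tb[hdr.type] = buf;
		}

		/* Attributes are padded to 4 bytes; the last one may not be. */
		step = ATTR_ALIGN(hdr.len);
		if (step > len)
			step = len;
		buf += step;
		len -= step;
	}

	return 0;
}

/* Read a 32-bit payload; the length was already checked by the parser. */
static uint32_t attr_get_u32(const unsigned char *attr)
{
	uint32_t v;

	assert(attr != NULL);
	memcpy(&v, attr + ATTR_HDRLEN, sizeof(v));
	return v;
}
```

A handler would declare "const unsigned char *tb[ATTR_MAX + 1];" on
its own stack, call parse_attrs() on the message payload, and then test
tb[ATTR_OIF] before reading it. The table lives only for the duration
of the call, so nothing has to be cleared or protected by a lock
between messages.
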
 include/net/dn_fib.h  |  28 ++-----
 net/decnet/dn_fib.c   | 211 +++++++++++++++++++++++++++-----------------------
 net/decnet/dn_route.c |  27 ++++---
 net/decnet/dn_table.c |  42 +++++-----
 4 files changed, 160 insertions(+), 148 deletions(-)

diff --git a/include/net/dn_fib.h b/include/net/dn_fib.h
index 1ee9d4b..74004af 100644
--- a/include/net/dn_fib.h
+++ b/include/net/dn_fib.h
@@ -1,24 +1,9 @@
 #ifndef _NET_DN_FIB_H
 #define _NET_DN_FIB_H
 
-/* WARNING: The ordering of these elements must match ordering
- *          of RTA_* rtnetlink attribute numbers.
- */
-struct dn_kern_rta {
-        void            *rta_dst;
-        void            *rta_src;
-        int             *rta_iif;
-        int             *rta_oif;
-        void            *rta_gw;
-        u32             *rta_priority;
-        void            *rta_prefsrc;
-        struct rtattr   *rta_mx;
-        struct rtattr   *rta_mp;
-        unsigned char   *rta_protoinfo;
-        u32             *rta_flow;
-        struct rta_cacheinfo *rta_ci;
-	struct rta_session *rta_sess;
-};
+#include <linux/netlink.h>
+
+extern const struct nla_policy rtm_dn_policy[];
 
 struct dn_fib_res {
 	struct fib_rule *r;
@@ -93,10 +78,10 @@ struct dn_fib_table {
 	u32 n;
 
 	int (*insert)(struct dn_fib_table *t, struct rtmsg *r, 
-			struct dn_kern_rta *rta, struct nlmsghdr *n, 
+			struct nlattr *attrs[], struct nlmsghdr *n,
 			struct netlink_skb_parms *req);
 	int (*delete)(struct dn_fib_table *t, struct rtmsg *r,
-			struct dn_kern_rta *rta, struct nlmsghdr *n,
+			struct nlattr *attrs[], struct nlmsghdr *n,
 			struct netlink_skb_parms *req);
 	int (*lookup)(struct dn_fib_table *t, const struct flowidn *fld,
 			struct dn_fib_res *res);
@@ -116,13 +101,12 @@ extern void dn_fib_cleanup(void);
 extern int dn_fib_ioctl(struct socket *sock, unsigned int cmd, 
 			unsigned long arg);
 extern struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, 
-				struct dn_kern_rta *rta, 
+				struct nlattr *attrs[],
 				const struct nlmsghdr *nlh, int *errp);
 extern int dn_fib_semantic_match(int type, struct dn_fib_info *fi, 
 			const struct flowidn *fld,
 			struct dn_fib_res *res);
 extern void dn_fib_release_info(struct dn_fib_info *fi);
-extern __le16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type);
 extern void dn_fib_flush(void);
 extern void dn_fib_select_multipath(const struct flowidn *fld,
 					struct dn_fib_res *res);
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index e36614e..42a8048 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -145,22 +145,10 @@ static inline struct dn_fib_info *dn_fib_find_info(const struct dn_fib_info *nfi
 	return NULL;
 }
 
-__le16 dn_fib_get_attr16(struct rtattr *attr, int attrlen, int type)
+static int dn_fib_count_nhs(const struct nlattr *attr)
 {
-	while(RTA_OK(attr,attrlen)) {
-		if (attr->rta_type == type)
-			return *(__le16*)RTA_DATA(attr);
-		attr = RTA_NEXT(attr, attrlen);
-	}
-
-	return 0;
-}
-
-static int dn_fib_count_nhs(struct rtattr *rta)
-{
-	int nhs = 0;
-	struct rtnexthop *nhp = RTA_DATA(rta);
-	int nhlen = RTA_PAYLOAD(rta);
+	struct rtnexthop *nhp = nla_data(attr);
+	int nhs = 0, nhlen = nla_len(attr);
 
 	while(nhlen >= (int)sizeof(struct rtnexthop)) {
 		if ((nhlen -= nhp->rtnh_len) < 0)
@@ -172,10 +160,11 @@ static int dn_fib_count_nhs(struct rtattr *rta)
 	return nhs;
 }
 
-static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct rtattr *rta, const struct rtmsg *r)
+static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct nlattr *attr,
+			  const struct rtmsg *r)
 {
-	struct rtnexthop *nhp = RTA_DATA(rta);
-	int nhlen = RTA_PAYLOAD(rta);
+	struct rtnexthop *nhp = nla_data(attr);
+	int nhlen = nla_len(attr);
 
 	change_nexthops(fi) {
 		int attrlen = nhlen - sizeof(struct rtnexthop);
@@ -187,7 +176,10 @@ static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct rtattr *rta, cons
 		nh->nh_weight = nhp->rtnh_hops + 1;
 
 		if (attrlen) {
-			nh->nh_gw = dn_fib_get_attr16(RTNH_DATA(nhp), attrlen, RTA_GATEWAY);
+			struct nlattr *gw_attr;
+
+			gw_attr = nla_find((struct nlattr *) (nhp + 1), attrlen, RTA_GATEWAY);
+			nh->nh_gw = gw_attr ? nla_get_le16(gw_attr) : 0;
 		}
 		nhp = RTNH_NEXT(nhp);
 	} endfor_nexthops(fi);
@@ -268,7 +260,8 @@ out:
 }
 
 
-struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta *rta, const struct nlmsghdr *nlh, int *errp)
+struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct nlattr *attrs[],
+				       const struct nlmsghdr *nlh, int *errp)
 {
 	int err;
 	struct dn_fib_info *fi = NULL;
@@ -281,11 +274,9 @@ struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta
 	if (dn_fib_props[r->rtm_type].scope > r->rtm_scope)
 		goto err_inval;
 
-	if (rta->rta_mp) {
-		nhs = dn_fib_count_nhs(rta->rta_mp);
-		if (nhs == 0)
-			goto err_inval;
-	}
+	if (attrs[RTA_MULTIPATH] &&
+	    (nhs = dn_fib_count_nhs(attrs[RTA_MULTIPATH])) == 0)
+		goto err_inval;
 
 	fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct dn_fib_nh), GFP_KERNEL);
 	err = -ENOBUFS;
@@ -295,53 +286,65 @@ struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta
 	fi->fib_protocol = r->rtm_protocol;
 	fi->fib_nhs = nhs;
 	fi->fib_flags = r->rtm_flags;
-	if (rta->rta_priority)
-		fi->fib_priority = *rta->rta_priority;
-	if (rta->rta_mx) {
-		int attrlen = RTA_PAYLOAD(rta->rta_mx);
-		struct rtattr *attr = RTA_DATA(rta->rta_mx);
 
-		while(RTA_OK(attr, attrlen)) {
-			unsigned int flavour = attr->rta_type;
+	if (attrs[RTA_PRIORITY])
+		fi->fib_priority = nla_get_u32(attrs[RTA_PRIORITY]);
+
+	if (attrs[RTA_METRICS]) {
+		struct nlattr *attr;
+		int rem;
 
-			if (flavour) {
-				if (flavour > RTAX_MAX)
+		nla_for_each_nested(attr, attrs[RTA_METRICS], rem) {
+			int type = nla_type(attr);
+
+			if (type) {
+				if (type > RTAX_MAX || nla_len(attr) < 4)
 					goto err_inval;
-				fi->fib_metrics[flavour-1] = *(unsigned int *)RTA_DATA(attr);
+
+				fi->fib_metrics[type-1] = nla_get_u32(attr);
 			}
-			attr = RTA_NEXT(attr, attrlen);
 		}
 	}
-	if (rta->rta_prefsrc)
-		memcpy(&fi->fib_prefsrc, rta->rta_prefsrc, 2);
 
-	if (rta->rta_mp) {
-		if ((err = dn_fib_get_nhs(fi, rta->rta_mp, r)) != 0)
+	if (attrs[RTA_PREFSRC])
+		fi->fib_prefsrc = nla_get_le16(attrs[RTA_PREFSRC]);
+
+	if (attrs[RTA_MULTIPATH]) {
+		if ((err = dn_fib_get_nhs(fi, attrs[RTA_MULTIPATH], r)) != 0)
 			goto failure;
-		if (rta->rta_oif && fi->fib_nh->nh_oif != *rta->rta_oif)
+
+		if (attrs[RTA_OIF] &&
+		    fi->fib_nh->nh_oif != nla_get_u32(attrs[RTA_OIF]))
 			goto err_inval;
-		if (rta->rta_gw && memcmp(&fi->fib_nh->nh_gw, rta->rta_gw, 2))
+
+		if (attrs[RTA_GATEWAY] &&
+		    fi->fib_nh->nh_gw != nla_get_le16(attrs[RTA_GATEWAY]))
 			goto err_inval;
 	} else {
 		struct dn_fib_nh *nh = fi->fib_nh;
-		if (rta->rta_oif)
-			nh->nh_oif = *rta->rta_oif;
-		if (rta->rta_gw)
-			memcpy(&nh->nh_gw, rta->rta_gw, 2);
+
+		if (attrs[RTA_OIF])
+			nh->nh_oif = nla_get_u32(attrs[RTA_OIF]);
+
+		if (attrs[RTA_GATEWAY])
+			nh->nh_gw = nla_get_le16(attrs[RTA_GATEWAY]);
+
 		nh->nh_flags = r->rtm_flags;
 		nh->nh_weight = 1;
 	}
 
 	if (r->rtm_type == RTN_NAT) {
-		if (rta->rta_gw == NULL || nhs != 1 || rta->rta_oif)
+		if (!attrs[RTA_GATEWAY] || nhs != 1 || attrs[RTA_OIF])
 			goto err_inval;
-		memcpy(&fi->fib_nh->nh_gw, rta->rta_gw, 2);
+
+		fi->fib_nh->nh_gw = nla_get_le16(attrs[RTA_GATEWAY]);
 		goto link_it;
 	}
 
 	if (dn_fib_props[r->rtm_type].error) {
-		if (rta->rta_gw || rta->rta_oif || rta->rta_mp)
+		if (attrs[RTA_GATEWAY] || attrs[RTA_OIF] || attrs[RTA_MULTIPATH])
 			goto err_inval;
+
 		goto link_it;
 	}
 
@@ -367,8 +370,8 @@ struct dn_fib_info *dn_fib_create_info(const struct rtmsg *r, struct dn_kern_rta
 	}
 
 	if (fi->fib_prefsrc) {
-		if (r->rtm_type != RTN_LOCAL || rta->rta_dst == NULL ||
-		    memcmp(&fi->fib_prefsrc, rta->rta_dst, 2))
+		if (r->rtm_type != RTN_LOCAL || !attrs[RTA_DST] ||
+		    fi->fib_prefsrc != nla_get_le16(attrs[RTA_DST]))
 			if (dnet_addr_type(fi->fib_prefsrc) != RTN_LOCAL)
 				goto err_inval;
 	}
@@ -486,29 +489,24 @@ void dn_fib_select_multipath(const struct flowidn *fld, struct dn_fib_res *res)
 	spin_unlock_bh(&dn_fib_multipath_lock);
 }
 
+const struct nla_policy rtm_dn_policy[RTA_MAX + 1] = {
+	[RTA_DST]		= { .type = NLA_U16 },
+	[RTA_SRC]		= { .type = NLA_U16 },
+	[RTA_IIF]		= { .type = NLA_U32 },
+	[RTA_OIF]		= { .type = NLA_U32 },
+	[RTA_GATEWAY]		= { .type = NLA_U16 },
+	[RTA_PRIORITY]		= { .type = NLA_U32 },
+	[RTA_PREFSRC]		= { .type = NLA_U16 },
+	[RTA_METRICS]		= { .type = NLA_NESTED },
+	[RTA_MULTIPATH]		= { .type = NLA_NESTED },
+	[RTA_TABLE]		= { .type = NLA_U32 },
+	[RTA_MARK]		= { .type = NLA_U32 },
+};
 
-static int dn_fib_check_attr(struct rtmsg *r, struct rtattr **rta)
-{
-	int i;
-
-	for(i = 1; i <= RTA_MAX; i++) {
-		struct rtattr *attr = rta[i-1];
-		if (attr) {
-			if (RTA_PAYLOAD(attr) < 4 && RTA_PAYLOAD(attr) != 2)
-				return -EINVAL;
-			if (i != RTA_MULTIPATH && i != RTA_METRICS &&
-			    i != RTA_TABLE)
-				rta[i-1] = (struct rtattr *)RTA_DATA(attr);
-		}
-	}
-
-	return 0;
-}
-
-static inline u32 rtm_get_table(struct rtattr **rta, u8 table)
+static inline u32 rtm_get_table(struct nlattr *attrs[], u8 table)
 {
-	if (rta[RTA_TABLE - 1])
-		table = nla_get_u32((struct nlattr *) rta[RTA_TABLE - 1]);
+	if (attrs[RTA_TABLE])
+		table = nla_get_u32(attrs[RTA_TABLE]);
 
 	return table;
 }
@@ -517,8 +515,9 @@ static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
 {
 	struct net *net = sock_net(skb->sk);
 	struct dn_fib_table *tb;
-	struct rtattr **rta = arg;
-	struct rtmsg *r = NLMSG_DATA(nlh);
+	struct rtmsg *r = nlmsg_data(nlh);
+	struct nlattr *attrs[RTA_MAX+1];
+	int err;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -526,22 +525,24 @@ static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
-	if (dn_fib_check_attr(r, rta))
-		return -EINVAL;
+	err = nlmsg_parse(nlh, sizeof(*r), attrs, RTA_MAX, rtm_dn_policy);
+	if (err < 0)
+		return err;
 
-	tb = dn_fib_get_table(rtm_get_table(rta, r->rtm_table), 0);
-	if (tb)
-		return tb->delete(tb, r, (struct dn_kern_rta *)rta, nlh, &NETLINK_CB(skb));
+	tb = dn_fib_get_table(rtm_get_table(attrs, r->rtm_table), 0);
+	if (!tb)
+		return -ESRCH;
 
-	return -ESRCH;
+	return tb->delete(tb, r, attrs, nlh, &NETLINK_CB(skb));
 }
 
 static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
 	struct dn_fib_table *tb;
-	struct rtattr **rta = arg;
-	struct rtmsg *r = NLMSG_DATA(nlh);
+	struct rtmsg *r = nlmsg_data(nlh);
+	struct nlattr *attrs[RTA_MAX+1];
+	int err;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -549,14 +550,15 @@ static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
-	if (dn_fib_check_attr(r, rta))
-		return -EINVAL;
+	err = nlmsg_parse(nlh, sizeof(*r), attrs, RTA_MAX, rtm_dn_policy);
+	if (err < 0)
+		return err;
 
-	tb = dn_fib_get_table(rtm_get_table(rta, r->rtm_table), 1);
-	if (tb)
-		return tb->insert(tb, r, (struct dn_kern_rta *)rta, nlh, &NETLINK_CB(skb));
+	tb = dn_fib_get_table(rtm_get_table(attrs, r->rtm_table), 1);
+	if (!tb)
+		return -ENOBUFS;
 
-	return -ENOBUFS;
+	return tb->insert(tb, r, attrs, nlh, &NETLINK_CB(skb));
 }
 
 static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifaddr *ifa)
@@ -566,10 +568,31 @@ static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifad
 		struct nlmsghdr nlh;
 		struct rtmsg rtm;
 	} req;
-	struct dn_kern_rta rta;
+	struct {
+		struct nlattr hdr;
+		__le16 dst;
+	} dst_attr = {
+		.dst = dst,
+	};
+	struct {
+		struct nlattr hdr;
+		__le16 prefsrc;
+	} prefsrc_attr = {
+		.prefsrc = ifa->ifa_local,
+	};
+	struct {
+		struct nlattr hdr;
+		u32 oif;
+	} oif_attr = {
+		.oif = ifa->ifa_dev->dev->ifindex,
+	};
+	struct nlattr *attrs[RTA_MAX+1] = {
+		[RTA_DST] = (struct nlattr *) &dst_attr,
+		[RTA_PREFSRC] = (struct nlattr * ) &prefsrc_attr,
+		[RTA_OIF] = (struct nlattr *) &oif_attr,
+	};
 
 	memset(&req.rtm, 0, sizeof(req.rtm));
-	memset(&rta, 0, sizeof(rta));
 
 	if (type == RTN_UNICAST)
 		tb = dn_fib_get_table(RT_MIN_TABLE, 1);
@@ -591,14 +614,10 @@ static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifad
 	req.rtm.rtm_scope = (type != RTN_LOCAL ? RT_SCOPE_LINK : RT_SCOPE_HOST);
 	req.rtm.rtm_type = type;
 
-	rta.rta_dst = &dst;
-	rta.rta_prefsrc = &ifa->ifa_local;
-	rta.rta_oif = &ifa->ifa_dev->dev->ifindex;
-
 	if (cmd == RTM_NEWROUTE)
-		tb->insert(tb, &req.rtm, &rta, &req.nlh, NULL);
+		tb->insert(tb, &req.rtm, attrs, &req.nlh, NULL);
 	else
-		tb->delete(tb, &req.rtm, &rta, &req.nlh, NULL);
+		tb->delete(tb, &req.rtm, attrs, &req.nlh, NULL);
 }
 
 static void dn_fib_add_ifaddr(struct dn_ifaddr *ifa)
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 5ac0e15..b4b3508 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1619,17 +1619,21 @@ errout:
 static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(in_skb->sk);
-	struct rtattr **rta = arg;
 	struct rtmsg *rtm = nlmsg_data(nlh);
 	struct dn_route *rt = NULL;
 	struct dn_skb_cb *cb;
 	int err;
 	struct sk_buff *skb;
 	struct flowidn fld;
+	struct nlattr *tb[RTA_MAX+1];
 
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
+	err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_dn_policy);
+	if (err < 0)
+		return err;
+
 	memset(&fld, 0, sizeof(fld));
 	fld.flowidn_proto = DNPROTO_NSP;
 
@@ -1639,12 +1643,14 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
 	skb_reset_mac_header(skb);
 	cb = DN_SKB_CB(skb);
 
-	if (rta[RTA_SRC-1])
-		memcpy(&fld.saddr, RTA_DATA(rta[RTA_SRC-1]), 2);
-	if (rta[RTA_DST-1])
-		memcpy(&fld.daddr, RTA_DATA(rta[RTA_DST-1]), 2);
-	if (rta[RTA_IIF-1])
-		memcpy(&fld.flowidn_iif, RTA_DATA(rta[RTA_IIF-1]), sizeof(int));
+	if (tb[RTA_SRC])
+		fld.saddr = nla_get_le16(tb[RTA_SRC]);
+
+	if (tb[RTA_DST])
+		fld.daddr = nla_get_le16(tb[RTA_DST]);
+
+	if (tb[RTA_IIF])
+		fld.flowidn_iif = nla_get_u32(tb[RTA_IIF]);
 
 	if (fld.flowidn_iif) {
 		struct net_device *dev;
@@ -1669,10 +1675,9 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
 		if (!err && -rt->dst.error)
 			err = rt->dst.error;
 	} else {
-		int oif = 0;
-		if (rta[RTA_OIF - 1])
-			memcpy(&oif, RTA_DATA(rta[RTA_OIF - 1]), sizeof(int));
-		fld.flowidn_oif = oif;
+		if (tb[RTA_OIF])
+			fld.flowidn_oif = nla_get_u32(tb[RTA_OIF]);
+
 		err = dn_route_output_key((struct dst_entry **)&rt, &fld, 0);
 	}
 
diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
index 6c2445b..fc42a0a 100644
--- a/net/decnet/dn_table.c
+++ b/net/decnet/dn_table.c
@@ -224,26 +224,27 @@ static struct dn_zone *dn_new_zone(struct dn_hash *table, int z)
 }
 
 
-static int dn_fib_nh_match(struct rtmsg *r, struct nlmsghdr *nlh, struct dn_kern_rta *rta, struct dn_fib_info *fi)
+static int dn_fib_nh_match(struct rtmsg *r, struct nlmsghdr *nlh, struct nlattr *attrs[], struct dn_fib_info *fi)
 {
 	struct rtnexthop *nhp;
 	int nhlen;
 
-	if (rta->rta_priority && *rta->rta_priority != fi->fib_priority)
+	if (attrs[RTA_PRIORITY] &&
+	    nla_get_u32(attrs[RTA_PRIORITY]) != fi->fib_priority)
 		return 1;
 
-	if (rta->rta_oif || rta->rta_gw) {
-		if ((!rta->rta_oif || *rta->rta_oif == fi->fib_nh->nh_oif) &&
-		    (!rta->rta_gw  || memcmp(rta->rta_gw, &fi->fib_nh->nh_gw, 2) == 0))
+	if (attrs[RTA_OIF] || attrs[RTA_GATEWAY]) {
+		if ((!attrs[RTA_OIF] || nla_get_u32(attrs[RTA_OIF]) == fi->fib_nh->nh_oif) &&
+		    (!attrs[RTA_GATEWAY]  || nla_get_le16(attrs[RTA_GATEWAY]) == fi->fib_nh->nh_gw))
 			return 0;
 		return 1;
 	}
 
-	if (rta->rta_mp == NULL)
+	if (!attrs[RTA_MULTIPATH])
 		return 0;
 
-	nhp = RTA_DATA(rta->rta_mp);
-	nhlen = RTA_PAYLOAD(rta->rta_mp);
+	nhp = nla_data(attrs[RTA_MULTIPATH]);
+	nhlen = nla_len(attrs[RTA_MULTIPATH]);
 
 	for_nexthops(fi) {
 		int attrlen = nhlen - sizeof(struct rtnexthop);
@@ -254,7 +255,10 @@ static int dn_fib_nh_match(struct rtmsg *r, struct nlmsghdr *nlh, struct dn_kern
 		if (nhp->rtnh_ifindex && nhp->rtnh_ifindex != nh->nh_oif)
 			return 1;
 		if (attrlen) {
-			gw = dn_fib_get_attr16(RTNH_DATA(nhp), attrlen, RTA_GATEWAY);
+			struct nlattr *gw_attr;
+
+			gw_attr = nla_find((struct nlattr *) (nhp + 1), attrlen, RTA_GATEWAY);
+			gw = gw_attr ? nla_get_le16(gw_attr) : 0;
 
 			if (gw && gw != nh->nh_gw)
 				return 1;
@@ -517,7 +521,8 @@ out:
 	return skb->len;
 }
 
-static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct dn_kern_rta *rta, struct nlmsghdr *n, struct netlink_skb_parms *req)
+static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct nlattr *attrs[],
+			       struct nlmsghdr *n, struct netlink_skb_parms *req)
 {
 	struct dn_hash *table = (struct dn_hash *)tb->data;
 	struct dn_fib_node *new_f, *f, **fp, **del_fp;
@@ -536,15 +541,14 @@ static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct
 		return -ENOBUFS;
 
 	dz_key_0(key);
-	if (rta->rta_dst) {
-		__le16 dst;
-		memcpy(&dst, rta->rta_dst, 2);
+	if (attrs[RTA_DST]) {
+		__le16 dst = nla_get_le16(attrs[RTA_DST]);
 		if (dst & ~DZ_MASK(dz))
 			return -EINVAL;
 		key = dz_key(dst, dz);
 	}
 
-	if ((fi = dn_fib_create_info(r, rta, n, &err)) == NULL)
+	if ((fi = dn_fib_create_info(r, attrs, n, &err)) == NULL)
 		return err;
 
 	if (dz->dz_nent > (dz->dz_divisor << 2) &&
@@ -654,7 +658,8 @@ out:
 }
 
 
-static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct dn_kern_rta *rta, struct nlmsghdr *n, struct netlink_skb_parms *req)
+static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct nlattr *attrs[],
+			       struct nlmsghdr *n, struct netlink_skb_parms *req)
 {
 	struct dn_hash *table = (struct dn_hash*)tb->data;
 	struct dn_fib_node **fp, **del_fp, *f;
@@ -671,9 +676,8 @@ static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct
 		return -ESRCH;
 
 	dz_key_0(key);
-	if (rta->rta_dst) {
-		__le16 dst;
-		memcpy(&dst, rta->rta_dst, 2);
+	if (attrs[RTA_DST]) {
+		__le16 dst = nla_get_le16(attrs[RTA_DST]);
 		if (dst & ~DZ_MASK(dz))
 			return -EINVAL;
 		key = dz_key(dst, dz);
@@ -703,7 +707,7 @@ static int dn_fib_table_delete(struct dn_fib_table *tb, struct rtmsg *r, struct
 				(r->rtm_scope == RT_SCOPE_NOWHERE || f->fn_scope == r->rtm_scope) &&
 				(!r->rtm_protocol ||
 					fi->fib_protocol == r->rtm_protocol) &&
-				dn_fib_nh_match(r, n, rta, fi) == 0)
+				dn_fib_nh_match(r, n, attrs, fi) == 0)
 			del_fp = fp;
 	}
 
-- 
1.7.11.7


* [PATCH net-next 0/2] Get rid of attribute parsing in rtnetlink_rcv_msg()
From: Thomas Graf @ 2013-03-21 17:45 UTC (permalink / raw)
  To: davem; +Cc: netdev

We have been wanting to get rid of rta_buf for a while. It is error
prone as it requires subsystems, especially modules, that register an
rtnetlink message type to also modify rtnetlink.c and thus require
recompiling the kernel. DECnet is the only user left, so after
converting it we can safely remove rta_buf.
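
As a rough illustration of what "parsing on our own" means for a handler,
here is a userspace sketch of a type-indexed attribute table built by the
handler itself rather than by a shared core buffer. Everything named toy_*
is hypothetical and only mimics the general shape of netlink attributes; it
is not the kernel's struct nlattr, nla_parse() or any DECnet code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header: total length (header + payload) and type. */
struct toy_hdr {
	uint16_t len;
	uint16_t type;
};

/* View of one parsed attribute payload; data == NULL means "not present". */
struct toy_view {
	const unsigned char *data;
	size_t len;
};

/* Attribute types known to this (hypothetical) handler only. */
enum { TOY_ATTR_UNSPEC, TOY_ATTR_DST, TOY_ATTR_OIF, TOY_ATTR_MAX = TOY_ATTR_OIF };

/* Attributes are padded to a 4-byte boundary in this sketch. */
#define TOY_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Append one attribute at offset off; returns the offset of the next one. */
static size_t toy_put(unsigned char *buf, size_t off, uint16_t type,
		      const void *payload, uint16_t plen)
{
	struct toy_hdr h = { (uint16_t)(sizeof(struct toy_hdr) + plen), type };

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), payload, plen);
	return off + TOY_ALIGN(h.len);
}

/*
 * Walk the attribute stream and fill tb[type] for every known type.
 * Unknown types are skipped. Returns 0, or -1 on a malformed attribute.
 */
static int toy_parse(struct toy_view tb[], int maxtype,
		     const unsigned char *buf, size_t buflen)
{
	size_t off = 0;

	for (int i = 0; i <= maxtype; i++)
		tb[i] = (struct toy_view){ NULL, 0 };

	while (buflen - off >= sizeof(struct toy_hdr)) {
		struct toy_hdr h;

		memcpy(&h, buf + off, sizeof(h));	/* avoid unaligned access */
		if (h.len < sizeof(h) || h.len > buflen - off)
			return -1;
		if (h.type >= 1 && h.type <= maxtype) {
			tb[h.type].data = buf + off + sizeof(h);
			tb[h.type].len = h.len - sizeof(h);
		}
		off += TOY_ALIGN(h.len);
		if (off > buflen)			/* last attribute unpadded */
			off = buflen;
	}
	return 0;
}

/* Read a 32-bit payload in host byte order. */
static uint32_t toy_get_u32(const struct toy_view *v)
{
	uint32_t val;

	assert(v->data && v->len >= sizeof(val));
	memcpy(&val, v->data, sizeof(val));
	return val;
}
```

A handler written this way declares its own table sized by its own maximum
attribute type and tests tb[TOY_ATTR_OIF].data before reading it, so adding a
new message type does not require touching a table owned by the core.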

Thomas Graf (2):
  decnet: Parse netlink attributes on our own
  rtnetlink: Remove passing of attributes into rtnl_doit functions

 include/net/dn_fib.h    |  28 ++-----
 include/net/rtnetlink.h |   2 +-
 net/bridge/br_mdb.c     |   4 +-
 net/can/gw.c            |   5 +-
 net/core/fib_rules.c    |   4 +-
 net/core/neighbour.c    |   6 +-
 net/core/rtnetlink.c    |  82 ++----------------
 net/dcb/dcbnl.c         |   2 +-
 net/decnet/dn_dev.c     |   4 +-
 net/decnet/dn_fib.c     | 215 ++++++++++++++++++++++++++----------------------
 net/decnet/dn_route.c   |  29 ++++---
 net/decnet/dn_table.c   |  42 +++++-----
 net/ipv4/devinet.c      |   7 +-
 net/ipv4/fib_frontend.c |   4 +-
 net/ipv4/route.c        |   2 +-
 net/ipv6/addrconf.c     |  10 +--
 net/ipv6/addrlabel.c    |   6 +-
 net/ipv6/route.c        |   6 +-
 net/phonet/pn_netlink.c |   4 +-
 net/sched/act_api.c     |   2 +-
 net/sched/cls_api.c     |   2 +-
 net/sched/sch_api.c     |   6 +-
 22 files changed, 207 insertions(+), 265 deletions(-)

-- 
1.7.11.7


* Re: [PATCH] net: reduce net_rx_action() latency to 2 HZ
From: Eric Dumazet @ 2013-03-21 17:43 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: David Miller, netdev, stable, Willy Tarreau, Tom Herbert,
	Steven Rostedt
In-Reply-To: <514B429C.5070605@windriver.com>

On Thu, 2013-03-21 at 13:25 -0400, Paul Gortmaker wrote:

> That is also reasonably portable back to 2.6.34.  And it is more
> interesting too -- it will be interesting in a preempt_rt context
> too, once RT moves ahead off the current 3.6 baseline, which still
> has the old count-limit of 10 vs the new 2ms time limit.
> 
> RT (3.4 and 3.6 based) currently has this patch from Steven:
> http://git.kernel.org/cgit/linux/kernel/git/paulg/3.6-rt-patches.git/tree/net-tx-action-avoid-livelock-on-rt.patch

Interesting, as Google has an internal patch removing this trylock() as
well.

I think I should upstream it eventually ;)

commit 2f0a3f573b531dc57c268fd809dc65169edae369
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Dec 13 09:18:01 2012 -0800

    net-dev_xmit_hold_queues: fix a busy loop in net_tx_action
    
    Under load, net_tx_action() fails to acquire qdisc lock
    and reschedules qdisc in a never ending loop.
    
    The spin_trylock() has almost no chance to complete because
    of ticket spinlocks and xmit_hold_queue holding the lock for long
    periods of time.
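
The starvation described in the commit message can be sketched with a toy,
single-threaded model of a ticket lock. This is only an illustration under
simplifying assumptions: the toy_* names are hypothetical, the model is not
the kernel's spinlock implementation, and the body of the actual patch is
not shown here. The point is that a trylock on a ticket lock can only
succeed when nobody holds or waits for the lock, while a blocking acquire
queues up and is eventually served.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a ticket lock: held whenever next != owner. */
struct toy_ticket_lock {
	unsigned int next;	/* next ticket to hand out */
	unsigned int owner;	/* ticket currently being served */
};

/* Queue up: the caller owns the lock once owner equals the returned ticket. */
static unsigned int toy_take_ticket(struct toy_ticket_lock *l)
{
	return l->next++;
}

/* Succeeds only if the lock is free and nobody is queued. */
static bool toy_trylock(struct toy_ticket_lock *l)
{
	if (l->next != l->owner)
		return false;
	l->next++;
	return true;
}

/* Hand the lock over to the next ticket in line. */
static void toy_unlock(struct toy_ticket_lock *l)
{
	assert(l->next != l->owner);	/* must currently be held */
	l->owner++;
}

/*
 * Busy-loop model: in every round another transmitter queues up before
 * the current holder releases, and only then does the trylock caller run.
 * Returns how many of the trylock attempts failed.
 */
static int toy_trylock_failures(int rounds)
{
	struct toy_ticket_lock l = { 0, 0 };
	int failures = 0;

	toy_take_ticket(&l);			/* first transmitter holds the lock */
	for (int i = 0; i < rounds; i++) {
		toy_take_ticket(&l);		/* another transmitter queues up */
		if (!toy_trylock(&l))
			failures++;		/* caller would reschedule and retry */
		else
			toy_unlock(&l);
		toy_unlock(&l);			/* holder releases, waiter takes over */
	}
	return failures;
}

/*
 * Contrast: a blocking acquire takes a ticket and is served once the
 * holder and all earlier waiters have unlocked. Returns that unlock count.
 */
static int toy_unlocks_until_served(int queued_ahead)
{
	struct toy_ticket_lock l = { 0, 0 };
	unsigned int mine;
	int unlocks = 0;

	toy_take_ticket(&l);			/* current holder */
	for (int i = 0; i < queued_ahead; i++)
		toy_take_ticket(&l);		/* earlier waiters */
	mine = toy_take_ticket(&l);
	while (l.owner != mine) {
		toy_unlock(&l);
		unlocks++;
	}
	return unlocks;
}
```

In this model every trylock attempt fails as long as one transmitter is
always queued behind the holder, whereas a queued acquirer with three
waiters ahead of it is served after four unlocks.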
    

