Netdev List
* Re: [PATCH RFC 1/6] skbuff: support per-page destructors in copy_ubufs
From: Ian Campbell @ 2012-05-11 10:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <1336726800.23818.33.camel@zakaz.uk.xensource.com>

On Fri, 2012-05-11 at 10:00 +0100, Ian Campbell wrote:
> I'm seeing copy_ubufs called in my remote NFS test, which I don't
> think I expected -- I'll investigate why this is happening today. 

It's tcp_transmit_skb which can (conditionally) call skb_clone
(backtrace below)

I suspect this means that the existing SKBTX_DEV_ZEROCOPY semantics are
a superset of what we need to consider for the destructor case. I'm
assuming here that the existing SKBTX_DEV_ZEROCOPY is copying aside
exactly the right amount and isn't conservatively copying more often than
necessary.

shinfo->tx_flags are pretty scarce -- can we afford a new one for this
usecase?

Or perhaps this is actually a function of the callsite, not of the
individual skb and we want to have some concept of "deep" and "shallow"
clones combined with SKBTX_DEV_ZEROCOPY to decide when to copy_ubufs or
not? e.g. deep clone => always copy if SKBTX_DEV_ZEROCOPY and shallow
clone => only copy if SKBTX_DEV_ZEROCOPY && destructor_arg!=NULL
(neither copy if !SKBTX_DEV_ZEROCOPY).
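
A minimal user-space sketch of that decision table, for illustration only.
The enum, the struct, and the function name are hypothetical stand-ins and
not kernel API; the two fields merely model SKBTX_DEV_ZEROCOPY being set
and a non-NULL destructor_arg.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum clone_kind { CLONE_SHALLOW, CLONE_DEEP };

struct fake_skb {
	bool dev_zerocopy;	/* models SKBTX_DEV_ZEROCOPY being set */
	void *destructor_arg;	/* models a per-skb destructor_arg pointer */
};

/* Returns true when the frags would have to be copied aside. */
static bool clone_needs_copy(const struct fake_skb *skb, enum clone_kind kind)
{
	if (!skb->dev_zerocopy)
		return false;	/* neither kind of clone copies */
	if (kind == CLONE_DEEP)
		return true;	/* deep clone: always copy */
	/* shallow clone: copy only when a destructor_arg is attached */
	return skb->destructor_arg != NULL;
}
```

The sketch only encodes the table; it says nothing about the race with a
later skb_frag_orphan on a shallow clone.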

Oh, I suppose that reintroduces the copy_ubufs under a (shallow) cloned
skb race if one of those skbs eventually finds itself in a situation
where a skb_frag_orphan is required, doesn't it? Hrm :-/

Will have to have a think...

Ian.

[  109.680828] ------------[ cut here ]------------
[  109.685440] WARNING: at /local/scratch/ianc/devel/kernels/linux/include/linux/skbuff.h:1732 skb_clone+0xe6/0xf0()
[  109.695678] Hardware name:
[  109.699162] ORPHANING
[  109.701434] Modules linked in:
[  109.704495] Pid: 10, comm: kworker/0:1 Tainted: G        W    3.4.0-rc4-x86_64-native+ #186
[  109.712830] Call Trace:
[  109.715278]  [<ffffffff8107edfa>] warn_slowpath_common+0x7a/0xb0
[  109.721273]  [<ffffffff8107eed1>] warn_slowpath_fmt+0x41/0x50
[  109.727007]  [<ffffffff8170feea>] ? tcp_transmit_skb+0x9a/0x8f0
[  109.732914]  [<ffffffff8169b2d6>] skb_clone+0xe6/0xf0
[  109.737957]  [<ffffffff8170feea>] tcp_transmit_skb+0x9a/0x8f0
[  109.743694]  [<ffffffff81712d7a>] tcp_write_xmit+0x1ea/0x9c0
[  109.749343]  [<ffffffff8171357b>] tcp_push_one+0x2b/0x40
[  109.754648]  [<ffffffff81705b2b>] tcp_sendpage+0x64b/0x6d0
[  109.760126]  [<ffffffff8172785d>] inet_sendpage+0x4d/0xf0
[  109.765518]  [<ffffffff817afed7>] xs_sendpages+0x117/0x2a0
[  109.770996]  [<ffffffff817ad3f0>] ? xprt_reserve+0x2d0/0x2d0
[  109.776647]  [<ffffffff817b0178>] xs_tcp_send_request+0x58/0x110
[  109.782644]  [<ffffffff817ad5bb>] xprt_transmit+0x6b/0x2d0
[  109.788123]  [<ffffffff817aa9a0>] ? call_transmit_status+0xd0/0xd0
[  109.794293]  [<ffffffff817aab70>] call_transmit+0x1d0/0x290
[  109.799857]  [<ffffffff817aa9a0>] ? call_transmit_status+0xd0/0xd0
[  109.806029]  [<ffffffff817b3725>] __rpc_execute+0x65/0x260
[  109.811505]  [<ffffffff817b3920>] ? __rpc_execute+0x260/0x260
[  109.817241]  [<ffffffff817b3930>] rpc_async_schedule+0x10/0x20
[  109.823066]  [<ffffffff81098fff>] process_one_work+0x11f/0x460
[  109.828895]  [<ffffffff8109b0b3>] worker_thread+0x173/0x3f0
[  109.834459]  [<ffffffff8109af40>] ? manage_workers+0x210/0x210
[  109.840283]  [<ffffffff8109fa26>] kthread+0x96/0xa0
[  109.845179]  [<ffffffff81861654>] kernel_thread_helper+0x4/0x10
[  109.851092]  [<ffffffff8109f990>] ? kthread_freezable_should_stop+0x70/0x70
[  109.858053]  [<ffffffff81861650>] ? gs_change+0xb/0xb
[  109.863087] ---[ end trace 3e3acdb7cc57c191 ]---

* Re: qlge driver corrupting kernel memory
From: Mike Galbraith @ 2012-05-11 11:38 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo; +Cc: netdev
In-Reply-To: <20120508120748.GA3504@oc1711230544.ibm.com>

On Tue, 2012-05-08 at 09:07 -0300, Thadeu Lima de Souza Cascardo wrote: 
> On Tue, May 08, 2012 at 01:00:18PM +0200, Mike Galbraith wrote:
> > Greetings network wizards,
> > 
> > $subject is happening in an 2.6.32 enterprise kernel with the driver
> > updated to what looks to me to be 2.6.38 or so.
> > 
> > Allegedly, IFF boxen are running dual CNAs with storage and LAN sharing
> > a port, $subject happens fairly regularly.  Rummaging in crashdumps
> > seems to show corruption happens because we somehow end up stuffing
> > loads of frags into skb_shared_info, scribbling all over the place.
> > 
> > Before I proceed, what I know about skbs can be found here..
> > 
> >     http://vger.kernel.org/~davem/skb_data.html
> > 
> > ..and that's the sum and total ;-)
> > 
> > I guess the first thing I should ask is whether anyone has seen such
> > scribbling with this driver.  Known issue would be a case of happiness,
> > but I doubt that will be the case from searching, so onward.
> > 
> 
> Hi, Mike.
> 
> From what you describe, I suspect this is related to this fix:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=782428535e0819b5b7c9825cd3faa2ad37032a70
> 
> Please, apply and report if that works for you.

Nope, box exploded.  I haven't seen a dump yet, but expect it'll be more
of the same scribbling.

-Mike

* Re: [PATCH RFC 1/6] skbuff: support per-page destructors in copy_ubufs
From: Michael S. Tsirkin @ 2012-05-11 12:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <1336733892.23818.69.camel@zakaz.uk.xensource.com>

On Fri, May 11, 2012 at 11:58:12AM +0100, Ian Campbell wrote:
> On Fri, 2012-05-11 at 10:00 +0100, Ian Campbell wrote:
> > I'm seeing copy_ubufs called in my remote NFS test, which I don't
> > think I expected -- I'll investigate why this is happening today. 
> 
> It's tcp_transmit_skb which can (conditionally) call skb_clone
> (backtrace below)

Interesting. I didn't realise we clone skbs on the data path:
tcp_write_xmit calls tcp_transmit_skb with the clone_it flag set.
Could someone comment on why we need to clone on the good path
like this?

-- 
MST

* pull request: batman-adv 2012-05-11
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r

Hello David,

this is a fixed version of the pull request issued on 2012-05-09.

Comments introduced in this patchset are not following the net-tree
guidelines. Another patch changing all the already existing comments will follow
later. 

New exported functions follow the new name convention we discussed so far.
A patch renaming all the existing exported functions will follow.



Please let me know if there is any problem.
Thank you,
	Antonio


The following changes since commit 06a4c1c55dbe5d9f7a708e8f1a52fd2ac8e5874f:

  6lowpan: IPv6 link local address (2012-05-10 23:38:22 -0400)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batman-adv-for-davem

for you to fetch changes up to 35c133a000d54b7e3fe81e8c8e4b8af5878ad6dd:

  batman-adv: add contributor name (2012-05-11 13:56:08 +0200)

----------------------------------------------------------------
Included changes:

* fix a little bug in the DHCP packet snooping introduced so far
* minor fixes and cleanups
* minor routing protocol API cleanups
* add a new contributor name to translation-table.{c,h}
* update copyright years in file headers
* minor improvement for the routing algorithm

----------------------------------------------------------------
Antonio Quartulli (3):
      batman-adv: fix wrong dhcp option list browsing
      batman-adv: update copyright years
      batman-adv: add contributor name

Linus Luessing (1):
      batman-adv: Adding hard_iface specific sysfs wrapper macros for UINT

Marek Lindner (11):
      batman-adv: introduce is_single_hop_neigh variable to increase readability
      batman-adv: introduce packet type handler array for incoming packets
      batman-adv: register batman ogm receive function during protocol init
      batman-adv: rename last_valid to last_seen
      batman-adv: replace HZ calculations with jiffies_to_msecs()
      batman-adv: split neigh_new function into generic and batman iv specific parts
      batman-adv: ignore protocol packets if the interface did not enable this protocol
      batman-adv: refactoring API: find generalized name for bat_ogm_update_mac callback
      batman-adv: rename sysfs macros to reflect the soft-interface dependency
      batman-adv: avoid temporary routing loops by being strict on forwarded OGMs
      batman-adv: fix checkpatch string complaint

 net/batman-adv/bat_debugfs.c           |    4 +-
 net/batman-adv/bat_iv_ogm.c            |  176 +++++++++++++++++++++-----------
 net/batman-adv/bat_sysfs.c             |  100 +++++++++++++-----
 net/batman-adv/bridge_loop_avoidance.c |    2 +-
 net/batman-adv/bridge_loop_avoidance.h |    2 +-
 net/batman-adv/gateway_client.c        |    6 +-
 net/batman-adv/hard-interface.c        |  117 +--------------------
 net/batman-adv/main.c                  |  124 +++++++++++++++++++++-
 net/batman-adv/main.h                  |    6 ++
 net/batman-adv/originator.c            |   51 +++++----
 net/batman-adv/originator.h            |    7 +-
 net/batman-adv/packet.h                |    1 +
 net/batman-adv/routing.c               |   22 ++--
 net/batman-adv/routing.h               |    4 +-
 net/batman-adv/send.c                  |    2 +-
 net/batman-adv/translation-table.c     |    2 +-
 net/batman-adv/translation-table.h     |    2 +-
 net/batman-adv/types.h                 |   17 ++-
 18 files changed, 380 insertions(+), 265 deletions(-)

* [PATCH 01/15] batman-adv: fix wrong dhcp option list browsing
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
In-Reply-To: <1336738892-7401-1-git-send-email-ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>

In is_type_dhcprequest(), while parsing a DHCP message, if the entry we found in
the option list is neither a padding nor the dhcp-type, we have to ignore it and
jump as many bytes as its length + 1. The "+ 1" byte is given by the subtype
field itself that has to be jumped too.

Reported-by: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
Signed-off-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
---
 net/batman-adv/gateway_client.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index 6f9b9b7..47f7186 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -558,10 +558,10 @@ static bool is_type_dhcprequest(struct sk_buff *skb, int header_len)
 			p++;
 
 			/* ...and then we jump over the data */
-			if (pkt_len < *p)
+			if (pkt_len < 1 + (*p))
 				goto out;
-			pkt_len -= *p;
-			p += (*p);
+			pkt_len -= 1 + (*p);
+			p += 1 + (*p);
 		}
 	}
 out:
-- 
1.7.9.4
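
For readers following the option-list walk described in the commit message
above, here is a standalone sketch of a bounds-checked scan over a DHCP
type/length/value option list. It is not the batman-adv code: find_option()
and the OPT_* names are made up for this example, and the option codes
(0 = pad, 255 = end, 53 = message type) are taken from RFC 2132. Unlike the
patched loop, which has already stepped past the type byte and therefore
skips 1 + length, this sketch counts from the type byte and skips
2 + length.

```c
#include <stddef.h>
#include <stdint.h>

/* Option codes per RFC 2132; macro names are illustrative. */
#define OPT_PAD		0	/* single byte, no length field */
#define OPT_MSG_TYPE	53
#define OPT_END		255

/*
 * Scan a type/length/value option list of `len` bytes for `wanted`.
 * Returns a pointer to the option's value and stores its length in
 * *out_len, or returns NULL if the option is absent or the list is
 * truncated.
 */
static const uint8_t *find_option(const uint8_t *p, size_t len,
				  uint8_t wanted, uint8_t *out_len)
{
	while (len > 0) {
		uint8_t type = p[0];
		uint8_t optlen;

		if (type == OPT_END)
			break;
		if (type == OPT_PAD) {	/* padding carries no length byte */
			p++;
			len--;
			continue;
		}
		if (len < 2)		/* no room for the length byte */
			break;
		optlen = p[1];
		if (len < 2u + optlen)	/* value would run past the buffer */
			break;
		if (type == wanted) {
			*out_len = optlen;
			return p + 2;
		}
		/* skip type byte + length byte + value */
		p += 2 + optlen;
		len -= 2 + optlen;
	}
	return NULL;
}
```

Checking the remaining length before every advance is what keeps a
malformed option from walking the pointer off the end of the packet.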

* [PATCH 02/15] batman-adv: introduce is_single_hop_neigh variable to increase readability
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 8b2db2e..cd8f473 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -480,7 +480,8 @@ static void bat_iv_ogm_queue_add(struct bat_priv *bat_priv,
 static void bat_iv_ogm_forward(struct orig_node *orig_node,
 			       const struct ethhdr *ethhdr,
 			       struct batman_ogm_packet *batman_ogm_packet,
-			       int directlink, struct hard_iface *if_incoming)
+			       bool is_single_hop_neigh,
+			       struct hard_iface *if_incoming)
 {
 	struct bat_priv *bat_priv = netdev_priv(if_incoming->soft_iface);
 	struct neigh_node *router;
@@ -533,7 +534,7 @@ static void bat_iv_ogm_forward(struct orig_node *orig_node,
 
 	/* switch of primaries first hop flag when forwarding */
 	batman_ogm_packet->flags &= ~PRIMARIES_FIRST_HOP;
-	if (directlink)
+	if (is_single_hop_neigh)
 		batman_ogm_packet->flags |= DIRECTLINK;
 	else
 		batman_ogm_packet->flags &= ~DIRECTLINK;
@@ -918,7 +919,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	struct neigh_node *orig_neigh_router = NULL;
 	int has_directlink_flag;
 	int is_my_addr = 0, is_my_orig = 0, is_my_oldorig = 0;
-	int is_broadcast = 0, is_bidirectional, is_single_hop_neigh;
+	int is_broadcast = 0, is_bidirectional;
+	bool is_single_hop_neigh = false;
 	int is_duplicate;
 	uint32_t if_incoming_seqno;
 
@@ -942,8 +944,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 
 	has_directlink_flag = (batman_ogm_packet->flags & DIRECTLINK ? 1 : 0);
 
-	is_single_hop_neigh = (compare_eth(ethhdr->h_source,
-					   batman_ogm_packet->orig) ? 1 : 0);
+	if (compare_eth(ethhdr->h_source, batman_ogm_packet->orig))
+		is_single_hop_neigh = true;
 
 	bat_dbg(DBG_BATMAN, bat_priv,
 		"Received BATMAN packet via NB: %pM, IF: %s [%pM] (from OG: %pM, via prev OG: %pM, seqno %u, ttvn %u, crc %u, changes %u, td %d, TTL %d, V %d, IDF %d)\n",
@@ -1114,7 +1116,7 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 
 		/* mark direct link on incoming interface */
 		bat_iv_ogm_forward(orig_node, ethhdr, batman_ogm_packet,
-				   1, if_incoming);
+				   is_single_hop_neigh, if_incoming);
 
 		bat_dbg(DBG_BATMAN, bat_priv,
 			"Forwarding packet: rebroadcast neighbor packet with direct link flag\n");
@@ -1137,7 +1139,7 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	bat_dbg(DBG_BATMAN, bat_priv,
 		"Forwarding packet: rebroadcast originator packet\n");
 	bat_iv_ogm_forward(orig_node, ethhdr, batman_ogm_packet,
-			   0, if_incoming);
+			   is_single_hop_neigh, if_incoming);
 
 out_neigh:
 	if ((orig_neigh_node) && (!is_single_hop_neigh))
-- 
1.7.9.4

* [PATCH 03/15] batman-adv: introduce packet type handler array for incoming packets
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

The packet handler array replaces the growing switch statement, thus
dealing with incoming packets in a more efficient way. It also adds
the possibility to register packet handlers on the fly.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/hard-interface.c |  113 ------------------------------------
 net/batman-adv/main.c           |  121 +++++++++++++++++++++++++++++++++++++++
 net/batman-adv/main.h           |    6 ++
 3 files changed, 127 insertions(+), 113 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 47c79d7..95f869c 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -32,12 +32,6 @@
 
 #include <linux/if_arp.h>
 
-
-static int batman_skb_recv(struct sk_buff *skb,
-			   struct net_device *dev,
-			   struct packet_type *ptype,
-			   struct net_device *orig_dev);
-
 void hardif_free_rcu(struct rcu_head *rcu)
 {
 	struct hard_iface *hard_iface;
@@ -551,113 +545,6 @@ out:
 	return NOTIFY_DONE;
 }
 
-/* incoming packets with the batman ethertype received on any active hard
- * interface */
-static int batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
-			   struct packet_type *ptype,
-			   struct net_device *orig_dev)
-{
-	struct bat_priv *bat_priv;
-	struct batman_ogm_packet *batman_ogm_packet;
-	struct hard_iface *hard_iface;
-	int ret;
-
-	hard_iface = container_of(ptype, struct hard_iface, batman_adv_ptype);
-	skb = skb_share_check(skb, GFP_ATOMIC);
-
-	/* skb was released by skb_share_check() */
-	if (!skb)
-		goto err_out;
-
-	/* packet should hold at least type and version */
-	if (unlikely(!pskb_may_pull(skb, 2)))
-		goto err_free;
-
-	/* expect a valid ethernet header here. */
-	if (unlikely(skb->mac_len != ETH_HLEN || !skb_mac_header(skb)))
-		goto err_free;
-
-	if (!hard_iface->soft_iface)
-		goto err_free;
-
-	bat_priv = netdev_priv(hard_iface->soft_iface);
-
-	if (atomic_read(&bat_priv->mesh_state) != MESH_ACTIVE)
-		goto err_free;
-
-	/* discard frames on not active interfaces */
-	if (hard_iface->if_status != IF_ACTIVE)
-		goto err_free;
-
-	batman_ogm_packet = (struct batman_ogm_packet *)skb->data;
-
-	if (batman_ogm_packet->header.version != COMPAT_VERSION) {
-		bat_dbg(DBG_BATMAN, bat_priv,
-			"Drop packet: incompatible batman version (%i)\n",
-			batman_ogm_packet->header.version);
-		goto err_free;
-	}
-
-	/* all receive handlers return whether they received or reused
-	 * the supplied skb. if not, we have to free the skb. */
-
-	switch (batman_ogm_packet->header.packet_type) {
-		/* batman originator packet */
-	case BAT_IV_OGM:
-		ret = recv_bat_ogm_packet(skb, hard_iface);
-		break;
-
-		/* batman icmp packet */
-	case BAT_ICMP:
-		ret = recv_icmp_packet(skb, hard_iface);
-		break;
-
-		/* unicast packet */
-	case BAT_UNICAST:
-		ret = recv_unicast_packet(skb, hard_iface);
-		break;
-
-		/* fragmented unicast packet */
-	case BAT_UNICAST_FRAG:
-		ret = recv_ucast_frag_packet(skb, hard_iface);
-		break;
-
-		/* broadcast packet */
-	case BAT_BCAST:
-		ret = recv_bcast_packet(skb, hard_iface);
-		break;
-
-		/* vis packet */
-	case BAT_VIS:
-		ret = recv_vis_packet(skb, hard_iface);
-		break;
-		/* Translation table query (request or response) */
-	case BAT_TT_QUERY:
-		ret = recv_tt_query(skb, hard_iface);
-		break;
-		/* Roaming advertisement */
-	case BAT_ROAM_ADV:
-		ret = recv_roam_adv(skb, hard_iface);
-		break;
-	default:
-		ret = NET_RX_DROP;
-	}
-
-	if (ret == NET_RX_DROP)
-		kfree_skb(skb);
-
-	/* return NET_RX_SUCCESS in any case as we
-	 * most probably dropped the packet for
-	 * routing-logical reasons. */
-
-	return NET_RX_SUCCESS;
-
-err_free:
-	kfree_skb(skb);
-err_out:
-	return NET_RX_DROP;
-}
-
 /* This function returns true if the interface represented by ifindex is a
  * 802.11 wireless device */
 bool is_wifi_iface(int ifindex)
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 7913272..d19b935 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -39,6 +39,7 @@
 /* List manipulations on hardif_list have to be rtnl_lock()'ed,
  * list traversals just rcu-locked */
 struct list_head hardif_list;
+static int (*recv_packet_handler[256])(struct sk_buff *, struct hard_iface *);
 char bat_routing_algo[20] = "BATMAN IV";
 static struct hlist_head bat_algo_list;
 
@@ -46,11 +47,15 @@ unsigned char broadcast_addr[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
 struct workqueue_struct *bat_event_workqueue;
 
+static void recv_handler_init(void);
+
 static int __init batman_init(void)
 {
 	INIT_LIST_HEAD(&hardif_list);
 	INIT_HLIST_HEAD(&bat_algo_list);
 
+	recv_handler_init();
+
 	bat_iv_init();
 
 	/* the name should not be longer than 10 chars - see
@@ -179,6 +184,122 @@ int is_my_mac(const uint8_t *addr)
 	return 0;
 }
 
+static int recv_unhandled_packet(struct sk_buff *skb,
+				 struct hard_iface *recv_if)
+{
+	return NET_RX_DROP;
+}
+
+/* incoming packets with the batman ethertype received on any active hard
+ * interface
+ */
+int batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
+		    struct packet_type *ptype, struct net_device *orig_dev)
+{
+	struct bat_priv *bat_priv;
+	struct batman_ogm_packet *batman_ogm_packet;
+	struct hard_iface *hard_iface;
+	uint8_t idx;
+	int ret;
+
+	hard_iface = container_of(ptype, struct hard_iface, batman_adv_ptype);
+	skb = skb_share_check(skb, GFP_ATOMIC);
+
+	/* skb was released by skb_share_check() */
+	if (!skb)
+		goto err_out;
+
+	/* packet should hold at least type and version */
+	if (unlikely(!pskb_may_pull(skb, 2)))
+		goto err_free;
+
+	/* expect a valid ethernet header here. */
+	if (unlikely(skb->mac_len != ETH_HLEN || !skb_mac_header(skb)))
+		goto err_free;
+
+	if (!hard_iface->soft_iface)
+		goto err_free;
+
+	bat_priv = netdev_priv(hard_iface->soft_iface);
+
+	if (atomic_read(&bat_priv->mesh_state) != MESH_ACTIVE)
+		goto err_free;
+
+	/* discard frames on not active interfaces */
+	if (hard_iface->if_status != IF_ACTIVE)
+		goto err_free;
+
+	batman_ogm_packet = (struct batman_ogm_packet *)skb->data;
+
+	if (batman_ogm_packet->header.version != COMPAT_VERSION) {
+		bat_dbg(DBG_BATMAN, bat_priv,
+			"Drop packet: incompatible batman version (%i)\n",
+			batman_ogm_packet->header.version);
+		goto err_free;
+	}
+
+	/* all receive handlers return whether they received or reused
+	 * the supplied skb. if not, we have to free the skb.
+	 */
+	idx = batman_ogm_packet->header.packet_type;
+	ret = (*recv_packet_handler[idx])(skb, hard_iface);
+
+	if (ret == NET_RX_DROP)
+		kfree_skb(skb);
+
+	/* return NET_RX_SUCCESS in any case as we
+	 * most probably dropped the packet for
+	 * routing-logical reasons.
+	 */
+	return NET_RX_SUCCESS;
+
+err_free:
+	kfree_skb(skb);
+err_out:
+	return NET_RX_DROP;
+}
+
+static void recv_handler_init(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(recv_packet_handler); i++)
+		recv_packet_handler[i] = recv_unhandled_packet;
+
+	/* batman originator packet */
+	recv_packet_handler[BAT_IV_OGM] = recv_bat_ogm_packet;
+	/* batman icmp packet */
+	recv_packet_handler[BAT_ICMP] = recv_icmp_packet;
+	/* unicast packet */
+	recv_packet_handler[BAT_UNICAST] = recv_unicast_packet;
+	/* fragmented unicast packet */
+	recv_packet_handler[BAT_UNICAST_FRAG] = recv_ucast_frag_packet;
+	/* broadcast packet */
+	recv_packet_handler[BAT_BCAST] = recv_bcast_packet;
+	/* vis packet */
+	recv_packet_handler[BAT_VIS] = recv_vis_packet;
+	/* Translation table query (request or response) */
+	recv_packet_handler[BAT_TT_QUERY] = recv_tt_query;
+	/* Roaming advertisement */
+	recv_packet_handler[BAT_ROAM_ADV] = recv_roam_adv;
+}
+
+int recv_handler_register(uint8_t packet_type,
+			  int (*recv_handler)(struct sk_buff *,
+					      struct hard_iface *))
+{
+	if (recv_packet_handler[packet_type] != &recv_unhandled_packet)
+		return -EBUSY;
+
+	recv_packet_handler[packet_type] = recv_handler;
+	return 0;
+}
+
+void recv_handler_unregister(uint8_t packet_type)
+{
+	recv_packet_handler[packet_type] = recv_unhandled_packet;
+}
+
 static struct bat_algo_ops *bat_algo_get(char *name)
 {
 	struct bat_algo_ops *bat_algo_ops = NULL, *bat_algo_ops_tmp;
diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index d9832ac..fd83acd 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -155,6 +155,12 @@ void mesh_free(struct net_device *soft_iface);
 void inc_module_count(void);
 void dec_module_count(void);
 int is_my_mac(const uint8_t *addr);
+int batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
+		    struct packet_type *ptype, struct net_device *orig_dev);
+int recv_handler_register(uint8_t packet_type,
+			  int (*recv_handler)(struct sk_buff *,
+					      struct hard_iface *));
+void recv_handler_unregister(uint8_t packet_type);
 int bat_algo_register(struct bat_algo_ops *bat_algo_ops);
 int bat_algo_select(struct bat_priv *bat_priv, char *name);
 int bat_algo_seq_print_text(struct seq_file *seq, void *offset);
-- 
1.7.9.4
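
The handler-array idea in the patch above can be tried outside the kernel
with a small user-space sketch. Everything here (rx_handlers, rx_dispatch,
the RX_* values, rx_accept) is a hypothetical stand-in chosen for the
example; only the shape follows the patch: a 256-entry table pre-filled
with a drop handler, a register call that refuses an occupied slot with
-EBUSY, and dispatch by the one-byte packet type.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative return codes; not the kernel's NET_RX_* values. */
#define RX_SUCCESS	0
#define RX_DROP		1

typedef int (*rx_handler_t)(const uint8_t *pkt, size_t len);

/* One slot per possible value of the one-byte packet type. */
static rx_handler_t rx_handlers[256];

/* Default handler: drop anything nobody has claimed. */
static int rx_unhandled(const uint8_t *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return RX_DROP;
}

/* Example handler used to exercise the table. */
static int rx_accept(const uint8_t *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return RX_SUCCESS;
}

static void rx_handlers_init(void)
{
	size_t i;

	for (i = 0; i < sizeof(rx_handlers) / sizeof(rx_handlers[0]); i++)
		rx_handlers[i] = rx_unhandled;
}

/* Claim a packet type; fails with -EBUSY if it is already taken. */
static int rx_handler_register(uint8_t type, rx_handler_t handler)
{
	if (rx_handlers[type] != rx_unhandled)
		return -EBUSY;
	rx_handlers[type] = handler;
	return 0;
}

static void rx_handler_unregister(uint8_t type)
{
	rx_handlers[type] = rx_unhandled;
}

/* Dispatch on the first byte, treated here as the packet type. */
static int rx_dispatch(const uint8_t *pkt, size_t len)
{
	if (len < 1)
		return RX_DROP;
	return rx_handlers[pkt[0]](pkt, len);
}
```

Because the index is a uint8_t and the table has 256 entries, the lookup
cannot go out of bounds, and pre-filling every slot means the dispatch path
needs no NULL check.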

* [PATCH 04/15] batman-adv: register batman ogm receive function during protocol init
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

The B.A.T.M.A.N. IV OGM receive function still was hard-coded although
it is a routing protocol specific function. This patch takes advantage
of the dynamic packet handler registration to remove the hard-coded
function calls.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |   31 +++++++++++++++++++++++++++----
 net/batman-adv/main.c       |    5 +----
 net/batman-adv/routing.c    |   22 ++++++++++------------
 net/batman-adv/routing.h    |    4 +++-
 net/batman-adv/types.h      |    3 ---
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index cd8f473..e0aaf8c 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1155,13 +1155,18 @@ out:
 	orig_node_free_ref(orig_node);
 }
 
-static void bat_iv_ogm_receive(struct hard_iface *if_incoming,
-			       struct sk_buff *skb)
+static int bat_iv_ogm_receive(struct sk_buff *skb,
+			      struct hard_iface *if_incoming)
 {
 	struct batman_ogm_packet *batman_ogm_packet;
 	struct ethhdr *ethhdr;
 	int buff_pos = 0, packet_len;
 	unsigned char *tt_buff, *packet_buff;
+	bool ret;
+
+	ret = check_management_packet(skb, if_incoming, BATMAN_OGM_HLEN);
+	if (!ret)
+		return NET_RX_DROP;
 
 	packet_len = skb_headlen(skb);
 	ethhdr = (struct ethhdr *)skb_mac_header(skb);
@@ -1187,6 +1192,9 @@ static void bat_iv_ogm_receive(struct hard_iface *if_incoming,
 						(packet_buff + buff_pos);
 	} while (bat_iv_ogm_aggr_packet(buff_pos, packet_len,
 					batman_ogm_packet->tt_num_changes));
+
+	kfree_skb(skb);
+	return NET_RX_SUCCESS;
 }
 
 static struct bat_algo_ops batman_iv __read_mostly = {
@@ -1197,10 +1205,25 @@ static struct bat_algo_ops batman_iv __read_mostly = {
 	.bat_ogm_update_mac = bat_iv_ogm_update_mac,
 	.bat_ogm_schedule = bat_iv_ogm_schedule,
 	.bat_ogm_emit = bat_iv_ogm_emit,
-	.bat_ogm_receive = bat_iv_ogm_receive,
 };
 
 int __init bat_iv_init(void)
 {
-	return bat_algo_register(&batman_iv);
+	int ret;
+
+	/* batman originator packet */
+	ret = recv_handler_register(BAT_IV_OGM, bat_iv_ogm_receive);
+	if (ret < 0)
+		goto out;
+
+	ret = bat_algo_register(&batman_iv);
+	if (ret < 0)
+		goto handler_unregister;
+
+	goto out;
+
+handler_unregister:
+	recv_handler_unregister(BAT_IV_OGM);
+out:
+	return ret;
 }
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index d19b935..f80c447 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -266,8 +266,6 @@ static void recv_handler_init(void)
 	for (i = 0; i < ARRAY_SIZE(recv_packet_handler); i++)
 		recv_packet_handler[i] = recv_unhandled_packet;
 
-	/* batman originator packet */
-	recv_packet_handler[BAT_IV_OGM] = recv_bat_ogm_packet;
 	/* batman icmp packet */
 	recv_packet_handler[BAT_ICMP] = recv_icmp_packet;
 	/* unicast packet */
@@ -334,8 +332,7 @@ int bat_algo_register(struct bat_algo_ops *bat_algo_ops)
 	    !bat_algo_ops->bat_primary_iface_set ||
 	    !bat_algo_ops->bat_ogm_update_mac ||
 	    !bat_algo_ops->bat_ogm_schedule ||
-	    !bat_algo_ops->bat_ogm_emit ||
-	    !bat_algo_ops->bat_ogm_receive) {
+	    !bat_algo_ops->bat_ogm_emit) {
 		pr_info("Routing algo '%s' does not implement required ops\n",
 			bat_algo_ops->name);
 		goto out;
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index ff56086..7ed9d8f 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -248,37 +248,35 @@ int window_protected(struct bat_priv *bat_priv, int32_t seq_num_diff,
 	return 0;
 }
 
-int recv_bat_ogm_packet(struct sk_buff *skb, struct hard_iface *hard_iface)
+bool check_management_packet(struct sk_buff *skb,
+			     struct hard_iface *hard_iface,
+			     int header_len)
 {
-	struct bat_priv *bat_priv = netdev_priv(hard_iface->soft_iface);
 	struct ethhdr *ethhdr;
 
 	/* drop packet if it has not necessary minimum size */
-	if (unlikely(!pskb_may_pull(skb, BATMAN_OGM_HLEN)))
-		return NET_RX_DROP;
+	if (unlikely(!pskb_may_pull(skb, header_len)))
+		return false;
 
 	ethhdr = (struct ethhdr *)skb_mac_header(skb);
 
 	/* packet with broadcast indication but unicast recipient */
 	if (!is_broadcast_ether_addr(ethhdr->h_dest))
-		return NET_RX_DROP;
+		return false;
 
 	/* packet with broadcast sender address */
 	if (is_broadcast_ether_addr(ethhdr->h_source))
-		return NET_RX_DROP;
+		return false;
 
 	/* create a copy of the skb, if needed, to modify it. */
 	if (skb_cow(skb, 0) < 0)
-		return NET_RX_DROP;
+		return false;
 
 	/* keep skb linear */
 	if (skb_linearize(skb) < 0)
-		return NET_RX_DROP;
+		return false;
 
-	bat_priv->bat_algo_ops->bat_ogm_receive(hard_iface, skb);
-
-	kfree_skb(skb);
-	return NET_RX_SUCCESS;
+	return true;
 }
 
 static int recv_my_icmp_packet(struct bat_priv *bat_priv,
diff --git a/net/batman-adv/routing.h b/net/batman-adv/routing.h
index 3d729cb..d6bbbeb 100644
--- a/net/batman-adv/routing.h
+++ b/net/batman-adv/routing.h
@@ -23,6 +23,9 @@
 #define _NET_BATMAN_ADV_ROUTING_H_
 
 void slide_own_bcast_window(struct hard_iface *hard_iface);
+bool check_management_packet(struct sk_buff *skb,
+			     struct hard_iface *hard_iface,
+			     int header_len);
 void update_route(struct bat_priv *bat_priv, struct orig_node *orig_node,
 		  struct neigh_node *neigh_node);
 int recv_icmp_packet(struct sk_buff *skb, struct hard_iface *recv_if);
@@ -30,7 +33,6 @@ int recv_unicast_packet(struct sk_buff *skb, struct hard_iface *recv_if);
 int recv_ucast_frag_packet(struct sk_buff *skb, struct hard_iface *recv_if);
 int recv_bcast_packet(struct sk_buff *skb, struct hard_iface *recv_if);
 int recv_vis_packet(struct sk_buff *skb, struct hard_iface *recv_if);
-int recv_bat_ogm_packet(struct sk_buff *skb, struct hard_iface *recv_if);
 int recv_tt_query(struct sk_buff *skb, struct hard_iface *recv_if);
 int recv_roam_adv(struct sk_buff *skb, struct hard_iface *recv_if);
 struct neigh_node *find_router(struct bat_priv *bat_priv,
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 2f4848b..50e1895 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -390,9 +390,6 @@ struct bat_algo_ops {
 				 int tt_num_changes);
 	/* send scheduled OGM */
 	void (*bat_ogm_emit)(struct forw_packet *forw_packet);
-	/* receive incoming OGM */
-	void (*bat_ogm_receive)(struct hard_iface *if_incoming,
-				struct sk_buff *skb);
 };
 
 #endif /* _NET_BATMAN_ADV_TYPES_H_ */
-- 
1.7.9.4
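
The init path in the patch above registers the receive handler first and
undoes that registration if the second step fails. A standalone sketch of
that unwind pattern follows; the function names, the handler_registered
flag, and the fail_algo_register knob are invented for the example and do
not exist in batman-adv.

```c
#include <errno.h>
#include <stdbool.h>

static bool handler_registered;	/* models the claimed handler slot */
static bool fail_algo_register;	/* test knob: force step two to fail */

static int register_handler(void)
{
	if (handler_registered)
		return -EBUSY;
	handler_registered = true;
	return 0;
}

static void unregister_handler(void)
{
	handler_registered = false;
}

static int register_algo(void)
{
	return fail_algo_register ? -EINVAL : 0;
}

/* Two-step init: undo step one if step two fails. */
static int proto_init(void)
{
	int ret;

	ret = register_handler();
	if (ret < 0)
		return ret;

	ret = register_algo();
	if (ret < 0)
		goto err_unregister;

	return 0;

err_unregister:
	unregister_handler();
	return ret;
}
```

On failure the caller sees the error code of the step that failed, and no
handler stays registered for a protocol that never finished initialising.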

* [PATCH 05/15] batman-adv: rename last_valid to last_seen
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |    8 ++++----
 net/batman-adv/originator.c |   16 ++++++++--------
 net/batman-adv/types.h      |    8 ++++----
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index e0aaf8c..8652a75 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -651,7 +651,7 @@ static void bat_iv_ogm_orig_update(struct bat_priv *bat_priv,
 	rcu_read_unlock();
 
 	orig_node->flags = batman_ogm_packet->flags;
-	neigh_node->last_valid = jiffies;
+	neigh_node->last_seen = jiffies;
 
 	spin_lock_bh(&neigh_node->tq_lock);
 	ring_buffer_set(neigh_node->tq_recv,
@@ -772,11 +772,11 @@ static int bat_iv_ogm_calc_tq(struct orig_node *orig_node,
 	if (!neigh_node)
 		goto out;
 
-	/* if orig_node is direct neighbor update neigh_node last_valid */
+	/* if orig_node is direct neighbor update neigh_node last_seen */
 	if (orig_node == orig_neigh_node)
-		neigh_node->last_valid = jiffies;
+		neigh_node->last_seen = jiffies;
 
-	orig_node->last_valid = jiffies;
+	orig_node->last_seen = jiffies;
 
 	/* find packet count of corresponding one hop neighbor */
 	spin_lock_bh(&orig_node->ogm_cnt_lock);
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index ce49698..21c1f83 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -283,7 +283,7 @@ static bool purge_orig_neighbors(struct bat_priv *bat_priv,
 	hlist_for_each_entry_safe(neigh_node, node, node_tmp,
 				  &orig_node->neigh_list, list) {
 
-		if ((has_timed_out(neigh_node->last_valid, PURGE_TIMEOUT)) ||
+		if ((has_timed_out(neigh_node->last_seen, PURGE_TIMEOUT)) ||
 		    (neigh_node->if_incoming->if_status == IF_INACTIVE) ||
 		    (neigh_node->if_incoming->if_status == IF_NOT_IN_USE) ||
 		    (neigh_node->if_incoming->if_status == IF_TO_BE_REMOVED)) {
@@ -300,9 +300,9 @@ static bool purge_orig_neighbors(struct bat_priv *bat_priv,
 					neigh_node->if_incoming->net_dev->name);
 			else
 				bat_dbg(DBG_BATMAN, bat_priv,
-					"neighbor timeout: originator %pM, neighbor: %pM, last_valid: %lu\n",
+					"neighbor timeout: originator %pM, neighbor: %pM, last_seen: %lu\n",
 					orig_node->orig, neigh_node->addr,
-					(neigh_node->last_valid / HZ));
+					(neigh_node->last_seen / HZ));
 
 			neigh_purged = true;
 
@@ -325,10 +325,10 @@ static bool purge_orig_node(struct bat_priv *bat_priv,
 {
 	struct neigh_node *best_neigh_node;
 
-	if (has_timed_out(orig_node->last_valid, 2 * PURGE_TIMEOUT)) {
+	if (has_timed_out(orig_node->last_seen, 2 * PURGE_TIMEOUT)) {
 		bat_dbg(DBG_BATMAN, bat_priv,
-			"Originator timeout: originator %pM, last_valid %lu\n",
-			orig_node->orig, (orig_node->last_valid / HZ));
+			"Originator timeout: originator %pM, last_seen %lu\n",
+			orig_node->orig, (orig_node->last_seen / HZ));
 		return true;
 	} else {
 		if (purge_orig_neighbors(bat_priv, orig_node,
@@ -446,9 +446,9 @@ int orig_seq_print_text(struct seq_file *seq, void *offset)
 				goto next;
 
 			last_seen_secs = jiffies_to_msecs(jiffies -
-						orig_node->last_valid) / 1000;
+						orig_node->last_seen) / 1000;
 			last_seen_msecs = jiffies_to_msecs(jiffies -
-						orig_node->last_valid) % 1000;
+						orig_node->last_seen) % 1000;
 
 			seq_printf(seq, "%pM %4i.%03is   (%3i) %pM [%10s]:",
 				   orig_node->orig, last_seen_secs,
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 50e1895..9fa8b73 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -52,7 +52,7 @@ struct hard_iface {
 /**
  *	orig_node - structure for orig_list maintaining nodes of mesh
  *	@primary_addr: hosts primary interface address
- *	@last_valid: when last packet from this node was received
+ *	@last_seen: when last packet from this node was received
  *	@bcast_seqno_reset: time when the broadcast seqno window was reset
  *	@batman_seqno_reset: time when the batman seqno window was reset
  *	@gw_flags: flags related to gateway class
@@ -70,7 +70,7 @@ struct orig_node {
 	struct neigh_node __rcu *router; /* rcu protected pointer */
 	unsigned long *bcast_own;
 	uint8_t *bcast_own_sum;
-	unsigned long last_valid;
+	unsigned long last_seen;
 	unsigned long bcast_seqno_reset;
 	unsigned long batman_seqno_reset;
 	uint8_t gw_flags;
@@ -120,7 +120,7 @@ struct gw_node {
 
 /**
  *	neigh_node
- *	@last_valid: when last packet via this neighbor was received
+ *	@last_seen: when last packet via this neighbor was received
  */
 struct neigh_node {
 	struct hlist_node list;
@@ -131,7 +131,7 @@ struct neigh_node {
 	uint8_t tq_avg;
 	uint8_t last_ttl;
 	struct list_head bonding_list;
-	unsigned long last_valid;
+	unsigned long last_seen;
 	DECLARE_BITMAP(real_bits, TQ_LOCAL_WINDOW_SIZE);
 	atomic_t refcount;
 	struct rcu_head rcu;
-- 
1.7.9.4

* [PATCH 06/15] batman-adv: replace HZ calculations with jiffies_to_msecs()
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_debugfs.c |    4 ++--
 net/batman-adv/originator.c  |   15 ++++++++++-----
 net/batman-adv/send.c        |    2 +-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/batman-adv/bat_debugfs.c b/net/batman-adv/bat_debugfs.c
index 916380c..3b588f8 100644
--- a/net/batman-adv/bat_debugfs.c
+++ b/net/batman-adv/bat_debugfs.c
@@ -83,8 +83,8 @@ int debug_log(struct bat_priv *bat_priv, const char *fmt, ...)
 
 	va_start(args, fmt);
 	vscnprintf(tmp_log_buf, sizeof(tmp_log_buf), fmt, args);
-	fdebug_log(bat_priv->debug_log, "[%10lu] %s",
-		   (jiffies / HZ), tmp_log_buf);
+	fdebug_log(bat_priv->debug_log, "[%10u] %s",
+		   jiffies_to_msecs(jiffies), tmp_log_buf);
 	va_end(args);
 
 	return 0;
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 21c1f83..962636b 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -35,7 +35,8 @@ static void purge_orig(struct work_struct *work);
 static void start_purge_timer(struct bat_priv *bat_priv)
 {
 	INIT_DELAYED_WORK(&bat_priv->orig_work, purge_orig);
-	queue_delayed_work(bat_event_workqueue, &bat_priv->orig_work, 1 * HZ);
+	queue_delayed_work(bat_event_workqueue,
+			   &bat_priv->orig_work, msecs_to_jiffies(1000));
 }
 
 /* returns 1 if they are the same originator */
@@ -274,6 +275,7 @@ static bool purge_orig_neighbors(struct bat_priv *bat_priv,
 	struct hlist_node *node, *node_tmp;
 	struct neigh_node *neigh_node;
 	bool neigh_purged = false;
+	unsigned long last_seen;
 
 	*best_neigh_node = NULL;
 
@@ -288,6 +290,8 @@ static bool purge_orig_neighbors(struct bat_priv *bat_priv,
 		    (neigh_node->if_incoming->if_status == IF_NOT_IN_USE) ||
 		    (neigh_node->if_incoming->if_status == IF_TO_BE_REMOVED)) {
 
+			last_seen = neigh_node->last_seen;
+
 			if ((neigh_node->if_incoming->if_status ==
 								IF_INACTIVE) ||
 			    (neigh_node->if_incoming->if_status ==
@@ -300,9 +304,9 @@ static bool purge_orig_neighbors(struct bat_priv *bat_priv,
 					neigh_node->if_incoming->net_dev->name);
 			else
 				bat_dbg(DBG_BATMAN, bat_priv,
-					"neighbor timeout: originator %pM, neighbor: %pM, last_seen: %lu\n",
+					"neighbor timeout: originator %pM, neighbor: %pM, last_seen: %u\n",
 					orig_node->orig, neigh_node->addr,
-					(neigh_node->last_seen / HZ));
+					jiffies_to_msecs(last_seen));
 
 			neigh_purged = true;
 
@@ -327,8 +331,9 @@ static bool purge_orig_node(struct bat_priv *bat_priv,
 
 	if (has_timed_out(orig_node->last_seen, 2 * PURGE_TIMEOUT)) {
 		bat_dbg(DBG_BATMAN, bat_priv,
-			"Originator timeout: originator %pM, last_seen %lu\n",
-			orig_node->orig, (orig_node->last_seen / HZ));
+			"Originator timeout: originator %pM, last_seen %u\n",
+			orig_node->orig,
+			jiffies_to_msecs(orig_node->last_seen));
 		return true;
 	} else {
 		if (purge_orig_neighbors(bat_priv, orig_node,
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
index 7c66b61..8e74d97 100644
--- a/net/batman-adv/send.c
+++ b/net/batman-adv/send.c
@@ -292,7 +292,7 @@ static void send_outstanding_bcast_packet(struct work_struct *work)
 	/* if we still have some more bcasts to send */
 	if (forw_packet->num_packets < 3) {
 		_add_bcast_packet_to_list(bat_priv, forw_packet,
-					  ((5 * HZ) / 1000));
+					  msecs_to_jiffies(5));
 		return;
 	}
 
-- 
1.7.9.4

* [PATCH 07/15] batman-adv: split neigh_new function into generic and batman iv specific parts
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |   40 ++++++++++++++++++++++++++++++++++------
 net/batman-adv/originator.c |   28 +++++++++++-----------------
 net/batman-adv/originator.h |    7 +++----
 3 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 8652a75..ae0a08c 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -30,6 +30,32 @@
 #include "send.h"
 #include "bat_algo.h"
 
+static struct neigh_node *bat_iv_ogm_neigh_new(struct hard_iface *hard_iface,
+					       const uint8_t *neigh_addr,
+					       struct orig_node *orig_node,
+					       struct orig_node *orig_neigh,
+					       uint32_t seqno)
+{
+	struct neigh_node *neigh_node;
+
+	neigh_node = batadv_neigh_node_new(hard_iface, neigh_addr, seqno);
+	if (!neigh_node)
+		goto out;
+
+	INIT_LIST_HEAD(&neigh_node->bonding_list);
+	spin_lock_init(&neigh_node->tq_lock);
+
+	neigh_node->orig_node = orig_neigh;
+	neigh_node->if_incoming = hard_iface;
+
+	spin_lock_bh(&orig_node->neigh_list_lock);
+	hlist_add_head_rcu(&neigh_node->list, &orig_node->neigh_list);
+	spin_unlock_bh(&orig_node->neigh_list_lock);
+
+out:
+	return neigh_node;
+}
+
 static int bat_iv_ogm_iface_enable(struct hard_iface *hard_iface)
 {
 	struct batman_ogm_packet *batman_ogm_packet;
@@ -638,8 +664,9 @@ static void bat_iv_ogm_orig_update(struct bat_priv *bat_priv,
 		if (!orig_tmp)
 			goto unlock;
 
-		neigh_node = create_neighbor(orig_node, orig_tmp,
-					     ethhdr->h_source, if_incoming);
+		neigh_node = bat_iv_ogm_neigh_new(if_incoming, ethhdr->h_source,
+						  orig_node, orig_tmp,
+						  batman_ogm_packet->seqno);
 
 		orig_node_free_ref(orig_tmp);
 		if (!neigh_node)
@@ -764,10 +791,11 @@ static int bat_iv_ogm_calc_tq(struct orig_node *orig_node,
 	rcu_read_unlock();
 
 	if (!neigh_node)
-		neigh_node = create_neighbor(orig_neigh_node,
-					     orig_neigh_node,
-					     orig_neigh_node->orig,
-					     if_incoming);
+		neigh_node = bat_iv_ogm_neigh_new(if_incoming,
+						  orig_neigh_node->orig,
+						  orig_neigh_node,
+						  orig_neigh_node,
+						  batman_ogm_packet->seqno);
 
 	if (!neigh_node)
 		goto out;
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 962636b..f4b6201 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -85,35 +85,29 @@ struct neigh_node *orig_node_get_router(struct orig_node *orig_node)
 	return router;
 }
 
-struct neigh_node *create_neighbor(struct orig_node *orig_node,
-				   struct orig_node *orig_neigh_node,
-				   const uint8_t *neigh,
-				   struct hard_iface *if_incoming)
+struct neigh_node *batadv_neigh_node_new(struct hard_iface *hard_iface,
+					 const uint8_t *neigh_addr,
+					 uint32_t seqno)
 {
-	struct bat_priv *bat_priv = netdev_priv(if_incoming->soft_iface);
+	struct bat_priv *bat_priv = netdev_priv(hard_iface->soft_iface);
 	struct neigh_node *neigh_node;
 
-	bat_dbg(DBG_BATMAN, bat_priv,
-		"Creating new last-hop neighbor of originator\n");
-
 	neigh_node = kzalloc(sizeof(*neigh_node), GFP_ATOMIC);
 	if (!neigh_node)
-		return NULL;
+		goto out;
 
 	INIT_HLIST_NODE(&neigh_node->list);
-	INIT_LIST_HEAD(&neigh_node->bonding_list);
-	spin_lock_init(&neigh_node->tq_lock);
 
-	memcpy(neigh_node->addr, neigh, ETH_ALEN);
-	neigh_node->orig_node = orig_neigh_node;
-	neigh_node->if_incoming = if_incoming;
+	memcpy(neigh_node->addr, neigh_addr, ETH_ALEN);
 
 	/* extra reference for return */
 	atomic_set(&neigh_node->refcount, 2);
 
-	spin_lock_bh(&orig_node->neigh_list_lock);
-	hlist_add_head_rcu(&neigh_node->list, &orig_node->neigh_list);
-	spin_unlock_bh(&orig_node->neigh_list_lock);
+	bat_dbg(DBG_BATMAN, bat_priv,
+		"Creating new neighbor %pM, initial seqno %d\n",
+		neigh_addr, seqno);
+
+out:
 	return neigh_node;
 }
 
diff --git a/net/batman-adv/originator.h b/net/batman-adv/originator.h
index 3fe2eda..f74d0d6 100644
--- a/net/batman-adv/originator.h
+++ b/net/batman-adv/originator.h
@@ -29,10 +29,9 @@ void originator_free(struct bat_priv *bat_priv);
 void purge_orig_ref(struct bat_priv *bat_priv);
 void orig_node_free_ref(struct orig_node *orig_node);
 struct orig_node *get_orig_node(struct bat_priv *bat_priv, const uint8_t *addr);
-struct neigh_node *create_neighbor(struct orig_node *orig_node,
-				   struct orig_node *orig_neigh_node,
-				   const uint8_t *neigh,
-				   struct hard_iface *if_incoming);
+struct neigh_node *batadv_neigh_node_new(struct hard_iface *hard_iface,
+					 const uint8_t *neigh_addr,
+					 uint32_t seqno);
 void neigh_node_free_ref(struct neigh_node *neigh_node);
 struct neigh_node *orig_node_get_router(struct orig_node *orig_node);
 int orig_seq_print_text(struct seq_file *seq, void *offset);
-- 
1.7.9.4

* [PATCH 08/15] batman-adv: ignore protocol packets if the interface did not enable this protocol
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Reported-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index ae0a08c..37e368d 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1186,6 +1186,7 @@ out:
 static int bat_iv_ogm_receive(struct sk_buff *skb,
 			      struct hard_iface *if_incoming)
 {
+	struct bat_priv *bat_priv = netdev_priv(if_incoming->soft_iface);
 	struct batman_ogm_packet *batman_ogm_packet;
 	struct ethhdr *ethhdr;
 	int buff_pos = 0, packet_len;
@@ -1196,6 +1197,12 @@ static int bat_iv_ogm_receive(struct sk_buff *skb,
 	if (!ret)
 		return NET_RX_DROP;
 
+	/* did we receive a B.A.T.M.A.N. IV OGM packet on an interface
+	 * that does not have B.A.T.M.A.N. IV enabled ?
+	 */
+	if (bat_priv->bat_algo_ops->bat_ogm_emit != bat_iv_ogm_emit)
+		return NET_RX_DROP;
+
 	packet_len = skb_headlen(skb);
 	ethhdr = (struct ethhdr *)skb_mac_header(skb);
 	packet_buff = skb->data;
-- 
1.7.9.4

* [PATCH 09/15] batman-adv: refactoring API: find generalized name for bat_ogm_update_mac callback
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c     |   24 ++++++++++++------------
 net/batman-adv/hard-interface.c |    4 ++--
 net/batman-adv/main.c           |    2 +-
 net/batman-adv/types.h          |    6 ++++--
 4 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 37e368d..9074e74 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -93,6 +93,17 @@ static void bat_iv_ogm_iface_disable(struct hard_iface *hard_iface)
 	hard_iface->packet_buff = NULL;
 }
 
+static void bat_iv_ogm_iface_update_mac(struct hard_iface *hard_iface)
+{
+	struct batman_ogm_packet *batman_ogm_packet;
+
+	batman_ogm_packet = (struct batman_ogm_packet *)hard_iface->packet_buff;
+	memcpy(batman_ogm_packet->orig,
+	       hard_iface->net_dev->dev_addr, ETH_ALEN);
+	memcpy(batman_ogm_packet->prev_sender,
+	       hard_iface->net_dev->dev_addr, ETH_ALEN);
+}
+
 static void bat_iv_ogm_primary_iface_set(struct hard_iface *hard_iface)
 {
 	struct batman_ogm_packet *batman_ogm_packet;
@@ -102,17 +113,6 @@ static void bat_iv_ogm_primary_iface_set(struct hard_iface *hard_iface)
 	batman_ogm_packet->header.ttl = TTL;
 }
 
-static void bat_iv_ogm_update_mac(struct hard_iface *hard_iface)
-{
-	struct batman_ogm_packet *batman_ogm_packet;
-
-	batman_ogm_packet = (struct batman_ogm_packet *)hard_iface->packet_buff;
-	memcpy(batman_ogm_packet->orig,
-	       hard_iface->net_dev->dev_addr, ETH_ALEN);
-	memcpy(batman_ogm_packet->prev_sender,
-	       hard_iface->net_dev->dev_addr, ETH_ALEN);
-}
-
 /* when do we schedule our own ogm to be sent */
 static unsigned long bat_iv_ogm_emit_send_time(const struct bat_priv *bat_priv)
 {
@@ -1236,8 +1236,8 @@ static struct bat_algo_ops batman_iv __read_mostly = {
 	.name = "BATMAN IV",
 	.bat_iface_enable = bat_iv_ogm_iface_enable,
 	.bat_iface_disable = bat_iv_ogm_iface_disable,
+	.bat_iface_update_mac = bat_iv_ogm_iface_update_mac,
 	.bat_primary_iface_set = bat_iv_ogm_primary_iface_set,
-	.bat_ogm_update_mac = bat_iv_ogm_update_mac,
 	.bat_ogm_schedule = bat_iv_ogm_schedule,
 	.bat_ogm_emit = bat_iv_ogm_emit,
 };
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 95f869c..0b84bb1 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -228,7 +228,7 @@ static void hardif_activate_interface(struct hard_iface *hard_iface)
 
 	bat_priv = netdev_priv(hard_iface->soft_iface);
 
-	bat_priv->bat_algo_ops->bat_ogm_update_mac(hard_iface);
+	bat_priv->bat_algo_ops->bat_iface_update_mac(hard_iface);
 	hard_iface->if_status = IF_TO_BE_ACTIVATED;
 
 	/**
@@ -524,7 +524,7 @@ static int hard_if_event(struct notifier_block *this,
 		check_known_mac_addr(hard_iface->net_dev);
 
 		bat_priv = netdev_priv(hard_iface->soft_iface);
-		bat_priv->bat_algo_ops->bat_ogm_update_mac(hard_iface);
+		bat_priv->bat_algo_ops->bat_iface_update_mac(hard_iface);
 
 		primary_if = primary_if_get_selected(bat_priv);
 		if (!primary_if)
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index f80c447..083a299 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -329,8 +329,8 @@ int bat_algo_register(struct bat_algo_ops *bat_algo_ops)
 	/* all algorithms must implement all ops (for now) */
 	if (!bat_algo_ops->bat_iface_enable ||
 	    !bat_algo_ops->bat_iface_disable ||
+	    !bat_algo_ops->bat_iface_update_mac ||
 	    !bat_algo_ops->bat_primary_iface_set ||
-	    !bat_algo_ops->bat_ogm_update_mac ||
 	    !bat_algo_ops->bat_ogm_schedule ||
 	    !bat_algo_ops->bat_ogm_emit) {
 		pr_info("Routing algo '%s' does not implement required ops\n",
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 9fa8b73..66a3750 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -381,10 +381,12 @@ struct bat_algo_ops {
 	int (*bat_iface_enable)(struct hard_iface *hard_iface);
 	/* de-init routing info when hard-interface is disabled */
 	void (*bat_iface_disable)(struct hard_iface *hard_iface);
+	/* (re-)init mac addresses of the protocol information
+	 * belonging to this hard-interface
+	 */
+	void (*bat_iface_update_mac)(struct hard_iface *hard_iface);
 	/* called when primary interface is selected / changed */
 	void (*bat_primary_iface_set)(struct hard_iface *hard_iface);
-	/* init mac addresses of the OGM belonging to this hard-interface */
-	void (*bat_ogm_update_mac)(struct hard_iface *hard_iface);
 	/* prepare a new outgoing OGM for the send queue */
 	void (*bat_ogm_schedule)(struct hard_iface *hard_iface,
 				 int tt_num_changes);
-- 
1.7.9.4

* [PATCH 14/15] batman-adv: update copyright years
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

Update copyright years to include 2012.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bridge_loop_avoidance.c |    2 +-
 net/batman-adv/bridge_loop_avoidance.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index ad394c6..8bf9751 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2011 B.A.T.M.A.N. contributors:
+ * Copyright (C) 2011-2012 B.A.T.M.A.N. contributors:
  *
  * Simon Wunderlich
  *
diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h
index 4a8e4fc..e39f93a 100644
--- a/net/batman-adv/bridge_loop_avoidance.h
+++ b/net/batman-adv/bridge_loop_avoidance.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2011 B.A.T.M.A.N. contributors:
+ * Copyright (C) 2011-2012 B.A.T.M.A.N. contributors:
  *
  * Simon Wunderlich
  *
-- 
1.7.9.4

* [PATCH 12/15] batman-adv: avoid temporary routing loops by being strict on forwarded OGMs
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

batman-adv would forward OGMs from non-best hops while replacing the TQ
and TTL values with the values from the best hop. In certain corner cases
this leads to a temporary routing loop.
This patch changes this behavior: Only packets from best next hops are
forwarded - TQ and TTL values won't be replaced anymore. However, the protocol
needs to rebroadcast OGMs from single hop neighbors regardless of whether or
not they are the best hop. To handle this case a new flag is introduced to
alert neighboring nodes about the forwarded OGM that is not from my best
next hop. It is to be discarded by all nodes except for the one originating
the OGM.
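
The forwarding rule described above can be sketched in userspace C roughly
as follows. This is only an illustration under stated assumptions: the
struct ogm_sketch type and the ogm_should_forward() helper are hypothetical
and are not part of the patch; only the NOT_BEST_NEXT_HOP value is taken
from the packet.h hunk below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag value as introduced in packet.h by this patch. */
#define NOT_BEST_NEXT_HOP (1 << 3)

/* Hypothetical, simplified stand-in for struct batman_ogm_packet. */
struct ogm_sketch {
	uint8_t flags;
	uint8_t ttl;
};

/* Returns true if the OGM would be rebroadcast, false if it is dropped.
 * On forwarding, the TTL is decremented and the packet is marked when it
 * did not arrive via the best next hop.
 */
static bool ogm_should_forward(struct ogm_sketch *ogm,
			       bool is_single_hop_neigh,
			       bool is_from_best_next_hop)
{
	/* TTL exceeded: never forward */
	if (ogm->ttl <= 1)
		return false;

	if (!is_from_best_next_hop) {
		/* Only single hop neighbors are still rebroadcast, so that
		 * link quality detection keeps working; everything else
		 * is dropped.
		 */
		if (!is_single_hop_neigh)
			return false;

		ogm->flags |= NOT_BEST_NEXT_HOP;
	}

	ogm->ttl--;
	return true;
}
```

A receiver would then drop any OGM carrying NOT_BEST_NEXT_HOP unless it is
the originator of that OGM, which matches the early check added to
bat_iv_ogm_process() in the diff.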

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Daniele Furlan <daniele.furlan@gmail.com>
Tested-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |   60 ++++++++++++++++++++++---------------------
 net/batman-adv/packet.h     |    1 +
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 9074e74..bafb473 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -507,11 +507,10 @@ static void bat_iv_ogm_forward(struct orig_node *orig_node,
 			       const struct ethhdr *ethhdr,
 			       struct batman_ogm_packet *batman_ogm_packet,
 			       bool is_single_hop_neigh,
+			       bool is_from_best_next_hop,
 			       struct hard_iface *if_incoming)
 {
 	struct bat_priv *bat_priv = netdev_priv(if_incoming->soft_iface);
-	struct neigh_node *router;
-	uint8_t in_tq, in_ttl, tq_avg = 0;
 	uint8_t tt_num_changes;
 
 	if (batman_ogm_packet->header.ttl <= 1) {
@@ -519,41 +518,30 @@ static void bat_iv_ogm_forward(struct orig_node *orig_node,
 		return;
 	}
 
-	router = orig_node_get_router(orig_node);
+	if (!is_from_best_next_hop) {
+		/* Mark the forwarded packet when it is not coming from our
+		 * best next hop. We still need to forward the packet for our
+		 * neighbor link quality detection to work in case the packet
+		 * originated from a single hop neighbor. Otherwise we can
+		 * simply drop the ogm.
+		 */
+		if (is_single_hop_neigh)
+			batman_ogm_packet->flags |= NOT_BEST_NEXT_HOP;
+		else
+			return;
+	}
 
-	in_tq = batman_ogm_packet->tq;
-	in_ttl = batman_ogm_packet->header.ttl;
 	tt_num_changes = batman_ogm_packet->tt_num_changes;
 
 	batman_ogm_packet->header.ttl--;
 	memcpy(batman_ogm_packet->prev_sender, ethhdr->h_source, ETH_ALEN);
 
-	/* rebroadcast tq of our best ranking neighbor to ensure the rebroadcast
-	 * of our best tq value */
-	if (router && router->tq_avg != 0) {
-
-		/* rebroadcast ogm of best ranking neighbor as is */
-		if (!compare_eth(router->addr, ethhdr->h_source)) {
-			batman_ogm_packet->tq = router->tq_avg;
-
-			if (router->last_ttl)
-				batman_ogm_packet->header.ttl =
-					router->last_ttl - 1;
-		}
-
-		tq_avg = router->tq_avg;
-	}
-
-	if (router)
-		neigh_node_free_ref(router);
-
 	/* apply hop penalty */
 	batman_ogm_packet->tq = hop_penalty(batman_ogm_packet->tq, bat_priv);
 
 	bat_dbg(DBG_BATMAN, bat_priv,
-		"Forwarding packet: tq_orig: %i, tq_avg: %i, tq_forw: %i, ttl_orig: %i, ttl_forw: %i\n",
-		in_tq, tq_avg, batman_ogm_packet->tq, in_ttl - 1,
-		batman_ogm_packet->header.ttl);
+		"Forwarding packet: tq: %i, ttl: %i\n",
+		batman_ogm_packet->tq, batman_ogm_packet->header.ttl);
 
 	batman_ogm_packet->seqno = htonl(batman_ogm_packet->seqno);
 	batman_ogm_packet->tt_crc = htons(batman_ogm_packet->tt_crc);
@@ -949,6 +937,7 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	int is_my_addr = 0, is_my_orig = 0, is_my_oldorig = 0;
 	int is_broadcast = 0, is_bidirectional;
 	bool is_single_hop_neigh = false;
+	bool is_from_best_next_hop = false;
 	int is_duplicate;
 	uint32_t if_incoming_seqno;
 
@@ -1070,6 +1059,13 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 		return;
 	}
 
+	if (batman_ogm_packet->flags & NOT_BEST_NEXT_HOP) {
+		bat_dbg(DBG_BATMAN, bat_priv,
+			"Drop packet: ignoring all packets not forwarded from "
+			"the best next hop (sender: %pM)\n", ethhdr->h_source);
+		return;
+	}
+
 	orig_node = get_orig_node(bat_priv, batman_ogm_packet->orig);
 	if (!orig_node)
 		return;
@@ -1094,6 +1090,10 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	if (router)
 		router_router = orig_node_get_router(router->orig_node);
 
+	if ((router && router->tq_avg != 0) &&
+	    (compare_eth(router->addr, ethhdr->h_source)))
+		is_from_best_next_hop = true;
+
 	/* avoid temporary routing loops */
 	if (router && router_router &&
 	    (compare_eth(router->addr, batman_ogm_packet->prev_sender)) &&
@@ -1144,7 +1144,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 
 		/* mark direct link on incoming interface */
 		bat_iv_ogm_forward(orig_node, ethhdr, batman_ogm_packet,
-				   is_single_hop_neigh, if_incoming);
+				   is_single_hop_neigh, is_from_best_next_hop,
+				   if_incoming);
 
 		bat_dbg(DBG_BATMAN, bat_priv,
 			"Forwarding packet: rebroadcast neighbor packet with direct link flag\n");
@@ -1167,7 +1168,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	bat_dbg(DBG_BATMAN, bat_priv,
 		"Forwarding packet: rebroadcast originator packet\n");
 	bat_iv_ogm_forward(orig_node, ethhdr, batman_ogm_packet,
-			   is_single_hop_neigh, if_incoming);
+			   is_single_hop_neigh, is_from_best_next_hop,
+			   if_incoming);
 
 out_neigh:
 	if ((orig_neigh_node) && (!is_single_hop_neigh))
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index f54969c..0ee1af7 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -39,6 +39,7 @@ enum bat_packettype {
 #define COMPAT_VERSION 14
 
 enum batman_iv_flags {
+	NOT_BEST_NEXT_HOP   = 1 << 3,
 	PRIMARIES_FIRST_HOP = 1 << 4,
 	VIS_SERVER	    = 1 << 5,
 	DIRECTLINK	    = 1 << 6
-- 
1.7.9.4

* [PATCH 13/15] batman-adv: fix checkpatch string complaint
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Regression introduced by: f76d019194e0a88c57371df169ecc979690a04c2

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index bafb473..abd10c4 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1061,8 +1061,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 
 	if (batman_ogm_packet->flags & NOT_BEST_NEXT_HOP) {
 		bat_dbg(DBG_BATMAN, bat_priv,
-			"Drop packet: ignoring all packets not forwarded from "
-			"the best next hop (sender: %pM)\n", ethhdr->h_source);
+			"Drop packet: ignoring all packets not forwarded from the best next hop (sender: %pM)\n",
+			ethhdr->h_source);
 		return;
 	}
 
-- 
1.7.9.4

* [PATCH 15/15] batman-adv: add contributor name
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

translation-table.{c,h} have been heavily modified by another contributor, and
for legal purposes it is better to include his name in the contributor list.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/translation-table.c |    2 +-
 net/batman-adv/translation-table.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index a38d315..2cb46f0 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1,7 +1,7 @@
 /*
  * Copyright (C) 2007-2012 B.A.T.M.A.N. contributors:
  *
- * Marek Lindner, Simon Wunderlich
+ * Marek Lindner, Simon Wunderlich, Antonio Quartulli
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h
index bfebe26..593d1b3 100644
--- a/net/batman-adv/translation-table.h
+++ b/net/batman-adv/translation-table.h
@@ -1,7 +1,7 @@
 /*
  * Copyright (C) 2007-2012 B.A.T.M.A.N. contributors:
  *
- * Marek Lindner, Simon Wunderlich
+ * Marek Lindner, Simon Wunderlich, Antonio Quartulli
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
-- 
1.7.9.4

* [PATCH 11/15] batman-adv: Adding hard_iface specific sysfs wrapper macros for UINT
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Linus Luessing, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Linus Luessing <linus.luessing@web.de>

This allows us to easily add a sysfs parameter for an unsigned int
later that belongs not to a batman mesh interface (e.g. bat0) but to a
common (hard) interface. It allows reading and writing an atomic_t in
hard_iface, whereas the mesh variant operates on bat_priv.
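
The macro pattern can be illustrated outside the kernel with a small
userspace sketch. Everything here is an assumption made for illustration:
struct hard_iface_sketch, its example_val field and the HIF_* macro names
are hypothetical, a plain unsigned int replaces the atomic_t, and the sysfs
plumbing (kobject lookup, reference counting, BAT_ATTR) is left out. It
only shows how one macro invocation generates a matching show/store pair
by token pasting.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct hard_iface; example_val is made up. */
struct hard_iface_sketch {
	unsigned int example_val;
};

/* Generates show_<name>(): prints the field into buff, returns length. */
#define HIF_SHOW_UINT(_name)						\
static int show_##_name(const struct hard_iface_sketch *hif,		\
			char *buff, size_t len)				\
{									\
	return snprintf(buff, len, "%u\n", hif->_name);			\
}

/* Generates store_<name>(): parses buff, range-checks, then stores. */
#define HIF_STORE_UINT(_name, _min, _max)				\
static int store_##_name(struct hard_iface_sketch *hif,			\
			 const char *buff)				\
{									\
	char *end;							\
	unsigned long val;						\
									\
	errno = 0;							\
	val = strtoul(buff, &end, 10);					\
	if (errno || end == buff || val < (_min) || val > (_max))	\
		return -1;						\
									\
	hif->_name = (unsigned int)val;					\
	return 0;							\
}

/* One invocation emits both accessors for the named field. */
#define HIF_UINT(_name, _min, _max)					\
	HIF_STORE_UINT(_name, _min, _max)				\
	HIF_SHOW_UINT(_name)

/* Expands to store_example_val() and show_example_val(). */
HIF_UINT(example_val, 1, 255)
```

In the real patch the generated functions additionally resolve the
hard_iface from the kobject and drop the reference again before returning.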

Developed by Linus during a six-month trainee study period at Ascom
(Switzerland) AG.

Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_sysfs.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/batman-adv/bat_sysfs.c b/net/batman-adv/bat_sysfs.c
index 913299d..5bc7b66 100644
--- a/net/batman-adv/bat_sysfs.c
+++ b/net/batman-adv/bat_sysfs.c
@@ -117,6 +117,49 @@ ssize_t show_##_name(struct kobject *kobj,				\
 	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
 
 
+#define BAT_ATTR_HIF_STORE_UINT(_name, _min, _max, _post_func)		\
+ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
+		      char *buff, size_t count)				\
+{									\
+	struct net_device *net_dev = kobj_to_netdev(kobj);		\
+	struct hard_iface *hard_iface = hardif_get_by_netdev(net_dev);	\
+	ssize_t length;							\
+									\
+	if (!hard_iface)						\
+		return 0;						\
+									\
+	length = __store_uint_attr(buff, count, _min, _max, _post_func,	\
+				   attr, &hard_iface->_name, net_dev);	\
+									\
+	hardif_free_ref(hard_iface);					\
+	return length;							\
+}
+
+#define BAT_ATTR_HIF_SHOW_UINT(_name)					\
+ssize_t show_##_name(struct kobject *kobj,				\
+		     struct attribute *attr, char *buff)		\
+{									\
+	struct net_device *net_dev = kobj_to_netdev(kobj);		\
+	struct hard_iface *hard_iface = hardif_get_by_netdev(net_dev);	\
+	ssize_t length;							\
+									\
+	if (!hard_iface)						\
+		return 0;						\
+									\
+	length = sprintf(buff, "%i\n", atomic_read(&hard_iface->_name));\
+									\
+	hardif_free_ref(hard_iface);					\
+	return length;							\
+}
+
+/* Use this, if you are going to set [name] in hard_iface to an
+ * unsigned integer value*/
+#define BAT_ATTR_HIF_UINT(_name, _mode, _min, _max, _post_func)		\
+	static BAT_ATTR_HIF_STORE_UINT(_name, _min, _max, _post_func)	\
+	static BAT_ATTR_HIF_SHOW_UINT(_name)				\
+	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
+
+
 static int store_bool_attr(char *buff, size_t count,
 			   struct net_device *net_dev,
 			   const char *attr_name, atomic_t *attr)
-- 
1.7.9.4


* [PATCH 10/15] batman-adv: rename sysfs macros to reflect the soft-interface dependency
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_sysfs.c |   57 ++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/net/batman-adv/bat_sysfs.c b/net/batman-adv/bat_sysfs.c
index 2c81688..913299d 100644
--- a/net/batman-adv/bat_sysfs.c
+++ b/net/batman-adv/bat_sysfs.c
@@ -63,7 +63,7 @@ struct bat_attribute bat_attr_##_name = {	\
 	.store  = _store,			\
 };
 
-#define BAT_ATTR_STORE_BOOL(_name, _post_func)				\
+#define BAT_ATTR_SIF_STORE_BOOL(_name, _post_func)			\
 ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
 		      char *buff, size_t count)				\
 {									\
@@ -73,9 +73,9 @@ ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
 				 &bat_priv->_name, net_dev);		\
 }
 
-#define BAT_ATTR_SHOW_BOOL(_name)					\
-ssize_t show_##_name(struct kobject *kobj, struct attribute *attr,	\
-			    char *buff)					\
+#define BAT_ATTR_SIF_SHOW_BOOL(_name)					\
+ssize_t show_##_name(struct kobject *kobj,				\
+		     struct attribute *attr, char *buff)		\
 {									\
 	struct bat_priv *bat_priv = kobj_to_batpriv(kobj);		\
 	return sprintf(buff, "%s\n",					\
@@ -83,16 +83,17 @@ ssize_t show_##_name(struct kobject *kobj, struct attribute *attr,	\
 		       "disabled" : "enabled");				\
 }									\
 
-/* Use this, if you are going to turn a [name] in bat_priv on or off */
-#define BAT_ATTR_BOOL(_name, _mode, _post_func)				\
-	static BAT_ATTR_STORE_BOOL(_name, _post_func)			\
-	static BAT_ATTR_SHOW_BOOL(_name)				\
+/* Use this, if you are going to turn a [name] in the soft-interface
+ * (bat_priv) on or off */
+#define BAT_ATTR_SIF_BOOL(_name, _mode, _post_func)			\
+	static BAT_ATTR_SIF_STORE_BOOL(_name, _post_func)		\
+	static BAT_ATTR_SIF_SHOW_BOOL(_name)				\
 	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
 
 
-#define BAT_ATTR_STORE_UINT(_name, _min, _max, _post_func)		\
+#define BAT_ATTR_SIF_STORE_UINT(_name, _min, _max, _post_func)		\
 ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
-			     char *buff, size_t count)			\
+		      char *buff, size_t count)				\
 {									\
 	struct net_device *net_dev = kobj_to_netdev(kobj);		\
 	struct bat_priv *bat_priv = netdev_priv(net_dev);		\
@@ -100,19 +101,19 @@ ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
 				 attr, &bat_priv->_name, net_dev);	\
 }
 
-#define BAT_ATTR_SHOW_UINT(_name)					\
-ssize_t show_##_name(struct kobject *kobj, struct attribute *attr,	\
-			    char *buff)					\
+#define BAT_ATTR_SIF_SHOW_UINT(_name)					\
+ssize_t show_##_name(struct kobject *kobj,				\
+		     struct attribute *attr, char *buff)		\
 {									\
 	struct bat_priv *bat_priv = kobj_to_batpriv(kobj);		\
 	return sprintf(buff, "%i\n", atomic_read(&bat_priv->_name));	\
 }									\
 
-/* Use this, if you are going to set [name] in bat_priv to unsigned integer
- * values only */
-#define BAT_ATTR_UINT(_name, _mode, _min, _max, _post_func)		\
-	static BAT_ATTR_STORE_UINT(_name, _min, _max, _post_func)	\
-	static BAT_ATTR_SHOW_UINT(_name)				\
+/* Use this, if you are going to set [name] in the soft-interface
+ * (bat_priv) to an unsigned integer value */
+#define BAT_ATTR_SIF_UINT(_name, _mode, _min, _max, _post_func)		\
+	static BAT_ATTR_SIF_STORE_UINT(_name, _min, _max, _post_func)	\
+	static BAT_ATTR_SIF_SHOW_UINT(_name)				\
 	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
 
 
@@ -384,24 +385,24 @@ static ssize_t store_gw_bwidth(struct kobject *kobj, struct attribute *attr,
 	return gw_bandwidth_set(net_dev, buff, count);
 }
 
-BAT_ATTR_BOOL(aggregated_ogms, S_IRUGO | S_IWUSR, NULL);
-BAT_ATTR_BOOL(bonding, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(aggregated_ogms, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(bonding, S_IRUGO | S_IWUSR, NULL);
 #ifdef CONFIG_BATMAN_ADV_BLA
-BAT_ATTR_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL);
 #endif
-BAT_ATTR_BOOL(fragmentation, S_IRUGO | S_IWUSR, update_min_mtu);
-BAT_ATTR_BOOL(ap_isolation, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(fragmentation, S_IRUGO | S_IWUSR, update_min_mtu);
+BAT_ATTR_SIF_BOOL(ap_isolation, S_IRUGO | S_IWUSR, NULL);
 static BAT_ATTR(vis_mode, S_IRUGO | S_IWUSR, show_vis_mode, store_vis_mode);
 static BAT_ATTR(routing_algo, S_IRUGO, show_bat_algo, NULL);
 static BAT_ATTR(gw_mode, S_IRUGO | S_IWUSR, show_gw_mode, store_gw_mode);
-BAT_ATTR_UINT(orig_interval, S_IRUGO | S_IWUSR, 2 * JITTER, INT_MAX, NULL);
-BAT_ATTR_UINT(hop_penalty, S_IRUGO | S_IWUSR, 0, TQ_MAX_VALUE, NULL);
-BAT_ATTR_UINT(gw_sel_class, S_IRUGO | S_IWUSR, 1, TQ_MAX_VALUE,
-	      post_gw_deselect);
+BAT_ATTR_SIF_UINT(orig_interval, S_IRUGO | S_IWUSR, 2 * JITTER, INT_MAX, NULL);
+BAT_ATTR_SIF_UINT(hop_penalty, S_IRUGO | S_IWUSR, 0, TQ_MAX_VALUE, NULL);
+BAT_ATTR_SIF_UINT(gw_sel_class, S_IRUGO | S_IWUSR, 1, TQ_MAX_VALUE,
+		  post_gw_deselect);
 static BAT_ATTR(gw_bandwidth, S_IRUGO | S_IWUSR, show_gw_bwidth,
 		store_gw_bwidth);
 #ifdef CONFIG_BATMAN_ADV_DEBUG
-BAT_ATTR_UINT(log_level, S_IRUGO | S_IWUSR, 0, 15, NULL);
+BAT_ATTR_SIF_UINT(log_level, S_IRUGO | S_IWUSR, 0, 15, NULL);
 #endif
 
 static struct bat_attribute *mesh_attrs[] = {
-- 
1.7.9.4


* Re: Information leakage from RDS protocol
From: Venkat Venkatsubra @ 2012-05-11 12:57 UTC (permalink / raw)
  To: Jay Fenlason
  Cc: Linus Torvalds, security, eugene, pmatouse, Netdev, David Miller,
	rds-devel
In-Reply-To: <4FABE0F8.6070806@oracle.com>

On 5/10/2012 10:38 AM, Venkat Venkatsubra wrote:
> On 5/9/2012 10:57 AM, Jay Fenlason wrote:
>> On Wed, May 09, 2012 at 10:17:57AM -0500, Venkat Venkatsubra wrote:
>>> On 5/8/2012 1:22 PM, Jay Fenlason wrote:
>>>>> On Tue, May 8, 2012 at 9:10 AM, Jay Fenlason <fenlason@redhat.com> wrote:
>>>>>> recvfrom() on an RDS socket can return the contents of random(?)
>>>>>> kernel memory to userspace if it was called with an address
>>>>>> length larger than sizeof(struct sockaddr_in).  rds_recvmsg() also
>>>>>> fails to set the addr_len parameter properly before returning, but
>>>>>> that's just a bug.
>>>>>>
>>>>>> There are also a number of cases where recvfrom() can return an
>>>>>> entirely bogus address.  Anything in rds_recvmsg() that returns a
>>>>>> non-negative value but does not go through the
>>>>>>   "sin = (struct sockaddr_in *)msg->msg_name;"
>>>>>> code path at the end of the while(1) loop will return up to 128
>>>>>> bytes of kernel memory to userspace.
>>>>>>
>>>>>> Also, on a receive race, the message that was copied to userspace
>>>>>> but received by someone else is not zeroed, meaning that if the next
>>>>>> message it receives is smaller, the tail of the raced message is
>>>>>> leaked.  I'm not sure how serious this is, but unexpectedly
>>>>>> scribbling on userspace memory (even if it is part of a buffer that
>>>>>> userspace asked us to write to) should be avoided.
>>>>>>
>>>> On Tue, May 08, 2012 at 11:04:01AM -0700, Linus Torvalds wrote:
>>>>> Please cc David Miller too on these things, and make sure he knows
>>>>> there's no embargo or anything (he won't touch it if there is). Maybe
>>>>> you don't want public mailing lists, but in general, the more open we
>>>>> can be, the better.
>>>> Added.  Nobody has said anything about any embargo to me, either
>>>> that they want one or that there shouldn't be one.  Personally, I
>>>> don't see any reason to embargo this, but I'm not on any
>>>> security-response teams.
>>>>
>>>>> This seems unfortunate, but at least the address thing is limited to
>>>>> sizeof(sockaddr_storage) and is kernel stack - which in turn means
>>>>> that while it potentially leaks kernel addresses (bad!), it almost
>>>>> certainly won't leak anything fundamentally interesting (ie you can't
>>>>> read arbitrary kernel memory and find plaintext passwords etc).
>>>>>
>>>>> I assume the fix is a trivial
>>>>>
>>>>>    msg->msg_namelen = sizeof(*sin);
>>>>>
>>>>> in rds_recvmsg() where it sets up the address?
>>>> That fixes the case where it actually sets up the address, but won't
>>>> fix the cases where it doesn't even do that.  I don't think anyone
>>>> ever thought about what the source address should be for a message
>>>> that was generated internally by the kernel.  I think the obvious
>>>> possibilities are msg_namelen = 0 (no address) and 127.0.0.1
>>>>
>>>>> I do wonder if maybe recvmsg() should initialize msg_namelen to 0
>>>>> instead of the size of the buffer before calling the low-level
>>>>> recvmsg function - so that protocols would have to explicitly set
>>>>> the size to the right value. But that would need much more validation.
>>>> That would require checking/fixing all of the low-level functions,
>>>> which will then have to know that the buffer pointed to by msg is at
>>>> most sizeof(struct sockaddr_storage) bytes.  I think it's better to
>>>> keep the size of the address buffer there, so the low-level functions
>>>> can confirm that the address data they're about to stuff in there
>>>> won't overflow the buffer.  (That way if we ever change the size of
>>>> the buffer, only one place has to change.)
>>>>
>>>> And the whole rds receive subsystem needs a bit of a rewrite to close
>>>> the information-leaking receive race.  Keeping the semantics correct
>>>> with regard to MSG_PEEK and multiple threads reading the socket at the
>>>> same time may be tricky.
>>>>
>>>>         -- JF
>>>>
>>> How about adding the suggested "msg->msg_namelen = sizeof(*sin);"
>>> line at the top of rds_recvmsg ?
>>> And "msg->msg_namelen = 0;" in the below "break;" cases ? I am
>>> assuming the apps wouldn't need to look at msg_name in these cases.
>>>                  if (!list_empty(&rs->rs_notify_queue)) {
>>>                          ret = rds_notify_queue_get(rs, msg);
>>>                          break;
>>>                  }
>>>
>>>                  if (rs->rs_cong_notify) {
>>>                          ret = rds_notify_cong(rs, msg);
>>>                          break;
>>>                  }
>> Wouldn't it be better to set msg->msg_namelen = 0 at the top of the
>> function, and only set it to sizeof(*sin) after msg->msg_name is
>> filled in?  That'll prevent accidental disclosure of kernel memory via
>> unanticipated code paths.
>>
>>> And, shouldn't an error be returned for the case below ? Currently
>>> zero is returned.
>>>
>>>          if (msg_flags & MSG_OOB)
>>>                  goto out;
>>> An error such as EOPNOTSUPP ?
>> I don't know.  I'm not a networking expert.  From what I've found
>> googling, EINVAL would be more correct than ENOTSUPP.
>>
>> This only leaves the datagram contents leak to userspace when multiple
>> threads race on receiving a datagram and the subsequent datagram is
>> smaller.  That one will be hard to fix, most notably because the
>> obvious fixes I've looked at involve losing a datagram if either of
>>    inc->i_conn->c_trans->inc_copy_to_user()
>> or
>>    rds_cmsg_recv()
>> fail.  I don't know how likely either of those are, but losing
>> datagrams seems like an inappropriate behavior for a reliable datagram
>> subsystem.
>>
> Moving the discussion to netdev.
>
> Venkat
Forgot to include rds-devel.

Venkat
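
The "default to no address" ordering suggested in the quoted thread can be
sketched in userspace as follows. This is only a model under assumptions:
demo_recvmsg() and its src parameter are hypothetical, it is not
rds_recvmsg(), and the caller is assumed to pass a msg_name buffer of at
least sizeof(struct sockaddr_in) bytes.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical, simplified receive handler. src is the peer address, or
 * NULL for a message with no peer (e.g. an internally generated
 * notification).
 */
static int demo_recvmsg(struct msghdr *msg, const struct sockaddr_in *src)
{
	assert(msg);

	/* Start from "no address" so every early exit reports length 0. */
	msg->msg_namelen = 0;

	if (!src)
		return 0;

	if (msg->msg_name) {
		struct sockaddr_in *sin = msg->msg_name;

		/* Zero the whole structure so padding carries no stale bytes. */
		memset(sin, 0, sizeof(*sin));
		sin->sin_family = AF_INET;
		sin->sin_port = src->sin_port;
		sin->sin_addr = src->sin_addr;

		/* Only now advertise an address, with its exact size. */
		msg->msg_namelen = sizeof(*sin);
	}
	return 0;
}
```

With this ordering, a code path that forgets to fill in the address reports
an empty address instead of whatever happened to be in the buffer.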


* Re: [PATCH v5] tilegx network driver: initial support
From: Ben Hutchings @ 2012-05-11 13:54 UTC (permalink / raw)
  To: Chris Metcalf; +Cc: David Miller, arnd, linux-kernel, netdev
In-Reply-To: <201205091452.q49EqcO7005975@farm-0027.internal.tilera.com>

Here's another very incomplete review for you.

On Wed, 2012-05-09 at 06:42 -0400, Chris Metcalf wrote:
> This change adds support for the tilegx network driver based on the
> GXIO IORPC support in the tilegx software stack, using the on-chip
> mPIPE packet processing engine.
[...]
> --- /dev/null
> +++ b/drivers/net/ethernet/tile/tilegx.c
[...]
> +/* Define to support GSO. */
> +#undef TILE_NET_GSO

GSO is always enabled by the networking core.

> +/* Define to support TSO. */
> +#define TILE_NET_TSO

No, put NETIF_F_TSO in hw_features so it can be switched at run-time.

(Currently that won't work if you don't set dev->ethtool_ops, but that's
a bug that can be fixed.)

> +/* Use 3000 to enable the Linux Traffic Control (QoS) layer, else 0. */
> +#define TILE_NET_TX_QUEUE_LEN 0

This can be changed through sysfs, so there is no need for a compile-
time option.

> +/* Define to dump packets (prints out the whole packet on tx and rx). */
> +#undef TILE_NET_DUMP_PACKETS

Should really be controlled through a 'debug' module parameter (see
netif_msg_init(), netif_msg_pktdata(), etc.)

[...]
> +/* Total header bytes per equeue slot.  Must be big enough for 2 bytes
> + * of NET_IP_ALIGN alignment, plus 14 bytes (?) of L2 header, plus up to
> + * 60 bytes of actual TCP header.  We round up to align to cache lines.
> + */
> +#define HEADER_BYTES 128
> +
> +/* Maximum completions per cpu per device (must be a power of two).
> + * ISSUE: What is the right number here?
> + */
> +#define TILE_NET_MAX_COMPS 64
> +
> +#define MAX_FRAGS (65536 / PAGE_SIZE + 2 + 1)

Should be MAX_SKB_FRAGS + 1.

[...]
> +/* Help the kernel transmit a packet. */
> +static int tile_net_tx(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct tile_net_priv *priv = netdev_priv(dev);
> +
> +	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
> +
> +	struct tile_net_egress *egress = &egress_for_echannel[priv->echannel];
> +	gxio_mpipe_equeue_t *equeue = egress->equeue;
> +
> +	struct tile_net_comps *comps =
> +		info->comps_for_echannel[priv->echannel];
> +
> +	struct skb_shared_info *sh = skb_shinfo(skb);
> +
> +	unsigned int len = skb->len;
> +	unsigned char *data = skb->data;
> +
> +	unsigned int num_frags;
> +	struct frag frags[MAX_FRAGS];
> +	gxio_mpipe_edesc_t edescs[MAX_FRAGS];
> +
> +	unsigned int i;
> +
> +	int cid;
> +
> +	s64 slot;
> +
> +	unsigned long irqflags;

Please, no blank lines between your declarations.

[...]
> +	/* Reserve slots, or return NETDEV_TX_BUSY if "full". */
> +	slot = gxio_mpipe_equeue_try_reserve(equeue, num_frags);
> +	if (slot < 0) {
> +		local_irq_restore(irqflags);
> +		/* ISSUE: "Virtual device xxx asks to queue packet". */
> +		return NETDEV_TX_BUSY;
> +	}

You're supposed to stop queues when they're full.  And since that state
appears to be per-CPU, I think this device needs to be multiqueue with
one TX queue per CPU and ndo_select_queue defined accordingly.
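
As an illustration of the stop-when-full convention, here is a small
userspace model. It is a sketch under assumptions, not driver code: the
demo_* names and the ring sizes are hypothetical, and the stopped flag
stands in for what netif_tx_stop_queue() and netif_tx_wake_queue() do in
the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_RING_SLOTS 8	/* hypothetical ring size */
#define DEMO_MAX_FRAGS  3	/* worst-case descriptors per packet in this model */

struct demo_txq {
	unsigned int used;	/* descriptors currently owned by the hardware */
	bool stopped;		/* models the stopped/woken state of the queue */
};

/* Transmit path: refuses work only while the queue is stopped. */
static bool demo_xmit(struct demo_txq *q, unsigned int frags)
{
	if (q->stopped)
		return false;

	assert(frags <= DEMO_MAX_FRAGS);
	q->used += frags;

	/* Stop as soon as a worst-case packet might no longer fit. */
	if (DEMO_RING_SLOTS - q->used < DEMO_MAX_FRAGS)
		q->stopped = true;
	return true;
}

/* Completion path: release descriptors and wake the queue once room is back. */
static void demo_tx_complete(struct demo_txq *q, unsigned int frags)
{
	assert(frags <= q->used);
	q->used -= frags;

	if (q->stopped && DEMO_RING_SLOTS - q->used >= DEMO_MAX_FRAGS)
		q->stopped = false;
}
```

Because the queue stops while there is still room for one worst-case
packet, the transmit path in this model never has to refuse a packet it
was handed while running.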

> +	for (i = 0; i < num_frags; i++)
> +		gxio_mpipe_equeue_put_at(equeue, edescs[i], slot + i);
> +
> +	/* Wait for a free completion entry.
> +	 * ISSUE: Is this the best logic?
> +	 * ISSUE: Can this cause undesirable "blocking"?
> +	 */
> +	while (comps->comp_next - comps->comp_last >= TILE_NET_MAX_COMPS - 1)
> +		tile_net_free_comps(equeue, comps, 32, false);

I'm not convinced you should be processing completions here at all.  But
certainly you should have stopped the queue earlier rather than having
to wait here.

> +	/* Update the completions array. */
> +	cid = comps->comp_next % TILE_NET_MAX_COMPS;
> +	comps->comp_queue[cid].when = slot + num_frags;
> +	comps->comp_queue[cid].skb = skb;
> +	comps->comp_next++;
> +
> +	/* HACK: Track "expanded" size for short packets (e.g. 42 < 60). */
> +	atomic_add(1, (atomic_t *)&priv->stats.tx_packets);
> +	atomic_add((len >= ETH_ZLEN) ? len : ETH_ZLEN,
> +		   (atomic_t *)&priv->stats.tx_bytes);

You mustn't cast random fields to atomic_t.  For one thing, atomic_t
contains an int while stats are unsigned long...

Also, you're adding cache contention between all your CPUs here.  You
should maintain these stats per-CPU and then sum them in
tile_net_get_stats().  Then you can just use ordinary additions.
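
A rough userspace model of the per-CPU approach, for illustration only:
the demo_* names and the fixed CPU count are hypothetical, the model is
single-threaded, and real driver code would use the kernel's per-CPU
variable support rather than a plain array.

```c
#include <assert.h>

#define DEMO_NR_CPUS  4		/* hypothetical CPU count for the model */
#define DEMO_ETH_ZLEN 60	/* minimum Ethernet frame length, without FCS */

struct demo_cpu_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

static struct demo_cpu_stats demo_stats[DEMO_NR_CPUS];

/* Hot path: each CPU touches only its own slot, so plain additions suffice. */
static void demo_count_tx(unsigned int cpu, unsigned int len)
{
	assert(cpu < DEMO_NR_CPUS);
	demo_stats[cpu].tx_packets++;
	/* Short frames are padded on the wire, so count the padded size. */
	demo_stats[cpu].tx_bytes += (len >= DEMO_ETH_ZLEN) ? len : DEMO_ETH_ZLEN;
}

/* Slow path: fold the per-CPU slots into one total when stats are read. */
static struct demo_cpu_stats demo_get_stats(void)
{
	struct demo_cpu_stats sum = { 0, 0 };
	unsigned int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		sum.tx_packets += demo_stats[cpu].tx_packets;
		sum.tx_bytes += demo_stats[cpu].tx_bytes;
	}
	return sum;
}
```

The cost of summing moves to the rarely used read side, while the transmit
path no longer shares a cache line between CPUs for its counters.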

[...]
> +/* Ioctl commands. */
> +static int tile_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
> +{
> +	return -EOPNOTSUPP;
> +}

So why define it at all?

[...]
> +static void tile_net_dev_init(const char *name, const uint8_t *mac)
> +{
[...]
> +	/* Register the network device. */
> +	ret = register_netdev(dev);
> +	if (ret) {
> +		netdev_err(dev, "register_netdev failed %d\n", ret);
> +		free_netdev(dev);
> +		return;
> +	}
> +
> +	/* Get the MAC address and set it in the device struct; this must
> +	 * be done before the device is opened.
[...]

So you had better do this before calling register_netdev(), as the
device can be opened immediately after that...

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH v3][resend] bonding: don't increase rx_dropped after processing LACPDUs
From: Jiri Bohac @ 2012-05-11 13:54 UTC (permalink / raw)
  To: David Miller; +Cc: jbohac, andy, netdev, fubar
In-Reply-To: <20120510.233134.1390996845692589691.davem@davemloft.net>

On Thu, May 10, 2012 at 11:31:34PM -0400, David Miller wrote:
> > Since commit 3aba891d, bonding processes LACP frames (802.3ad
> > mode) with bond_handle_frame(). Currently a copy of the skb is
> > made and the original is left to be processed by other
> > rx_handlers and the rest of the network stack by returning
> > RX_HANDLER_ANOTHER.  As there is no protocol handler for
> > PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
> > increased.
> > 
> > Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
> > if bonding has processed the LACP frame.
> > 
> > Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> > 
> 
> Don't send me garbage you didn't even check the compile of:
> 
> drivers/net/bonding/bond_main.c: In function ‘bond_handle_frame’:
> drivers/net/bonding/bond_main.c:1463:13: warning: assignment from incompatible pointer type [enabled by default]
> drivers/net/bonding/bond_main.c: In function ‘bond_open’:
> drivers/net/bonding/bond_main.c:3441:21: warning: assignment from incompatible pointer type [enabled by default]
> drivers/net/bonding/bond_main.c:3448:20: warning: assignment from incompatible pointer type [enabled by default]

Sorry, I overlooked these warnings. The patch actually broke the
rlb mode, which I did not test.

I'll send a fixed patch (v4).

Thanks!

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


* [PATCH v4] bonding: don't increase rx_dropped after processing LACPDUs
From: Jiri Bohac @ 2012-05-11 13:59 UTC (permalink / raw)
  To: David Miller; +Cc: andy, netdev, fubar

Since commit 3aba891d, bonding processes LACP frames (802.3ad
mode) with bond_handle_frame(). Currently a copy of the skb is
made and the original is left to be processed by other
rx_handlers and the rest of the network stack by returning
RX_HANDLER_ANOTHER.  As there is no protocol handler for
PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
increased.

Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
if bonding has processed the LACP frame.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
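
The accounting problem described above can be illustrated with a small
userspace model. It is a sketch under assumptions: the demo_* names are
hypothetical, the enum only mirrors two of the rx_handler_result_t values,
and the "stack" is reduced to a single function that knows one protocol.

```c
#include <assert.h>

/* Hypothetical mirror of two rx_handler_result_t values. */
enum demo_rx_result { DEMO_RX_CONSUMED, DEMO_RX_ANOTHER };

#define DEMO_PROTO_LACPDU 0x8809	/* Slow Protocols EtherType */
#define DEMO_PROTO_IP     0x0800

struct demo_dev {
	unsigned long rx_dropped;
	unsigned long delivered;
};

typedef enum demo_rx_result (*demo_handler_t)(unsigned int proto);

/* Old behaviour: the handler looks at the frame but always passes it on. */
static enum demo_rx_result demo_handler_old(unsigned int proto)
{
	(void)proto;
	return DEMO_RX_ANOTHER;
}

/* New behaviour: frames the handler has processed are reported as consumed. */
static enum demo_rx_result demo_handler_new(unsigned int proto)
{
	return proto == DEMO_PROTO_LACPDU ? DEMO_RX_CONSUMED : DEMO_RX_ANOTHER;
}

/* Stack-like receive: frames that nobody claims are counted as dropped. */
static void demo_receive(struct demo_dev *dev, demo_handler_t handler,
			 unsigned int proto)
{
	assert(dev && handler);

	if (handler(proto) == DEMO_RX_CONSUMED)
		return;			/* the handler owns the frame now */

	if (proto == DEMO_PROTO_IP)
		dev->delivered++;	/* a protocol handler exists */
	else
		dev->rx_dropped++;	/* no protocol handler: counted as a drop */
}
```

With demo_handler_old, every LACP frame bumps rx_dropped even though it was
processed; with demo_handler_new, the counter stays at zero and other
traffic is still delivered.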

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2173,9 +2173,10 @@ re_arm:
  * received frames (loopback). Since only the payload is given to this
  * function, it check for loopback.
  */
-static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u16 length)
+static int bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u16 length)
 {
 	struct port *port;
+	int ret = RX_HANDLER_ANOTHER;
 
 	if (length >= sizeof(struct lacpdu)) {
 
@@ -2184,11 +2185,12 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 		if (!port->slave) {
 			pr_warning("%s: Warning: port of slave %s is uninitialized\n",
 				   slave->dev->name, slave->dev->master->name);
-			return;
+			return ret;
 		}
 
 		switch (lacpdu->subtype) {
 		case AD_TYPE_LACPDU:
+			ret = RX_HANDLER_CONSUMED;
 			pr_debug("Received LACPDU on port %d\n",
 				 port->actor_port_number);
 			/* Protect against concurrent state machines */
@@ -2198,6 +2200,7 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 			break;
 
 		case AD_TYPE_MARKER:
+			ret = RX_HANDLER_CONSUMED;
 			// No need to convert fields to Little Endian since we don't use the marker's fields.
 
 			switch (((struct bond_marker *)lacpdu)->tlv_type) {
@@ -2219,6 +2222,7 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 			}
 		}
 	}
+	return ret;
 }
 
 /**
@@ -2456,18 +2460,20 @@ out:
 	return NETDEV_TX_OK;
 }
 
-void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave)
 {
+	int ret = RX_HANDLER_ANOTHER;
 	if (skb->protocol != PKT_TYPE_LACPDU)
-		return;
+		return ret;
 
 	if (!pskb_may_pull(skb, sizeof(struct lacpdu)))
-		return;
+		return ret;
 
 	read_lock(&bond->lock);
-	bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
+	ret = bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
 	read_unlock(&bond->lock);
+	return ret;
 }
 
 /*
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 235b2cc..5ee7e3c 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -274,7 +274,7 @@ void bond_3ad_adapter_duplex_changed(struct slave *slave);
 void bond_3ad_handle_link_change(struct slave *slave, char link);
 int  bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info);
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
-void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave);
 int bond_3ad_set_carrier(struct bonding *bond);
 void bond_3ad_update_lacp_rate(struct bonding *bond);
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 9abfde4..2e1f806 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -342,26 +342,26 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)
 	_unlock_rx_hashtbl_bh(bond);
 }
 
-static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
+static int rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
 			 struct slave *slave)
 {
 	struct arp_pkt *arp;
 
 	if (skb->protocol != cpu_to_be16(ETH_P_ARP))
-		return;
+		goto out;
 
 	arp = (struct arp_pkt *) skb->data;
 	if (!arp) {
 		pr_debug("Packet has no ARP data\n");
-		return;
+		goto out;
 	}
 
 	if (!pskb_may_pull(skb, arp_hdr_len(bond->dev)))
-		return;
+		goto out;
 
 	if (skb->len < sizeof(struct arp_pkt)) {
 		pr_debug("Packet is too small to be an ARP\n");
-		return;
+		goto out;
 	}
 
 	if (arp->op_code == htons(ARPOP_REPLY)) {
@@ -369,6 +369,8 @@ static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
 		rlb_update_entry_from_arp(bond, arp);
 		pr_debug("Server received an ARP Reply from client\n");
 	}
+out:
+	return RX_HANDLER_ANOTHER;
 }
 
 /* Caller must hold bond lock for read */
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 62d2409..bc13b3d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1444,8 +1444,9 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	struct sk_buff *skb = *pskb;
 	struct slave *slave;
 	struct bonding *bond;
-	void (*recv_probe)(struct sk_buff *, struct bonding *,
+	int (*recv_probe)(struct sk_buff *, struct bonding *,
 				struct slave *);
+	int ret = RX_HANDLER_ANOTHER;
 
 	skb = skb_share_check(skb, GFP_ATOMIC);
 	if (unlikely(!skb))
@@ -1464,8 +1465,12 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
 		if (likely(nskb)) {
-			recv_probe(nskb, bond, slave);
+			ret = recv_probe(nskb, bond, slave);
 			dev_kfree_skb(nskb);
+			if (ret == RX_HANDLER_CONSUMED) {
+				consume_skb(skb);
+				return ret;
+			}
 		}
 	}
 
@@ -1487,7 +1492,7 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		memcpy(eth_hdr(skb)->h_dest, bond->dev->dev_addr, ETH_ALEN);
 	}
 
-	return RX_HANDLER_ANOTHER;
+	return ret;
 }
 
 /* enslave device <slave> to bond device <master> */
@@ -2723,7 +2728,7 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	}
 }
 
-static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
+static int bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 			 struct slave *slave)
 {
 	struct arphdr *arp;
@@ -2731,7 +2736,7 @@ static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 	__be32 sip, tip;
 
 	if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
-		return;
+		return RX_HANDLER_ANOTHER;
 
 	read_lock(&bond->lock);
 
@@ -2776,6 +2781,7 @@ static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 
 out_unlock:
 	read_unlock(&bond->lock);
+	return RX_HANDLER_ANOTHER;
 }
 
 /*
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 9f2bae66..4581aa5 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -218,7 +218,7 @@ struct bonding {
 	struct   slave *primary_slave;
 	bool     force_primary;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
-	void     (*recv_probe)(struct sk_buff *, struct bonding *,
+	int     (*recv_probe)(struct sk_buff *, struct bonding *,
 			       struct slave *);
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


* [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Eric Dumazet @ 2012-05-11 13:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Dave Taht, Kathleen Nichols, Van Jacobson, Tom Herbert,
	Matt Mathis, Yuchung Cheng, Stephen Hemminger

From: Eric Dumazet <edumazet@google.com>

Fair Queue Codel implementation.

Principles :

- Packets are classified (by the internal classifier or an external one)
  into flows.
- This is a stochastic model (as we use a hash, several flows might
  be hashed to the same slot).
- Each flow has a CoDel-managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority over old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO).
- Head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)
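
To make the "stochastic" point concrete, here is a minimal userspace sketch
of hash-based classification into a fixed number of slots. It is an
illustration under assumptions, not the code from this patch: the demo_*
names are hypothetical and FNV-1a is used as a stand-in hash, not the
kernel's flow hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FLOWS 1024		/* default number of flows quoted in this mail */

/* Hypothetical flow key: the usual addresses, ports and protocol. */
struct demo_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* FNV-1a over the key fields, byte by byte. Stand-in hash only. */
static uint32_t demo_hash(const struct demo_tuple *t)
{
	uint32_t words[4];
	uint32_t h = 2166136261u;
	size_t i, b;

	assert(t);
	words[0] = t->saddr;
	words[1] = t->daddr;
	words[2] = ((uint32_t)t->sport << 16) | t->dport;
	words[3] = t->proto;

	for (i = 0; i < 4; i++) {
		for (b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xff;
			h *= 16777619u;
		}
	}
	return h;
}

/* Map a packet's key to one of DEMO_FLOWS per-flow queues. */
static unsigned int demo_classify(const struct demo_tuple *t)
{
	return demo_hash(t) % DEMO_FLOWS;
}
```

Packets with the same key always land in the same slot, so per-flow order
is preserved. Distinct keys may collide, and must do so once more than
DEMO_FLOWS flows are active, which is why the model is only stochastically
fair.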

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]

defaults : 1024 flows, 10240 packets limit

Impressive results on load :

# tc -s -d class show dev eth9

class htb 1:1 root leaf 10: prio 0 quantum 1514 rate 200000Kbit ceil 200000Kbit burst 1475b/8 mpu 0b overhead 0b cburst 1475b/8 mpu 0b overhead 0b level 0 
 Sent 1267974946 bytes 837585 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 202298Kbit 16702pps backlog 0b 103p requeues 0 
 lended: 837482 borrowed: 0 giants: 0
 tokens: -912 ctokens: -912

class fq_codel 10:a7 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 18168b 12p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 7.0ms
class fq_codel 10:10b parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:146 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 13626b 9p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 5.2ms
class fq_codel 10:1c0 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 12112b 8p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 2.8ms
class fq_codel 10:2ba parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 13626b 9p requeues 0 
  deficit 926 count 1 lastcount 1 ldelay 5.2ms
class fq_codel 10:31d parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 0 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:32c parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 15140b 10p requeues 0 
  deficit 80 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:342 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:3ab parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 18168b 12p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 7.0ms
class fq_codel 10:3c2 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 15140b 10p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms

# tc -s -d qdisc show dev eth9

qdisc htb 1: root refcnt 6 r2q 10 default 1 direct_packets_stat 0 ver 3.17
 Sent 1267878050 bytes 837521 pkt (dropped 0, overlimits 1666567 requeues 1) 
 rate 202305Kbit 16703pps backlog 0b 104p requeues 1 
qdisc fq_codel 10: parent 1:1 limit 10240p target 5.0ms interval 100.0ms ecn 
 Sent 1267878050 bytes 837521 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 202305Kbit 16703pps backlog 157456b 104p requeues 0 
  maxpacket 1514 drop_overlimit 0 new_flow_count 87 ecn_mark 4071
  new_flows_len 0 old_flows_len 10

# ping -c 10 172.30.42.18
PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data.
64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=0.227 ms
64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=0.165 ms
64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=0.166 ms
64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=0.151 ms
64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=0.164 ms
64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=0.172 ms
64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=0.175 ms
64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=0.183 ms
64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=0.158 ms
64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=0.200 ms

--- 172.30.42.18 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.151/0.176/0.227/0.022 ms

Much better than SFQ because of the priority given to new flows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
---
 include/linux/pkt_sched.h |   45 ++
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_fq_codel.c  |  595 ++++++++++++++++++++++++++++++++++++
 4 files changed, 652 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index cde56c2..3ffdaec 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -681,4 +681,49 @@ struct tc_codel_xstats {
 	__u32	dropping;  /* are we in dropping state ? */
 };
 
+/* FQ_CODEL */
+
+enum {
+	TCA_FQ_CODEL_UNSPEC,
+	TCA_FQ_CODEL_TARGET,
+	TCA_FQ_CODEL_LIMIT,
+	TCA_FQ_CODEL_INTERVAL,
+	TCA_FQ_CODEL_ECN,
+	TCA_FQ_CODEL_FLOWS,
+	__TCA_FQ_CODEL_MAX
+};
+
+#define TCA_FQ_CODEL_MAX	(__TCA_FQ_CODEL_MAX - 1)
+
+enum {
+	TCA_FQ_CODEL_XSTATS_QDISC,
+	TCA_FQ_CODEL_XSTATS_CLASS,
+};
+
+struct tc_fq_codel_xstats {
+	__u32	type;
+	union {
+		struct {
+			__u32	maxpacket; /* largest packet we've seen so far */
+			__u32	drop_overlimit; /* number of times max qdisc packet limit was hit */
+			__u32	ecn_mark;  /* number of packets we ECN marked
+					    * instead of dropped
+					    */
+			__u32	new_flow_count; /* number of times packets created a 'new flow' */
+			__u32	new_flows_len;	/* count of flows in new list */
+			__u32	old_flows_len;	/* count of flows in old list */
+		} qdisc_stats;
+		struct {
+			__s32	deficit;
+			__u32	ldelay; /* in-queue delay seen by most recently
+					 * dequeued packet
+					 */
+			__u32	count;
+			__u32	lastcount;
+			__u32	dropping;
+			__s32	drop_next;
+		} class_stats;
+	};
+};
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index fadd252..e7a8976 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -261,6 +261,17 @@ config NET_SCH_CODEL
 
 	  If unsure, say N.
 
+config NET_SCH_FQ_CODEL
+	tristate "Fair Queue Controlled Delay AQM (FQ_CODEL)"
+	help
+	  Say Y here if you want to use the FQ Controlled Delay (FQ_CODEL)
+	  packet scheduling algorithm.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called sch_fq_codel.
+
+	  If unsure, say N.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 30fab03..5940a19 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_NET_SCH_MQPRIO)	+= sch_mqprio.o
 obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_SCH_QFQ)	+= sch_qfq.o
 obj-$(CONFIG_NET_SCH_CODEL)	+= sch_codel.o
+obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
new file mode 100644
index 0000000..8675ff8
--- /dev/null
+++ b/net/sched/sch_fq_codel.c
@@ -0,0 +1,595 @@
+/*
+ * Fair Queue CoDel discipline
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *  Copyright (C) 2012 Eric Dumazet <edumazet@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/jiffies.h>
+#include <linux/string.h>
+#include <linux/in.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/flow_keys.h>
+#include <net/codel.h>
+
+/*	Fair Queue CoDel.
+ *
+ * Principles :
+ * Packets are classified (internal classifier or external) on flows.
+ * This is a Stochastic model (as we use a hash, several flows
+ *			       might be hashed on same slot)
+ * Each flow has a CoDel managed queue.
+ * Flows are linked onto two (Round Robin) lists,
+ * so that new flows have priority on old ones.
+ *
+ * For a given flow, packets are not reordered (CoDel uses a FIFO)
+ * head drops only.
+ * ECN capability is on by default.
+ * Low memory footprint (64 bytes per flow)
+ */
+
+struct fq_codel_flow {
+	struct sk_buff	  *head;
+	struct sk_buff	  *tail;
+	struct list_head  flowchain;
+	int		  deficit;
+	struct codel_vars cvars;
+};
+
+struct fq_codel_sched_data {
+	struct tcf_proto *filter_list;	/* external classifier */
+	struct fq_codel_flow *flows;	/* Flows table [flows_cnt] */
+	u32		*backlogs;	/* backlog table [flows_cnt] */
+	u32		flows_cnt;	/* number of flows */
+	u32		perturbation;	/* hash perturbation */
+	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
+	struct codel_params cparams;
+	struct codel_stats cstats;
+	u32		drop_overlimit;
+	u32		new_flow_count;
+
+	struct list_head new_flows;	/* list of new flows */
+	struct list_head old_flows;	/* list of old flows */
+};
+
+static unsigned int fq_codel_hash(const struct fq_codel_sched_data *q,
+				  const struct sk_buff *skb)
+{
+	struct flow_keys keys;
+	unsigned int hash;
+
+	skb_flow_dissect(skb, &keys);
+	hash = jhash_3words((__force u32)keys.dst,
+			    (__force u32)keys.src ^ keys.ip_proto,
+			    (__force u32)keys.ports, q->perturbation);
+	return ((u64)hash * q->flows_cnt) >> 32;
+}
+
+static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
+				      int *qerr)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0 &&
+	    TC_H_MIN(skb->priority) <= q->flows_cnt)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return fq_codel_hash(q, skb) + 1;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		if (TC_H_MIN(res.classid) <= q->flows_cnt)
+			return TC_H_MIN(res.classid);
+	}
+	return 0;
+}
+
+/* helper functions : might be changed when/if skb use a standard list_head */
+
+/* remove one skb from head of slot queue */
+static inline struct sk_buff *dequeue_head(struct fq_codel_flow *flow)
+{
+	struct sk_buff *skb = flow->head;
+
+	flow->head = skb->next;
+	skb->next = NULL;
+	return skb;
+}
+
+/* add skb to flow queue (tail add) */
+static inline void flow_queue_add(struct fq_codel_flow *flow,
+				  struct sk_buff *skb)
+{
+	if (flow->head == NULL)
+		flow->head = skb;
+	else
+		flow->tail->next = skb;
+	flow->tail = skb;
+	skb->next = NULL;
+}
+
+static unsigned int fq_codel_drop(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	unsigned int maxbacklog = 0, idx = 0, i, len;
+	struct fq_codel_flow *flow;
+
+	/* Queue is full! Find the fat flow and drop a packet from it.
+	 * This might sound expensive, but with 1024 flows, we scan
+	 * 4KB of memory, and we don't need to handle a complex tree
+	 * in fast path (packet enqueue/dequeue) with many cache misses.
+	 */
+	for (i = 0; i < q->flows_cnt; i++) {
+		if (q->backlogs[i] > maxbacklog) {
+			maxbacklog = q->backlogs[i];
+			idx = i;
+		}
+	}
+	flow = &q->flows[idx];
+	skb = dequeue_head(flow);
+	len = qdisc_pkt_len(skb);
+	q->backlogs[idx] -= len;
+	kfree_skb(skb);
+	sch->q.qlen--;
+	sch->qstats.drops++;
+	sch->qstats.backlog -= len;
+	return idx;
+}
+
+static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	unsigned int idx;
+	struct fq_codel_flow *flow;
+	int uninitialized_var(ret);
+
+	idx = fq_codel_classify(skb, sch, &ret);
+	if (idx == 0) {
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+	idx--;
+
+	codel_set_enqueue_time(skb);
+	flow = &q->flows[idx];
+	flow_queue_add(flow, skb);
+	q->backlogs[idx] += qdisc_pkt_len(skb);
+	sch->qstats.backlog += qdisc_pkt_len(skb);
+
+	if (list_empty(&flow->flowchain)) {
+		list_add_tail(&flow->flowchain, &q->new_flows);
+		codel_vars_init(&flow->cvars);
+		q->new_flow_count++;
+		flow->deficit = q->quantum;
+	}
+	if (++sch->q.qlen < sch->limit)
+		return NET_XMIT_SUCCESS;
+
+	q->drop_overlimit++;
+	/* Return Congestion Notification only if we dropped a packet
+	 * from this flow.
+	 */
+	if (fq_codel_drop(sch) == idx)
+		return NET_XMIT_CN;
+
+	/* As we dropped a packet, better let upper stack know this */
+	qdisc_tree_decrease_qlen(sch, 1);
+	return NET_XMIT_SUCCESS;
+}
+
+/* This is the specific function called from codel_dequeue()
+ * to dequeue a packet from queue. Note: backlog is handled in
+ * codel, we don't need to reduce it here.
+ */
+static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
+{
+	struct fq_codel_flow *flow;
+	struct sk_buff *skb = NULL;
+
+	flow = container_of(vars, struct fq_codel_flow, cvars);
+	if (flow->head) {
+		skb = dequeue_head(flow);
+		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->q.qlen--;
+	}
+	return skb;
+}
+
+static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	struct fq_codel_flow *flow;
+
+begin:
+	if (!list_empty(&q->new_flows))
+		flow = list_first_entry(&q->new_flows,
+					struct fq_codel_flow,
+					flowchain);
+	else if (!list_empty(&q->old_flows))
+		flow = list_first_entry(&q->old_flows,
+					struct fq_codel_flow,
+					flowchain);
+	else
+		return NULL;
+
+	if (flow->deficit <= 0) {
+		flow->deficit += q->quantum;
+		list_move_tail(&flow->flowchain, &q->old_flows);
+		goto begin;
+	}
+	skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
+			    dequeue, &q->backlogs[flow - q->flows]);
+	if (!skb) {
+		list_del_init(&flow->flowchain);
+		goto begin;
+	}
+	qdisc_bstats_update(sch, skb);
+	flow->deficit -= qdisc_pkt_len(skb);
+	return skb;
+}
+
+static void fq_codel_reset(struct Qdisc *sch)
+{
+	struct sk_buff *skb;
+
+	while ((skb = fq_codel_dequeue(sch)) != NULL)
+		kfree_skb(skb);
+}
+
+static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
+	[TCA_FQ_CODEL_TARGET]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_LIMIT]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_INTERVAL]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_ECN]	= { .type = NLA_U32 },
+};
+
+static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_FQ_CODEL_MAX + 1];
+	int err;
+
+	if (!opt)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_FQ_CODEL_MAX, opt, fq_codel_policy);
+	if (err < 0)
+		return err;
+	if (tb[TCA_FQ_CODEL_FLOWS]) {
+		if (q->flows)
+			return -EINVAL;
+		q->flows_cnt = nla_get_u32(tb[TCA_FQ_CODEL_FLOWS]);
+		if (!q->flows_cnt ||
+		    q->flows_cnt > 65536)
+			return -EINVAL;
+	}
+	sch_tree_lock(sch);
+
+	if (tb[TCA_FQ_CODEL_TARGET]) {
+		u64 target = nla_get_u32(tb[TCA_FQ_CODEL_TARGET]);
+
+		q->cparams.target = (target * NSEC_PER_USEC) >> CODEL_SHIFT;
+	}
+
+	if (tb[TCA_FQ_CODEL_INTERVAL]) {
+		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
+
+		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+	}
+
+	if (tb[TCA_FQ_CODEL_LIMIT])
+		sch->limit = nla_get_u32(tb[TCA_FQ_CODEL_LIMIT]);
+
+	if (tb[TCA_FQ_CODEL_ECN])
+		q->cparams.ecn = !!nla_get_u32(tb[TCA_FQ_CODEL_ECN]);
+
+	while (sch->q.qlen > sch->limit) {
+		struct sk_buff *skb = fq_codel_dequeue(sch);
+
+		kfree_skb(skb);
+		q->cstats.drop_count++;
+	}
+	qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+	q->cstats.drop_count = 0;
+
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static void *fq_codel_zalloc(size_t sz)
+{
+	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
+
+	if (!ptr)
+		ptr = vzalloc(sz);
+	return ptr;
+}
+
+static void fq_codel_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static void fq_codel_destroy(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	fq_codel_free(q->backlogs);
+	fq_codel_free(q->flows);
+}
+
+static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	int i;
+
+	sch->limit = 10*1024;
+	q->flows_cnt = 1024;
+	q->quantum = psched_mtu(qdisc_dev(sch));
+	q->perturbation = net_random();
+	INIT_LIST_HEAD(&q->new_flows);
+	INIT_LIST_HEAD(&q->old_flows);
+	codel_params_init(&q->cparams);
+	codel_stats_init(&q->cstats);
+	q->cparams.ecn = true;
+
+	if (opt) {
+		int err = fq_codel_change(sch, opt);
+		if (err)
+			return err;
+	}
+
+	if (!q->flows) {
+		q->flows = fq_codel_zalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow));
+		if (!q->flows)
+			return -ENOMEM;
+		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		if (!q->backlogs) {
+			fq_codel_free(q->flows);
+			return -ENOMEM;
+		}
+		for (i = 0; i < q->flows_cnt; i++) {
+			struct fq_codel_flow *flow = q->flows + i;
+
+			INIT_LIST_HEAD(&flow->flowchain);
+		}
+	}
+	if (sch->limit >= 1)
+		sch->flags |= TCQ_F_CAN_BYPASS;
+	else
+		sch->flags &= ~TCQ_F_CAN_BYPASS;
+	return 0;
+}
+
+static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_FQ_CODEL_TARGET,
+			codel_time_to_us(q->cparams.target)) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
+			sch->limit) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
+			codel_time_to_us(q->cparams.interval)) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
+			q->cparams.ecn) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_FLOWS,
+			q->flows_cnt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -1;
+}
+
+static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tc_fq_codel_xstats st = {
+		.type				= TCA_FQ_CODEL_XSTATS_QDISC,
+		.qdisc_stats.maxpacket		= q->cstats.maxpacket,
+		.qdisc_stats.drop_overlimit	= q->drop_overlimit,
+		.qdisc_stats.ecn_mark		= q->cstats.ecn_mark,
+		.qdisc_stats.new_flow_count	= q->new_flow_count,
+	};
+	struct list_head *pos;
+
+	list_for_each(pos, &q->new_flows)
+		st.qdisc_stats.new_flows_len++;
+
+	list_for_each(pos, &q->old_flows)
+		st.qdisc_stats.old_flows_len++;
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static struct Qdisc *fq_codel_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long fq_codel_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static unsigned long fq_codel_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	/* we cannot bypass queue discipline anymore */
+	sch->flags &= ~TCQ_F_CAN_BYPASS;
+	return 0;
+}
+
+static void fq_codel_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static struct tcf_proto **fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int fq_codel_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static int fq_codel_dump_class_stats(struct Qdisc *sch, unsigned long cl,
+				     struct gnet_dump *d)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	u32 idx = cl - 1;
+	struct gnet_stats_queue qs = { 0 };
+	struct tc_fq_codel_xstats xstats;
+
+	if (idx < q->flows_cnt) {
+		const struct fq_codel_flow *flow = &q->flows[idx];
+		const struct sk_buff *skb = flow->head;
+
+		memset(&xstats, 0, sizeof(xstats));
+		xstats.type = TCA_FQ_CODEL_XSTATS_CLASS;
+		xstats.class_stats.deficit = flow->deficit;
+		xstats.class_stats.ldelay =
+			codel_time_to_us(flow->cvars.ldelay);
+		xstats.class_stats.count = flow->cvars.count;
+		xstats.class_stats.lastcount = flow->cvars.lastcount;
+		xstats.class_stats.dropping = flow->cvars.dropping;
+		if (flow->cvars.dropping) {
+			codel_tdiff_t delta = flow->cvars.drop_next -
+					      codel_get_time();
+
+			xstats.class_stats.drop_next = (delta >= 0) ?
+				codel_time_to_us(delta) :
+				-codel_time_to_us(-delta);
+		}
+		while (skb) {
+			qs.qlen++;
+			skb = skb->next;
+		}
+		qs.backlog = q->backlogs[idx];
+	}
+	if (gnet_stats_copy_queue(d, &qs) < 0)
+		return -1;
+	if (idx < q->flows_cnt)
+		return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
+	return 0;
+}
+
+static void fq_codel_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	unsigned int i;
+
+	if (arg->stop)
+		return;
+
+	for (i = 0; i < q->flows_cnt; i++) {
+		if (list_empty(&q->flows[i].flowchain) ||
+		    arg->count < arg->skip) {
+			arg->count++;
+			continue;
+		}
+		if (arg->fn(sch, i + 1, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops fq_codel_class_ops = {
+	.leaf		=	fq_codel_leaf,
+	.get		=	fq_codel_get,
+	.put		=	fq_codel_put,
+	.tcf_chain	=	fq_codel_find_tcf,
+	.bind_tcf	=	fq_codel_bind,
+	.unbind_tcf	=	fq_codel_put,
+	.dump		=	fq_codel_dump_class,
+	.dump_stats	=	fq_codel_dump_class_stats,
+	.walk		=	fq_codel_walk,
+};
+
+static struct Qdisc_ops fq_codel_qdisc_ops __read_mostly = {
+	.cl_ops		=	&fq_codel_class_ops,
+	.id		=	"fq_codel",
+	.priv_size	=	sizeof(struct fq_codel_sched_data),
+	.enqueue	=	fq_codel_enqueue,
+	.dequeue	=	fq_codel_dequeue,
+	.peek		=	qdisc_peek_dequeued,
+	.drop		=	fq_codel_drop,
+	.init		=	fq_codel_init,
+	.reset		=	fq_codel_reset,
+	.destroy	=	fq_codel_destroy,
+	.change		=	NULL,
+	.dump		=	fq_codel_dump,
+	.dump_stats	=	fq_codel_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init fq_codel_module_init(void)
+{
+	return register_qdisc(&fq_codel_qdisc_ops);
+}
+
+static void __exit fq_codel_module_exit(void)
+{
+	unregister_qdisc(&fq_codel_qdisc_ops);
+}
+
+module_init(fq_codel_module_init)
+module_exit(fq_codel_module_exit)
+MODULE_AUTHOR("Eric Dumazet");
+MODULE_LICENSE("GPL");
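
Note on the bucket selection at the end of fq_codel_hash(): the
multiply-and-shift step scales a 32-bit hash into [0, flows_cnt) without
a division or modulo. A minimal user-space sketch of just that step,
assuming nothing beyond <stdint.h>; bucket_of() is a name made up for
this note and is not part of the patch:

```c
#include <stdint.h>

/* Map a 32-bit hash onto [0, nbuckets) without a division, using the
 * same multiply-and-shift step as fq_codel_hash() in the patch.
 * bucket_of() is a hypothetical name used only in this sketch.
 */
static inline uint32_t bucket_of(uint32_t hash, uint32_t nbuckets)
{
	/* Widen to 64 bits so the product cannot overflow, then keep
	 * the upper 32 bits: floor(hash * nbuckets / 2^32).
	 */
	return (uint32_t)(((uint64_t)hash * nbuckets) >> 32);
}
```

With the default of 1024 flows, a hash of 0 lands in bucket 0 and a hash
of 0xffffffff lands in bucket 1023, so the result is always a valid
index into the flows table.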

* Re: [PATCH 08/17] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket
From: Mel Gorman @ 2012-05-11 14:12 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.004949.655300373402132371.davem@davemloft.net>

On Fri, May 11, 2012 at 12:49:49AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:01 +0100
> 
> > Introduce sk_allocation(). This function allows injecting sock-specific
> > flags into each sock-related allocation. It is only used on allocation
> > paths that may be required for writing pages back to network storage.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> 
> This is still a little bit more than it needs to be.
> 
> You are trying to propagate a single bit from sk->sk_allocation into
> all of the annotated socket memory allocation sites.
> 
> But many of them use sk->sk_allocation already.  In fact all of them
> that use a variable rather than a constant GFP_* satisfy this
> invariant.
> 
> All of those annotations are therefore spurious, and probably end up
> generating unnecessary |'s of that special bit in at least some
> cases.
> 

Yes, you're completely correct here.

> What you really, therefore, care about are the GFP_FOO cases.  And in
> fact those are all GFP_ATOMIC.  So make something that says what it
> is that you want, a GFP_ATOMIC with some socket specified bits |'d
> in.
> 
> Something like this:
> 
> static inline gfp_t sk_gfp_atomic(struct sock *sk)
> {
> 	return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
> }
> 

I went with this.
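
To illustrate the effect, here is a rough user-space sketch. The flag
values and the cut-down struct sock below are stand-ins invented for
this example (the real definitions live in the kernel headers and
differ); only the body of sk_gfp_atomic() is taken from the suggestion
above:

```c
#include <stdint.h>

typedef uint32_t gfp_t;

/* Stand-in values for this sketch only; the real flags are defined
 * in the kernel headers and have different values.
 */
#define GFP_ATOMIC	0x20u
#define GFP_KERNEL	0xd0u
#define __GFP_MEMALLOC	0x2000u

/* Cut-down stand-in for struct sock: only the field used here. */
struct sock {
	gfp_t sk_allocation;
};

/* GFP_ATOMIC plus the socket's __GFP_MEMALLOC bit, if it is set. */
static inline gfp_t sk_gfp_atomic(struct sock *sk)
{
	return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
}
```

A socket whose sk_allocation lacks __GFP_MEMALLOC gets plain GFP_ATOMIC
back, so only the flagged sockets see any change at the constant-flag
call sites.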

> You'll also have to make your networking patches conform to the
> networking subsystem coding style.
> 
> For example:
> 
> > -	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1, GFP_ATOMIC);
> > +	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1,
> > +					sk_allocation(sk, GFP_ATOMIC));
> 
> The sk_allocation() argument has to line up with the first column
> after the opening parenthesis of the function call.  You can't just
> use all TAB characters.  And this all-TABs thing looks extremely ugly
> to boot.
> 

I was not aware of the networking subsystem coding style. I'll fix it
up.
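
As I understand the rule, the layout should look roughly like the sketch
below. Everything in it is a hypothetical stand-in so that it compiles
in user space (demo_wmalloc(), demo_caller(), DEMO_MAX_HEADER, the flag
value and the cut-down struct sock); only the alignment of the
continuation lines is the point:

```c
#include <stddef.h>

typedef unsigned int gfp_t;

#define GFP_ATOMIC	0x20u	/* stand-in value for this sketch */
#define DEMO_MAX_HEADER	128	/* hypothetical, stands in for MAX_TCP_HEADER */

struct sock {
	gfp_t sk_allocation;	/* only the field this sketch needs */
};

static gfp_t sk_gfp_atomic(struct sock *sk)
{
	(void)sk;		/* socket-specific bits left out here */
	return GFP_ATOMIC;
}

/* Hypothetical stand-in for sock_wmalloc(): returns the size it was
 * asked for when GFP_ATOMIC is set in priority, otherwise 0.
 */
static size_t demo_wmalloc(struct sock *sk, size_t size, int force,
			   gfp_t priority)
{
	(void)sk;
	(void)force;
	return (priority & GFP_ATOMIC) ? size : 0;
}

static size_t demo_caller(struct sock *sk, size_t s_data_desired)
{
	/* The continuation line is indented with tabs and then padded
	 * with spaces, so its argument starts in the first column
	 * after the opening parenthesis of the call.
	 */
	return demo_wmalloc(sk, DEMO_MAX_HEADER + 15 + s_data_desired, 1,
			    sk_gfp_atomic(sk));
}
```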

> > -		newnp->pktoptions = skb_clone(treq->pktopts, GFP_ATOMIC);
> > +		newnp->pktoptions = skb_clone(treq->pktopts,
> > +						sk_allocation(sk, GFP_ATOMIC));
> 
> Same here.
> 
> What's really funny to me is that in several cases elsewhere in this
> patch you get it right.

Whether I got it right or not was effectively random. I checked to see
what pattern I was using, thinking it would be "always tab", but nope,
no pattern :)

-- 
Mel Gorman
SUSE Labs
