Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 13/15] batman-adv: fix misspelled words
From: Antonio Quartulli @ 2015-01-08 15:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli, Marek Lindner
In-Reply-To: <1420730120-9844-1-git-send-email-antonio@meshcoding.com>

Reported-by: checkpatch
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
---
 net/batman-adv/bridge_loop_avoidance.c | 4 ++--
 net/batman-adv/main.c                  | 2 +-
 net/batman-adv/packet.h                | 4 ++--
 net/batman-adv/routing.c               | 2 +-
 net/batman-adv/translation-table.c     | 2 +-
 net/batman-adv/types.h                 | 2 +-
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 4fc6cab..ac4b96e 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -244,7 +244,7 @@ batadv_bla_del_backbone_claims(struct batadv_bla_backbone_gw *backbone_gw)
 		spin_unlock_bh(list_lock);
 	}
 
-	/* all claims gone, intialize CRC */
+	/* all claims gone, initialize CRC */
 	backbone_gw->crc = BATADV_BLA_CRC_INIT;
 }
 
@@ -1328,7 +1328,7 @@ int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv,
 		goto out;
 	}
 	/* not found, add a new entry (overwrite the oldest entry)
-	 * and allow it, its the first occurence.
+	 * and allow it, its the first occurrence.
 	 */
 	curr = (bat_priv->bla.bcast_duplist_curr + BATADV_DUPLIST_SIZE - 1);
 	curr %= BATADV_DUPLIST_SIZE;
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 3bcd847..12fc77b 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -798,7 +798,7 @@ void batadv_tvlv_container_register(struct batadv_priv *bat_priv,
 }
 
 /**
- * batadv_tvlv_realloc_packet_buff - reallocate packet buffer to accomodate
+ * batadv_tvlv_realloc_packet_buff - reallocate packet buffer to accommodate
  *  requested packet size
  * @packet_buff: packet buffer
  * @packet_buff_len: packet buffer size
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index facd1fe..b81fbbf 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -377,7 +377,7 @@ struct batadv_frag_packet {
 	uint8_t reserved:4;
 	uint8_t no:4;
 #else
-#error "unknown bitfield endianess"
+#error "unknown bitfield endianness"
 #endif
 	uint8_t dest[ETH_ALEN];
 	uint8_t orig[ETH_ALEN];
@@ -453,7 +453,7 @@ struct batadv_coded_packet {
  * @src: address of the source
  * @dst: address of the destination
  * @tvlv_len: length of tvlv data following the unicast tvlv header
- * @align: 2 bytes to align the header to a 4 byte boundry
+ * @align: 2 bytes to align the header to a 4 byte boundary
  */
 struct batadv_unicast_tvlv_packet {
 	uint8_t  packet_type;
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 139d2f6..da83982 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -456,7 +456,7 @@ batadv_find_router(struct batadv_priv *bat_priv,
 	 * the last chosen bonding candidate (next_candidate). If no such
 	 * router is found, use the first candidate found (the previously
 	 * chosen bonding candidate might have been the last one in the list).
-	 * If this can't be found either, return the previously choosen
+	 * If this can't be found either, return the previously chosen
 	 * router - obviously there are no other candidates.
 	 */
 	rcu_read_lock();
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 84e6f01..07b263a 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -2852,7 +2852,7 @@ static void batadv_tt_update_changes(struct batadv_priv *bat_priv,
 /**
  * batadv_is_my_client - check if a client is served by the local node
  * @bat_priv: the bat priv with all the soft interface information
- * @addr: the mac adress of the client to check
+ * @addr: the mac address of the client to check
  * @vid: VLAN identifier
  *
  * Returns true if the client is served by this node, false otherwise.
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 462a70c..9398c3f 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -968,7 +968,7 @@ struct batadv_tt_orig_list_entry {
 };
 
 /**
- * struct batadv_tt_change_node - structure for tt changes occured
+ * struct batadv_tt_change_node - structure for tt changes occurred
  * @list: list node for batadv_priv_tt::changes_list
  * @change: holds the actual translation table diff data
  */
-- 
2.2.1

^ permalink raw reply related

* [PATCH 11/15] batman-adv: checkpatch - remove unnecessary parentheses
From: Antonio Quartulli @ 2015-01-08 15:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli, Marek Lindner
In-Reply-To: <1420730120-9844-1-git-send-email-antonio@meshcoding.com>

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
---
 net/batman-adv/bat_iv_ogm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 4dd7c40..00e00e0 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -878,7 +878,7 @@ batadv_iv_ogm_slide_own_bcast_window(struct batadv_hard_iface *hard_iface)
 		hlist_for_each_entry_rcu(orig_node, head, hash_entry) {
 			spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock);
 			word_index = hard_iface->if_num * BATADV_NUM_WORDS;
-			word = &(orig_node->bat_iv.bcast_own[word_index]);
+			word = &orig_node->bat_iv.bcast_own[word_index];
 
 			batadv_bit_get_packet(bat_priv, word, 1, 0);
 			if_num = hard_iface->if_num;
@@ -1663,7 +1663,7 @@ static void batadv_iv_ogm_process(const struct sk_buff *skb, int ogm_offset,
 			offset = if_num * BATADV_NUM_WORDS;
 
 			spin_lock_bh(&orig_neigh_node->bat_iv.ogm_cnt_lock);
-			word = &(orig_neigh_node->bat_iv.bcast_own[offset]);
+			word = &orig_neigh_node->bat_iv.bcast_own[offset];
 			bit_pos = if_incoming_seqno - 2;
 			bit_pos -= ntohl(ogm_packet->seqno);
 			batadv_set_bit(word, bit_pos);
-- 
2.2.1

^ permalink raw reply related

* [PATCH 12/15] batman-adv: clear control block of received socket buffers
From: Antonio Quartulli @ 2015-01-08 15:15 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Martin Hundebøll, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420730120-9844-1-git-send-email-antonio@meshcoding.com>

From: Martin Hundebøll <martin@hundeboll.net>

Since other network components (and some drivers) uses the control block
provided in skb's, the network coding feature might wrongly assume that
an SKB has been decoded, and thus not try to code it with another packet
again. This happens for instance when batman-adv is running on a bridge device.

Fix this by clearing the control block for every received SKB.

Introduced by 3c12de9a5c756b23fe7c9ab332474ece1568914c
("batman-adv: network coding - code and transmit packets if possible")
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index d4079cd..3bcd847 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -402,6 +402,9 @@ int batadv_batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		goto err_free;
 	}
 
+	/* reset control block to avoid left overs from previous users */
+	memset(skb->cb, 0, sizeof(struct batadv_skb_cb));
+
 	/* all receive handlers return whether they received or reused
 	 * the supplied skb. if not, we have to free the skb.
 	 */
-- 
2.2.1

^ permalink raw reply related

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Sébastien Barré @ 2015-01-08 15:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell
In-Reply-To: <1420730396.5947.55.camel@edumazet-glaptop2.roam.corp.google.com>


Le 08/01/2015 16:19, Eric Dumazet a écrit :
> On Thu, 2015-01-08 at 13:20 +0100, Sébastien Barré wrote:
>> When the peer has delayed ack enabled, it may reply to a probe with an
>> ACK+D-SACK, with ack value set to tlp_high_seq. In the current code,
>> such ACK+DSACK will be missed and only at next, higher ack will the TLP
>> episode be considered done. Since the DSACK is not present anymore,
>> this will cost a cwnd reduction.
>>
> ...
>>   net/ipv4/tcp_input.c | 30 +++++++++++++-----------------
>>   1 file changed, 13 insertions(+), 17 deletions(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 075ab4d..cf63a29 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -3363,29 +3363,25 @@ static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
>>   static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
>>   {
>>   	struct tcp_sock *tp = tcp_sk(sk);
>> -	bool is_tlp_dupack = (ack == tp->tlp_high_seq) &&
>> -			     !(flag & (FLAG_SND_UNA_ADVANCED |
>> -				       FLAG_NOT_DUP | FLAG_DATA_SACKED));
>>   
> No idea why you removed this bool, it really helped to understand the
> code. This makes your patch looks more complex than needed.
I did not remove it in v1 
(http://www.spinics.net/lists/netdev/msg308653.html)
The removal was a request from Neal 
(http://www.spinics.net/lists/netdev/msg308758.html)

I think he found it special to have part of the logic apart, in a bool, 
and part of it in the if condition.
One possible option is to restore is_tlp_dupack and include the DSACK 
check in it, although this will
not do much more than moving complexity in the bool definition. But 
indeed, that might make
the patch more readable.

What do you and Neal think ?

regards,

Sébastien.
>
>>   	/* Mark the end of TLP episode on receiving TLP dupack or when
>>   	 * ack is after tlp_high_seq.
>> +	 * With delayed acks, we may also get a regular ACK+DSACK, in which
>> +	 * case we don't want to reduce the cwnd either.
>>   	 */
>> -	if (is_tlp_dupack) {
>> +	if (((ack == tp->tlp_high_seq) &&
>> +	     !(flag & (FLAG_SND_UNA_ADVANCED |
>> +		       FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
>> +	    (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
>>   		tp->tlp_high_seq = 0;
>> -		return;
>> -	}
>

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Neal Cardwell @ 2015-01-08 15:43 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <54AEA11A.1000603@uclouvain.be>

> Le 08/01/2015 16:07, Eric Dumazet a écrit :
>> Do you have at hand a packetdrill test to demonstrate that the patch
>> works ?

I cooked up the packetdrill test below when Sebastien sent out his v1
a few weeks ago. It fails on a kernel without his patch, and passes on
a kernel with his patch.

The code change looks fine to me, but if Eric prefers that the
expression be assigned to a bool before the check, that also sounds
fine to me.

neal


------------
// Establish a connection.
0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.020 < . 1:1(0) ack 1 win 257
+0    accept(3, ..., ...) = 4

// Send 1 packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1:1001(1000) ack 1

// Loss probe retransmission.
// packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
// In this case, this means: 1.5*RTT + 200ms = 230ms
+.230 > P. 1:1001(1000) ack 1
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Receiver ACKs at tlp_high_seq with a DSACK,
// indicating they received the original packet and probe.
+.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
+0    %{ assert tcpi_snd_cwnd == 10 }%

// Send another packet.
+0    write(4, ..., 1000) = 1000
+0    > P. 1001:2001(1000) ack 1

// Receiver ACKs above tlp_high_seq, which should end the TLP episode
// if we haven't already. We should not reduce cwnd.
+.020 < . 1:1(0) ack 2001 win 257
+0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Neal Cardwell @ 2015-01-08 15:49 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <54AEA498.5030202@uclouvain.be>

On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
<sebastien.barre@uclouvain.be> wrote:
> What do you and Neal think ?

My preference is to have the whole expression detecting the case where
the receiver got the probe packet encoded in a single expression. I
don't have a strong feeling about whether it should be stored in a
bool (to reduce the size of the diff) or written directly into the if
() expression (to reduce the size of the code). I'll defer to Eric on
which he thinks is better. :-)

neal

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Neal Cardwell @ 2015-01-08 15:52 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <CADVnQy=A57TUYD=kGtZXvq7-FGSN=vN3D2heaNCnBB=zjujwUg@mail.gmail.com>

On Thu, Jan 8, 2015 at 10:49 AM, Neal Cardwell <ncardwell@google.com> wrote:
> On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> <sebastien.barre@uclouvain.be> wrote:
>> What do you and Neal think ?
>
> My preference is to have the whole expression detecting the case where
> the receiver got the probe packet encoded in a single expression.

If we do store it in a bool, then "is_tlp_dupack" is no longer quite
accurate (we're expanding the check to include ACKs that are not
dupacks), so I'd suggest a name like "is_probe_rcvd".

neal

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Eric Dumazet @ 2015-01-08 15:59 UTC (permalink / raw)
  To: Sébastien Barré
  Cc: David Miller, netdev, Gregory Detal, Nandita Dukkipati,
	Yuchung Cheng, Neal Cardwell
In-Reply-To: <54AEA11A.1000603@uclouvain.be>

On Thu, 2015-01-08 at 16:24 +0100, Sébastien Barré wrote:
> Hi Eric,
> 
> Le 08/01/2015 16:07, Eric Dumazet a écrit :
> >
> >>   In the current code,
> >> such ACK+DSACK will be missed and only at next, higher ack will the TLP
> >> episode be considered done. Since the DSACK is not present anymore,
> >> this will cost a cwnd reduction.
> >>
> >> This patch ensures that this scenario does not cause a cwnd reduction, since
> >> receiving an ACK+DSACK indicates that both the initial segment and the probe
> >> have been received by the peer.
> > Do you have at hand a packetdrill test to demonstrate that the patch
> > works ?
> Not currently, but that would be very good indeed.
> I will prepare one and send it.

Here is a skeleton you might adapt :

// TLP test when a sender does not have any new segments for transmission.
// In the absence of ACKs, the sender retransmits an old segment as a probe
// after 2 RTTs.

// Tolerate RTO variations
--tolerance_usecs=15000

// Set up production config.
`../common/defaults.sh
sysctl -q net.ipv4.tcp_early_retrans=3
`

// Establish a connection.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100  < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0     > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+0.100 < . 1:1(0) ack 1 win 257
+0     accept(3, ..., ...) = 4

// Send 10 MSS.
+0.100 write(4, ..., 10000) = 10000
+0 > . 1:5001(5000) ack 1
+0 > P. 5001:10001(5000) ack 1

// TLP retransmission in 2 RTTs.
+0.200 > P. 9001:10001(1000) ack 1
+0 %{
assert tcpi_snd_cwnd == 10, 'tcpi_snd_cwnd=%d' % tcpi_snd_cwnd
assert tcpi_unacked == 10
}%

+0.100 < . 1:1(0) ack 10001 win 257 <sack 9001:1001,nop,nop>

+0 %{
assert tcpi_snd_cwnd == 20, 'tcpi_snd_cwnd=%d' % tcpi_snd_cwnd
assert tcpi_unacked == 0, 'tcpi_unacked=%d' % tcpi_unacked
}%

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Sébastien Barré @ 2015-01-08 16:00 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <CADVnQy=kACmKGkakx85BHWg7WppS3-B1yLa-TXvfBKbXtXYJ+A@mail.gmail.com>


Le 08/01/2015 16:43, Neal Cardwell a écrit :
>> Le 08/01/2015 16:07, Eric Dumazet a écrit :
>>> Do you have at hand a packetdrill test to demonstrate that the patch
>>> works ?
> I cooked up the packetdrill test below when Sebastien sent out his v1
> a few weeks ago. It fails on a kernel without his patch, and passes on
> a kernel with his patch.
Thanks a lot for this ! I will include it in our test suite then.
I understand that there is convergence on having a
bool called is_probe_rcvd with the whole logic.
Eric, is this ok for you ?

Thanks,

Sébastien.
>
> The code change looks fine to me, but if Eric prefers that the
> expression be assigned to a bool before the check, that also sounds
> fine to me.
>
> neal
>
>
> ------------
> // Establish a connection.
> 0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0    bind(3, ..., ...) = 0
> +0    listen(3, 1) = 0
>
> +0    < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
> +.020 < . 1:1(0) ack 1 win 257
> +0    accept(3, ..., ...) = 4
>
> // Send 1 packet.
> +0    write(4, ..., 1000) = 1000
> +0    > P. 1:1001(1000) ack 1
>
> // Loss probe retransmission.
> // packets_out == 1 => schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
> // In this case, this means: 1.5*RTT + 200ms = 230ms
> +.230 > P. 1:1001(1000) ack 1
> +0    %{ assert tcpi_snd_cwnd == 10 }%
>
> // Receiver ACKs at tlp_high_seq with a DSACK,
> // indicating they received the original packet and probe.
> +.020 < . 1:1(0) ack 1001 win 257 <sack 1:1001,nop,nop>
> +0    %{ assert tcpi_snd_cwnd == 10 }%
>
> // Send another packet.
> +0    write(4, ..., 1000) = 1000
> +0    > P. 1001:2001(1000) ack 1
>
> // Receiver ACKs above tlp_high_seq, which should end the TLP episode
> // if we haven't already. We should not reduce cwnd.
> +.020 < . 1:1(0) ack 2001 win 257
> +0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Eric Dumazet @ 2015-01-08 16:25 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Sébastien Barré, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <CADVnQy=A57TUYD=kGtZXvq7-FGSN=vN3D2heaNCnBB=zjujwUg@mail.gmail.com>

On Thu, 2015-01-08 at 10:49 -0500, Neal Cardwell wrote:
> On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> <sebastien.barre@uclouvain.be> wrote:
> > What do you and Neal think ?
> 
> My preference is to have the whole expression detecting the case where
> the receiver got the probe packet encoded in a single expression. I
> don't have a strong feeling about whether it should be stored in a
> bool (to reduce the size of the diff) or written directly into the if
> () expression (to reduce the size of the code). I'll defer to Eric on
> which he thinks is better. :-)

There is no shame using helpers with nice names to help understand this
TCP stack. Even if the helper is used exactly once.

In this case, it seems we test 2 different conditions, so this could use
2 helpers with self describing names.

When I see : 

> +     if (((ack == tp->tlp_high_seq) &&
> +          !(flag & (FLAG_SND_UNA_ADVANCED |
> +                    FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
> +         (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
>               tp->tlp_high_seq = 0;

My brain hurts.

^ permalink raw reply

* [PATCH net-next] packet: make packet too small warning match condition
From: Willem de Bruijn @ 2015-01-08 16:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jkmalinen, dborkman, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The expression in ll_header_truncated() tests less than or equal, but
the warning prints less than. Update the warning.

Reported-by: Jouni Malinen <jkmalinen@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/packet/af_packet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6880f34..0f02668 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2102,7 +2102,7 @@ static bool ll_header_truncated(const struct net_device *dev, int len)
 {
 	/* net device doesn't like empty head */
 	if (unlikely(len <= dev->hard_header_len)) {
-		net_warn_ratelimited("%s: packet size is too short (%d < %d)\n",
+		net_warn_ratelimited("%s: packet size is too short (%d <= %d)\n",
 				     current->comm, len, dev->hard_header_len);
 		return true;
 	}
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related

* Re: [RFC net-next 0/3] devconf: New infrastructure for setting pre-load parameters for a module
From: Greg KH @ 2015-01-08 16:46 UTC (permalink / raw)
  To: Hadar Hen Zion
  Cc: David S. Miller, netdev, Amir Vadai, Yevgeny Petrilin, Or Gerlitz,
	shannon.nelson, Doug Ledford, greearb
In-Reply-To: <1420730220-20224-1-git-send-email-hadarh@mellanox.com>

On Thu, Jan 08, 2015 at 05:16:57PM +0200, Hadar Hen Zion wrote:
> Hi,
>     
> When configuring a device at an early boot stage, most kernel drivers
> use module parameters (the parameters' settings can be determined in
> modprobe.d config files).

Which is a bad idea, as you have learned :)

> These parameters are difficult to manage, and one of the reasons is that
> module parameters are set per driver and not per device (NICs using the
> same driver cannot be set with different configurations).
> Furthermore, using other existing configuration tools like ethtool,
> ifconfig, ip link commands or sysfs entries is not applicable, since
> they all rely on having a netdev already set up.
> 
> In the past, 'request_firmware' solution for configuration parameters
> was suggested by Shannon Nelson from Intel[1]. The idea was rejected by
> Greg KH, who claimed it was abusive of the request_firmware mechanism.
> Greg suggested using configfs for device configuration instead (as done
> by the USB gadget driver).
> 
> As a solution, we introduce a new kernel infrastructure using configfs
> to allow the configuration of the device. The goal is to set low-level
> device functionality that needs to be sorted out before a module is
> loaded.

Ick, really?  drivers should never need to be configured like this, if
so, then something is wrong, they should "just work" by default.  What
are you needing to "configure" that can't be determined by something
like a device tree?

greg k-h

^ permalink raw reply

* Re: [RFC net-next 0/3] devconf: New infrastructure for setting pre-load parameters for a module
From: Amir Vadai @ 2015-01-08 17:11 UTC (permalink / raw)
  To: Greg KH, Hadar Hen Zion
  Cc: David S. Miller, netdev@vger.kernel.org, Yevgeny Petrilin,
	Or Gerlitz, shannon.nelson@intel.com, Doug Ledford,
	greearb@candelatech.com
In-Reply-To: <20150108164618.GA11696@kroah.com>

On 1/8/2015 6:46 PM, Greg KH wrote:
> On Thu, Jan 08, 2015 at 05:16:57PM +0200, Hadar Hen Zion wrote:
>> Hi,
>>     
>> When configuring a device at an early boot stage, most kernel drivers
>> use module parameters (the parameters' settings can be determined in
>> modprobe.d config files).
> 
> Which is a bad idea, as you have learned :)
> 
>> These parameters are difficult to manage, and one of the reasons is that
>> module parameters are set per driver and not per device (NICs using the
>> same driver cannot be set with different configurations).
>> Furthermore, using other existing configuration tools like ethtool,
>> ifconfig, ip link commands or sysfs entries is not applicable, since
>> they all rely on having a netdev already set up.
>>
>> In the past, 'request_firmware' solution for configuration parameters
>> was suggested by Shannon Nelson from Intel[1]. The idea was rejected by
>> Greg KH, who claimed it was abusive of the request_firmware mechanism.
>> Greg suggested using configfs for device configuration instead (as done
>> by the USB gadget driver).
>>
>> As a solution, we introduce a new kernel infrastructure using configfs
>> to allow the configuration of the device. The goal is to set low-level
>> device functionality that needs to be sorted out before a module is
>> loaded.
> 
> Ick, really?  drivers should never need to be configured like this, if
> so, then something is wrong, they should "just work" by default.  What
> are you needing to "configure" that can't be determined by something
> like a device tree?
Ick indeed - but we can't find anything better.

For example, we have devices that can act as either netdev or as an
Infiniband device.
The driver consists of a core to handle the hardware, and higher layer
drivers - one for Ethernet and one for Infiniband.
Today the selection is done through a module parameter. according to it
the relevant higher level driver is loaded, and the device is
initialized. You don't want to have a default of netdev, and in every
installation that needs Infiniband, a netdev will be created, removed
and only then the Infiniband device will appear.
This is only one example to configuration that needs to be known before
the hardware is initialized, and be persistent across boots.

We can have a 2 stages loading. First load the core, wait for user
input, and only then configure the device and load the right upper layer
driver according to the user input (configfs/sysfs).
Since other vendors seems to need this capability too, we thought it
would be better to make it generic - and this is this what this RFC is
about.

Amir

> 
> greg k-h
> 

^ permalink raw reply

* Loan Offer at 3%
From: Cashland Financial Service @ 2015-01-08 16:57 UTC (permalink / raw)
  To: Recipients

Loan Offer at 3%, Feel Free to REPLY back to us for more info.

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Eric Dumazet @ 2015-01-08 17:17 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Sébastien Barré, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <1420734325.5947.61.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, 2015-01-08 at 08:25 -0800, Eric Dumazet wrote:
> On Thu, 2015-01-08 at 10:49 -0500, Neal Cardwell wrote:
> > On Thu, Jan 8, 2015 at 10:39 AM, Sébastien Barré
> > <sebastien.barre@uclouvain.be> wrote:
> > > What do you and Neal think ?
> > 
> > My preference is to have the whole expression detecting the case where
> > the receiver got the probe packet encoded in a single expression. I
> > don't have a strong feeling about whether it should be stored in a
> > bool (to reduce the size of the diff) or written directly into the if
> > () expression (to reduce the size of the code). I'll defer to Eric on
> > which he thinks is better. :-)
> 
> There is no shame using helpers with nice names to help understand this
> TCP stack. Even if the helper is used exactly once.
> 
> In this case, it seems we test 2 different conditions, so this could use
> 2 helpers with self describing names.
> 
> When I see : 
> 
> > +     if (((ack == tp->tlp_high_seq) &&
> > +          !(flag & (FLAG_SND_UNA_ADVANCED |
> > +                    FLAG_NOT_DUP | FLAG_DATA_SACKED))) ||
> > +         (!before(ack, tp->tlp_high_seq) && (flag & FLAG_DSACKING_ACK))) {
> >               tp->tlp_high_seq = 0;
> 
> My brain hurts.

BTW, tlp_high_seq is used no matter what (initial value is 0).

While the probability ACK seq being exactly 0 is small (1 out of 2^32),
probability that !before(ack, tp->tlp_high_seq) being a false positive
is very high. Initial sequence number are supposed to be random.

I wonder if setting tp->tlp_high_seq to 0 is the right thing to do.

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: avoid reducing cwnd when ACK+DSACK is received
From: Eric Dumazet @ 2015-01-08 17:27 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Sébastien Barré, David Miller, Netdev, Gregory Detal,
	Nandita Dukkipati, Yuchung Cheng
In-Reply-To: <1420737469.5947.66.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, 2015-01-08 at 09:17 -0800, Eric Dumazet wrote:

> BTW, tlp_high_seq is used no matter what (initial value is 0).
> 
> While the probability ACK seq being exactly 0 is small (1 out of 2^32),
> probability that !before(ack, tp->tlp_high_seq) being a false positive
> is very high. Initial sequence number are supposed to be random.
> 

Oh well, calls to tcp_process_tlp_ack() are guarded by :

if (tp->tlp_high_seq)
    tcp_process_tlp_ack(sk, ack, flag);

^ permalink raw reply

* [PATCH net 1/1] tipc: fix bug in broadcast retransmit code
From: Jon Maloy @ 2015-01-08 17:27 UTC (permalink / raw)
  To: davem
  Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
	tipc-discussion, Jon Maloy

In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/bcast.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 96ceefe..a9e174f 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -220,10 +220,11 @@ static void bclink_retransmit_pkt(u32 after, u32 to)
 	struct sk_buff *skb;

 	skb_queue_walk(&bcl->outqueue, skb) {
-		if (more(buf_seqno(skb), after))
+		if (more(buf_seqno(skb), after)) {
+			tipc_link_retransmit(bcl, skb, mod(to - after));
 			break;
+		}
 	}
-	tipc_link_retransmit(bcl, skb, mod(to - after));
 }

 /**
-- 
1.9.1

^ permalink raw reply related

* Re: [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow
From: Jiri Pirko @ 2015-01-08 17:39 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy
In-Reply-To: <20141231194615.31070.18038.stgit@nitbit.x32>

Wed, Dec 31, 2014 at 08:46:16PM CET, john.fastabend@gmail.com wrote:
>Now that the device capabilities are exposed we can add support to
>add and delete flows from the tables.
>
>The two operations are
>
>table_set_flows :
>
>  The set flow operations is used to program a set of flows into a
>  hardware device table. The message is consumed via netlink encoded
>  message which is then decoded into a null terminated  array of
>  flow entry structures. A flow entry structure is defined as
>
>     struct net_flow_flow {
>			  int table_id;
>			  int uid;
>			  int priority;
>			  struct net_flow_field_ref *matches;
>			  struct net_flow_action *actions;
>     }
>
>  The table id is the _uid_ returned from 'get_tables' operatoins.
>  Matches is a set of match criteria for packets with a logical AND
>  operation done on the set so packets match the entire criteria.
>  Actions provide a set of actions to perform when the flow rule is
>  hit. Both matches and actions are null terminated arrays.
>
>  The flows are configured in hardware using an ndo op. We do not
>  provide a commit operation at the moment and expect hardware
>  commits the flows one at a time. Future work may require a commit
>  operation to tell the hardware we are done loading flow rules. On
>  some hardware this will help bulk updates.
>
>  Its possible for hardware to return an error from a flow set
>  operation. This can occur for many reasons both transient and
>  resource constraints. We have different error handling strategies
>  built in and listed here,
>
>    *_ERROR_ABORT      abort on first error with errmsg
>
>    *_ERROR_CONTINUE   continue programming flows no errmsg
>
>    *_ERROR_ABORT_LOG  abort on first error and return flow that
> 		       failed to user space in reply msg
>
>    *_ERROR_CONT_LOG   continue programming flows and return a list
>		       of flows that failed to user space in a reply
>		       msg.
>
>  notably missing is a rollback error strategy. I don't have a
>  use for this in software yet but the strategy can be added with
>  *_ERROR_ROLLBACK for example.
>
>table_del_flows
>
>  The delete flow operation uses the same structures and error
>  handling strategies as the table_set_flows operations. Although on
>  delete messges ommit the matches/actions arrays because they are
>  not needed to lookup the flow.
>
>Also thanks to Simon Horman for fixes and other help.
>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
> include/linux/if_flow.h      |   21 ++
> include/linux/netdevice.h    |    8 +
> include/uapi/linux/if_flow.h |   49 ++++
> net/core/flow_table.c        |  501 ++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 579 insertions(+)
>
>diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
>index 1b6c1ea..20fa752 100644
>--- a/include/linux/if_flow.h
>+++ b/include/linux/if_flow.h
>@@ -90,4 +90,25 @@ struct net_flow_tbl_node {
> 	__u32 flags;
> 	struct net_flow_jump_table *jump;
> };
>+
>+/**
>+ * @struct net_flow_flow
>+ * @brief describes the match/action entry
>+ *
>+ * @uid unique identifier for flow
>+ * @priority priority to execute flow match/action in table
>+ * @match null terminated set of match uids match criteria
>+ * @actoin null terminated set of action uids to apply to match
>+ *
>+ * Flows must match all entries in match set.
>+ */
>+struct net_flow_flow {
>+	int table_id;
>+	int uid;
>+	int priority;
>+	struct net_flow_field_ref *matches;
>+	struct net_flow_action *actions;
>+};
>+
>+int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow);
> #endif
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 3c3c856..be8d4e4 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1197,6 +1197,14 @@ struct net_device_ops {
> 	struct net_flow_header	**(*ndo_flow_get_headers)(struct net_device *dev);
> 	struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
> 	struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
>+	int		        (*ndo_flow_get_flows)(struct sk_buff *skb,
>+						      struct net_device *dev,
>+						      int table,
>+						      int min, int max);
>+	int		        (*ndo_flow_set_flows)(struct net_device *dev,
>+						      struct net_flow_flow *f);
>+	int		        (*ndo_flow_del_flows)(struct net_device *dev,
>+						      struct net_flow_flow *f);
> #endif
> };
> 
>diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
>index 2acdb38..125cdc6 100644
>--- a/include/uapi/linux/if_flow.h
>+++ b/include/uapi/linux/if_flow.h
>@@ -329,6 +329,48 @@ enum {
> #define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
> 
> enum {
>+	NET_FLOW_NET_FLOW_UNSPEC,
>+	NET_FLOW_FLOW,
>+	__NET_FLOW_NET_FLOW_MAX,
>+};
>+#define NET_FLOW_NET_FLOW_MAX (__NET_FLOW_NET_FLOW_MAX - 1)
>+
>+enum {
>+	NET_FLOW_TABLE_FLOWS_UNSPEC,
>+	NET_FLOW_TABLE_FLOWS_TABLE,
>+	NET_FLOW_TABLE_FLOWS_MINPRIO,
>+	NET_FLOW_TABLE_FLOWS_MAXPRIO,
>+	NET_FLOW_TABLE_FLOWS_FLOWS,
>+	__NET_FLOW_TABLE_FLOWS_MAX,
>+};
>+#define NET_FLOW_TABLE_FLOWS_MAX (__NET_FLOW_TABLE_FLOWS_MAX - 1)
>+
>+enum {
>+	/* Abort with normal errmsg */
>+	NET_FLOW_FLOWS_ERROR_ABORT,
>+	/* Ignore errors and continue without logging */
>+	NET_FLOW_FLOWS_ERROR_CONTINUE,
>+	/* Abort and reply with invalid flow fields */
>+	NET_FLOW_FLOWS_ERROR_ABORT_LOG,
>+	/* Continue and reply with list of invalid flows */
>+	NET_FLOW_FLOWS_ERROR_CONT_LOG,
>+	__NET_FLOWS_FLOWS_ERROR_MAX,
>+};
>+#define NET_FLOWS_FLOWS_ERROR_MAX (__NET_FLOWS_FLOWS_ERROR_MAX - 1)
>+
>+enum {
>+	NET_FLOW_ATTR_UNSPEC,
>+	NET_FLOW_ATTR_ERROR,
>+	NET_FLOW_ATTR_TABLE,
>+	NET_FLOW_ATTR_UID,
>+	NET_FLOW_ATTR_PRIORITY,
>+	NET_FLOW_ATTR_MATCHES,
>+	NET_FLOW_ATTR_ACTIONS,
>+	__NET_FLOW_ATTR_MAX,
>+};
>+#define NET_FLOW_ATTR_MAX (__NET_FLOW_ATTR_MAX - 1)
>+
>+enum {
> 	NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
> };
> 
>@@ -343,6 +385,9 @@ enum {
> 	NET_FLOW_HEADER_GRAPH,
> 	NET_FLOW_TABLE_GRAPH,
> 
>+	NET_FLOW_FLOWS,
>+	NET_FLOW_FLOWS_ERROR,
>+
> 	__NET_FLOW_MAX,
> 	NET_FLOW_MAX = (__NET_FLOW_MAX - 1),
> };
>@@ -354,6 +399,10 @@ enum {
> 	NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
> 	NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
> 
>+	NET_FLOW_TABLE_CMD_GET_FLOWS,
>+	NET_FLOW_TABLE_CMD_SET_FLOWS,
>+	NET_FLOW_TABLE_CMD_DEL_FLOWS,
>+
> 	__NET_FLOW_CMD_MAX,
> 	NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1),
> };
>diff --git a/net/core/flow_table.c b/net/core/flow_table.c
>index ec3f06d..f4cf293 100644
>--- a/net/core/flow_table.c
>+++ b/net/core/flow_table.c
>@@ -774,6 +774,489 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
> 	return genlmsg_reply(msg, info);
> }
> 
>+static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
>+						u32 portid, int seq, u8 cmd,
>+						int min, int max, int table)
>+{
>+	struct genlmsghdr *hdr;
>+	struct nlattr *flows;
>+	struct sk_buff *skb;
>+	int err = -ENOBUFS;
>+
>+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>+	if (!skb)
>+		return ERR_PTR(-ENOBUFS);
>+
>+	hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
>+	if (!hdr)
>+		goto out;
>+
>+	if (nla_put_u32(skb,
>+			NET_FLOW_IDENTIFIER_TYPE,
>+			NET_FLOW_IDENTIFIER_IFINDEX) ||
>+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
>+		err = -ENOBUFS;
>+		goto out;
>+	}
>+
>+	flows = nla_nest_start(skb, NET_FLOW_FLOWS);
>+	if (!flows) {
>+		err = -EMSGSIZE;
>+		goto out;
>+	}
>+
>+	err = dev->netdev_ops->ndo_flow_get_flows(skb, dev, table, min, max);
>+	if (err < 0)
>+		goto out_cancel;
>+
>+	nla_nest_end(skb, flows);
>+
>+	err = genlmsg_end(skb, hdr);
>+	if (err < 0)
>+		goto out;
>+
>+	return skb;
>+out_cancel:
>+	nla_nest_cancel(skb, flows);
>+out:
>+	nlmsg_free(skb);
>+	return ERR_PTR(err);
>+}
>+
>+static const
>+struct nla_policy net_flow_table_flows_policy[NET_FLOW_TABLE_FLOWS_MAX + 1] = {
>+	[NET_FLOW_TABLE_FLOWS_TABLE]   = { .type = NLA_U32,},
>+	[NET_FLOW_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,},
>+	[NET_FLOW_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,},
>+	[NET_FLOW_TABLE_FLOWS_FLOWS]   = { .type = NLA_NESTED,},
>+};
>+
>+static int net_flow_table_cmd_get_flows(struct sk_buff *skb,
>+					struct genl_info *info)
>+{
>+	struct nlattr *tb[NET_FLOW_TABLE_FLOWS_MAX+1];
>+	int table, min = -1, max = -1;
>+	struct net_device *dev;
>+	struct sk_buff *msg;
>+	int err = -EINVAL;
>+
>+	dev = net_flow_get_dev(info);
>+	if (!dev)
>+		return -EINVAL;
>+
>+	if (!dev->netdev_ops->ndo_flow_get_flows) {
>+		dev_put(dev);
>+		return -EOPNOTSUPP;
>+	}
>+
>+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
>+	    !info->attrs[NET_FLOW_IDENTIFIER] ||
>+	    !info->attrs[NET_FLOW_FLOWS])
>+		goto out;
>+
>+	err = nla_parse_nested(tb, NET_FLOW_TABLE_FLOWS_MAX,
>+			       info->attrs[NET_FLOW_FLOWS],
>+			       net_flow_table_flows_policy);
>+	if (err)
>+		goto out;
>+
>+	if (!tb[NET_FLOW_TABLE_FLOWS_TABLE])
>+		goto out;
>+
>+	table = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_TABLE]);
>+
>+	if (tb[NET_FLOW_TABLE_FLOWS_MINPRIO])
>+		min = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MINPRIO]);
>+	if (tb[NET_FLOW_TABLE_FLOWS_MAXPRIO])
>+		max = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]);
>+
>+	msg = net_flow_build_flows_msg(dev,
>+				       info->snd_portid,
>+				       info->snd_seq,
>+				       NET_FLOW_TABLE_CMD_GET_FLOWS,
>+				       min, max, table);
>+	dev_put(dev);
>+
>+	if (IS_ERR(msg))
>+		return PTR_ERR(msg);
>+
>+	return genlmsg_reply(msg, info);
>+out:
>+	dev_put(dev);
>+	return err;
>+}
>+
>+static struct sk_buff *net_flow_start_errmsg(struct net_device *dev,
>+					     struct genlmsghdr **hdr,
>+					     u32 portid, int seq, u8 cmd)
>+{
>+	struct genlmsghdr *h;
>+	struct sk_buff *skb;
>+
>+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>+	if (!skb)
>+		return ERR_PTR(-EMSGSIZE);
>+
>+	h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
>+	if (!h)
>+		return ERR_PTR(-EMSGSIZE);
>+
>+	if (nla_put_u32(skb,
>+			NET_FLOW_IDENTIFIER_TYPE,
>+			NET_FLOW_IDENTIFIER_IFINDEX) ||
>+	    nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex))
>+		return ERR_PTR(-EMSGSIZE);
>+
>+	*hdr = h;
>+	return skb;
>+}
>+
>+static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb,
>+						struct genlmsghdr *hdr)
>+{
>+	int err;
>+
>+	err = genlmsg_end(skb, hdr);
>+	if (err < 0) {
>+		nlmsg_free(skb);
>+		return ERR_PTR(err);
>+	}
>+
>+	return skb;
>+}
>+
>+static int net_flow_put_flow_action(struct sk_buff *skb,
>+				    struct net_flow_action *a)
>+{
>+	struct nlattr *action, *sigs;
>+	int i, err = 0;
>+
>+	action = nla_nest_start(skb, NET_FLOW_ACTION);
>+	if (!action)
>+		return -EMSGSIZE;
>+
>+	if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
>+		return -EMSGSIZE;
>+
>+	if (!a->args)
>+		goto done;
>+
>+	for (i = 0; a->args[i].type; i++) {
>+		sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
>+		if (!sigs) {
>+			nla_nest_cancel(skb, action);
>+			return -EMSGSIZE;
>+		}
>+
>+		err = net_flow_put_act_types(skb, a[i].args);
>+		if (err) {
>+			nla_nest_cancel(skb, action);
>+			nla_nest_cancel(skb, sigs);
>+			return err;
>+		}
>+		nla_nest_end(skb, sigs);
>+	}
>+
>+done:
>+	nla_nest_end(skb, action);
>+	return 0;
>+}
>+
>+int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow)
>+{
>+	struct nlattr *flows, *matches;
>+	struct nlattr *actions = NULL; /* must be null to unwind */
>+	int err, j, i = 0;
>+
>+	flows = nla_nest_start(skb, NET_FLOW_FLOW);
>+	if (!flows)
>+		goto put_failure;
>+
>+	if (nla_put_u32(skb, NET_FLOW_ATTR_TABLE, flow->table_id) ||
>+	    nla_put_u32(skb, NET_FLOW_ATTR_UID, flow->uid) ||
>+	    nla_put_u32(skb, NET_FLOW_ATTR_PRIORITY, flow->priority))
>+		goto flows_put_failure;
>+
>+	if (flow->matches) {
>+		matches = nla_nest_start(skb, NET_FLOW_ATTR_MATCHES);
>+		if (!matches)
>+			goto flows_put_failure;
>+
>+		for (j = 0; flow->matches && flow->matches[j].header; j++) {
>+			struct net_flow_field_ref *f = &flow->matches[j];
>+
>+			if (!f->header)
>+				continue;
>+
>+			nla_put(skb, NET_FLOW_FIELD_REF, sizeof(*f), f);
>+		}
>+		nla_nest_end(skb, matches);
>+	}
>+
>+	if (flow->actions) {
>+		actions = nla_nest_start(skb, NET_FLOW_ATTR_ACTIONS);
>+		if (!actions)
>+			goto flows_put_failure;
>+
>+		for (i = 0; flow->actions && flow->actions[i].uid; i++) {
>+			err = net_flow_put_flow_action(skb, &flow->actions[i]);
>+			if (err) {
>+				nla_nest_cancel(skb, actions);
>+				goto flows_put_failure;
>+			}
>+		}
>+		nla_nest_end(skb, actions);
>+	}
>+
>+	nla_nest_end(skb, flows);
>+	return 0;
>+
>+flows_put_failure:
>+	nla_nest_cancel(skb, flows);
>+put_failure:
>+	return -EMSGSIZE;
>+}
>+EXPORT_SYMBOL(net_flow_put_flow);
>+
>+static int net_flow_get_field(struct net_flow_field_ref *field,
>+			      struct nlattr *nla)
>+{
>+	if (nla_type(nla) != NET_FLOW_FIELD_REF)
>+		return -EINVAL;
>+
>+	if (nla_len(nla) < sizeof(*field))
>+		return -EINVAL;
>+
>+	*field = *(struct net_flow_field_ref *)nla_data(nla);
>+	return 0;
>+}
>+
>+static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
>+{
>+	struct nlattr *act[NET_FLOW_ACTION_ATTR_MAX+1];
>+	struct nlattr *args;
>+	int rem;
>+	int err, count = 0;
>+
>+	if (nla_type(attr) != NET_FLOW_ACTION) {
>+		pr_warn("%s: expected NET_FLOW_ACTION\n", __func__);
>+		return 0;
>+	}
>+
>+	err = nla_parse_nested(act, NET_FLOW_ACTION_ATTR_MAX,
>+			       attr, net_flow_action_policy);
>+	if (err < 0)
>+		return err;
>+
>+	if (!act[NET_FLOW_ACTION_ATTR_UID] ||
>+	    !act[NET_FLOW_ACTION_ATTR_SIGNATURE])
>+		return -EINVAL;
>+
>+	a->uid = nla_get_u32(act[NET_FLOW_ACTION_ATTR_UID]);
>+
>+	nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem)
>+		count++; /* unoptimized max possible */
>+
>+	a->args = kcalloc(count + 1,
>+			  sizeof(struct net_flow_action_arg),
>+			  GFP_KERNEL);
>+	count = 0;
>+
>+	nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) {
>+		if (nla_type(args) != NET_FLOW_ACTION_ARG)
>+			continue;
>+
>+		if (nla_len(args) < sizeof(struct net_flow_action_arg)) {
>+			kfree(a->args);
>+			return -EINVAL;
>+		}
>+
>+		a->args[count] = *(struct net_flow_action_arg *)nla_data(args);
>+	}
>+	return 0;
>+}
>+
>+static const
>+struct nla_policy net_flow_flow_policy[NET_FLOW_ATTR_MAX + 1] = {
>+	[NET_FLOW_ATTR_TABLE]		= { .type = NLA_U32 },
>+	[NET_FLOW_ATTR_UID]		= { .type = NLA_U32 },
>+	[NET_FLOW_ATTR_PRIORITY]	= { .type = NLA_U32 },
>+	[NET_FLOW_ATTR_MATCHES]		= { .type = NLA_NESTED },
>+	[NET_FLOW_ATTR_ACTIONS]		= { .type = NLA_NESTED },
>+};
>+
>+static int net_flow_get_flow(struct net_flow_flow *flow, struct nlattr *attr)
>+{
>+	struct nlattr *f[NET_FLOW_ATTR_MAX+1];
>+	struct nlattr *attr2;
>+	int rem, err;
>+	int count = 0;
>+
>+	err = nla_parse_nested(f, NET_FLOW_ATTR_MAX,
>+			       attr, net_flow_flow_policy);
>+	if (err < 0)
>+		return -EINVAL;
>+
>+	if (!f[NET_FLOW_ATTR_TABLE] || !f[NET_FLOW_ATTR_UID] ||
>+	    !f[NET_FLOW_ATTR_PRIORITY])
>+		return -EINVAL;
>+
>+	flow->table_id = nla_get_u32(f[NET_FLOW_ATTR_TABLE]);
>+	flow->uid = nla_get_u32(f[NET_FLOW_ATTR_UID]);
>+	flow->priority = nla_get_u32(f[NET_FLOW_ATTR_PRIORITY]);
>+
>+	flow->matches = NULL;
>+	flow->actions = NULL;
>+
>+	if (f[NET_FLOW_ATTR_MATCHES]) {
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_MATCHES], rem)
>+			count++;
>+
>+		/* Null terminated list of matches */
>+		flow->matches = kcalloc(count + 1,
>+					sizeof(struct net_flow_field_ref),
>+					GFP_KERNEL);
>+		if (!flow->matches)
>+			return -ENOMEM;
>+
>+		count = 0;
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_MATCHES], rem) {
>+			err = net_flow_get_field(&flow->matches[count], attr2);
>+			if (err) {
>+				kfree(flow->matches);
>+				return err;
>+			}
>+			count++;
>+		}
>+	}
>+
>+	if (f[NET_FLOW_ATTR_ACTIONS]) {
>+		count = 0;
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_ACTIONS], rem)
>+			count++;
>+
>+		/* Null terminated list of actions */
>+		flow->actions = kcalloc(count + 1,
>+					sizeof(struct net_flow_action),
>+					GFP_KERNEL);
>+		if (!flow->actions) {
>+			kfree(flow->matches);
>+			return -ENOMEM;
>+		}
>+
>+		count = 0;
>+		nla_for_each_nested(attr2, f[NET_FLOW_ATTR_ACTIONS], rem) {
>+			err = net_flow_get_action(&flow->actions[count], attr2);
>+			if (err) {
>+				kfree(flow->matches);
>+				kfree(flow->actions);
>+				return err;
>+			}
>+			count++;
>+		}
>+	}
>+
>+	return 0;
>+}
>+
>+static int net_flow_table_cmd_flows(struct sk_buff *recv_skb,
>+				    struct genl_info *info)
>+{
>+	int rem, err_handle = NET_FLOW_FLOWS_ERROR_ABORT;
>+	struct sk_buff *skb = NULL;
>+	struct net_flow_flow this;
>+	struct genlmsghdr *hdr;
>+	struct net_device *dev;
>+	struct nlattr *flow, *flows;
>+	int cmd = info->genlhdr->cmd;
>+	int err = -EOPNOTSUPP;

I don't like the inconsistency in var naming. Sometimes, "flow" is of type
struct nlattr, sometimes it is of type struct net_flow_flow
(net_flow_get_flow). It is slightly confusing.

>+
>+	dev = net_flow_get_dev(info);
>+	if (!dev)
>+		return -EINVAL;
>+
>+	if (!dev->netdev_ops->ndo_flow_set_flows ||
>+	    !dev->netdev_ops->ndo_flow_del_flows)
>+		goto out;
>+
>+	if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
>+	    !info->attrs[NET_FLOW_IDENTIFIER] ||
>+	    !info->attrs[NET_FLOW_FLOWS]) {
>+		err = -EINVAL;
>+		goto out;
>+	}
>+
>+	if (info->attrs[NET_FLOW_FLOWS_ERROR])
>+		err_handle = nla_get_u32(info->attrs[NET_FLOW_FLOWS_ERROR]);
>+
>+	nla_for_each_nested(flow, info->attrs[NET_FLOW_FLOWS], rem) {
>+		if (nla_type(flow) != NET_FLOW_FLOW)
>+			continue;
>+
>+		err = net_flow_get_flow(&this, flow);
>+		if (err)
>+			goto out;
>+
>+		switch (cmd) {
>+		case NET_FLOW_TABLE_CMD_SET_FLOWS:
>+			err = dev->netdev_ops->ndo_flow_set_flows(dev, &this);
>+			break;
>+		case NET_FLOW_TABLE_CMD_DEL_FLOWS:
>+			err = dev->netdev_ops->ndo_flow_del_flows(dev, &this);
>+			break;
>+		default:
>+			err = -EOPNOTSUPP;
>+			break;
>+		}
>+
>+		if (err && err_handle != NET_FLOW_FLOWS_ERROR_CONTINUE) {
>+			if (!skb) {
>+				skb = net_flow_start_errmsg(dev, &hdr,
>+							    info->snd_portid,
>+							    info->snd_seq,
>+							    cmd);
>+				if (IS_ERR(skb)) {
>+					err = PTR_ERR(skb);
>+					goto out_plus_free;
>+				}
>+
>+				flows = nla_nest_start(skb, NET_FLOW_FLOWS);
>+				if (!flows) {
>+					err = -EMSGSIZE;
>+					goto out_plus_free;
>+				}
>+			}
>+
>+			net_flow_put_flow(skb, &this);
>+		}
>+
>+		/* Cleanup flow */
>+		kfree(this.matches);
>+		kfree(this.actions);
>+
>+		if (err && err_handle == NET_FLOW_FLOWS_ERROR_ABORT)
>+			goto out;
>+	}
>+
>+	dev_put(dev);
>+
>+	if (skb) {
>+		nla_nest_end(skb, flows);
>+		net_flow_end_flow_errmsg(skb, hdr);
>+		return genlmsg_reply(skb, info);
>+	}
>+	return 0;
>+
>+out_plus_free:
>+	kfree(this.matches);
>+	kfree(this.actions);

	Maybe this can be done by some "flow_free" helper...

>+out:
>+	if (skb)
>+		nlmsg_free(skb);
>+	dev_put(dev);
>+	return -EINVAL;
>+}
>+
> static const struct nla_policy net_flow_cmd_policy[NET_FLOW_MAX + 1] = {
> 	[NET_FLOW_IDENTIFIER_TYPE] = {.type = NLA_U32, },
> 	[NET_FLOW_IDENTIFIER]	   = {.type = NLA_U32, },
>@@ -815,6 +1298,24 @@ static const struct genl_ops net_flow_table_nl_ops[] = {
> 		.policy = net_flow_cmd_policy,
> 		.flags = GENL_ADMIN_PERM,
> 	},
>+	{
>+		.cmd = NET_FLOW_TABLE_CMD_GET_FLOWS,
>+		.doit = net_flow_table_cmd_get_flows,
>+		.policy = net_flow_cmd_policy,
>+		.flags = GENL_ADMIN_PERM,
>+	},
>+	{
>+		.cmd = NET_FLOW_TABLE_CMD_SET_FLOWS,
>+		.doit = net_flow_table_cmd_flows,
>+		.policy = net_flow_cmd_policy,
>+		.flags = GENL_ADMIN_PERM,
>+	},
>+	{
>+		.cmd = NET_FLOW_TABLE_CMD_DEL_FLOWS,
>+		.doit = net_flow_table_cmd_flows,
>+		.policy = net_flow_cmd_policy,
>+		.flags = GENL_ADMIN_PERM,
>+	},
> };
> 
> static int __init net_flow_nl_module_init(void)
>

^ permalink raw reply

* Re: [net-next PATCH v1 03/11] net: flow_table: add apply action argument to tables
From: Jiri Pirko @ 2015-01-08 17:41 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy
In-Reply-To: <20141231194642.31070.14445.stgit@nitbit.x32>

Wed, Dec 31, 2014 at 08:46:44PM CET, john.fastabend@gmail.com wrote:
>Actions may not always be applied after exiting a table. For example
>some pipelines may accumulate actions and then apply them at the end
>of a pipeline.
>
>To model this we use a table type called APPLY. Tables who share an
>apply identifier have their actions applied in one step.

Why this is a separate patch? Perhaps this can be squashed to one of the
previous ones?

>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
> include/linux/if_flow.h      |    1 +
> include/uapi/linux/if_flow.h |    1 +
> net/core/flow_table.c        |    1 +
> 3 files changed, 3 insertions(+)
>
>diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
>index 20fa752..a042a3d 100644
>--- a/include/linux/if_flow.h
>+++ b/include/linux/if_flow.h
>@@ -67,6 +67,7 @@ struct net_flow_table {
> 	char name[NET_FLOW_NAMSIZ];
> 	int uid;
> 	int source;
>+	int apply_action;
> 	int size;
> 	struct net_flow_field_ref *matches;
> 	int *actions;
>diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
>index 125cdc6..3c1a860 100644
>--- a/include/uapi/linux/if_flow.h
>+++ b/include/uapi/linux/if_flow.h
>@@ -265,6 +265,7 @@ enum {
> 	NET_FLOW_TABLE_ATTR_NAME,
> 	NET_FLOW_TABLE_ATTR_UID,
> 	NET_FLOW_TABLE_ATTR_SOURCE,
>+	NET_FLOW_TABLE_ATTR_APPLY,
> 	NET_FLOW_TABLE_ATTR_SIZE,
> 	NET_FLOW_TABLE_ATTR_MATCHES,
> 	NET_FLOW_TABLE_ATTR_ACTIONS,
>diff --git a/net/core/flow_table.c b/net/core/flow_table.c
>index f4cf293..97cdf92 100644
>--- a/net/core/flow_table.c
>+++ b/net/core/flow_table.c
>@@ -223,6 +223,7 @@ static int net_flow_put_table(struct net_device *dev,
> 	if (nla_put_string(skb, NET_FLOW_TABLE_ATTR_NAME, t->name) ||
> 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_UID, t->uid) ||
> 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SOURCE, t->source) ||
>+	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_APPLY, t->apply_action) ||
> 	    nla_put_u32(skb, NET_FLOW_TABLE_ATTR_SIZE, t->size))
> 		return -EMSGSIZE;
> 
>

^ permalink raw reply

* [PATCH iproute2] ss: Filter inet dgram sockets with established state by default
From: Vadim Kochan @ 2015-01-08 17:32 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

From: Vadim Kochan <vadim4j@gmail.com>

As inet dgram sockets (udp, raw) can call connect(...)  - they
might be set in ESTABLISHED state. So keep the original behaviour of
'ss' which filtered them by ESTABLISHED state by default. So:

    $ ss -u

    or

    $ ss -w

Will show only ESTABLISHED UDP sockets by default.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 misc/ss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 08d210a..015d829 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -170,11 +170,11 @@ static const struct filter default_dbs[MAX_DB] = {
 		.families = (1 << AF_INET) | (1 << AF_INET6),
 	},
 	[UDP_DB] = {
-		.states   = (1 << SS_CLOSE),
+		.states   = (1 << SS_ESTABLISHED),
 		.families = (1 << AF_INET) | (1 << AF_INET6),
 	},
 	[RAW_DB] = {
-		.states   = (1 << SS_CLOSE),
+		.states   = (1 << SS_ESTABLISHED),
 		.families = (1 << AF_INET) | (1 << AF_INET6),
 	},
 	[UNIX_DG_DB] = {
-- 
2.1.3

^ permalink raw reply related

* Re: [RFC net-next 0/3] devconf: New infrastructure for setting pre-load parameters for a module
From: Greg KH @ 2015-01-08 17:47 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Hadar Hen Zion, David S. Miller, netdev@vger.kernel.org,
	Yevgeny Petrilin, Or Gerlitz, shannon.nelson@intel.com,
	Doug Ledford, greearb@candelatech.com
In-Reply-To: <54AEBA28.1030301@mellanox.com>

On Thu, Jan 08, 2015 at 07:11:04PM +0200, Amir Vadai wrote:
> On 1/8/2015 6:46 PM, Greg KH wrote:
> > On Thu, Jan 08, 2015 at 05:16:57PM +0200, Hadar Hen Zion wrote:
> >> Hi,
> >>     
> >> When configuring a device at an early boot stage, most kernel drivers
> >> use module parameters (the parameters' settings can be determined in
> >> modprobe.d config files).
> > 
> > Which is a bad idea, as you have learned :)
> > 
> >> These parameters are difficult to manage, and one of the reasons is that
> >> module parameters are set per driver and not per device (NICs using the
> >> same driver cannot be set with different configurations).
> >> Furthermore, using other existing configuration tools like ethtool,
> >> ifconfig, ip link commands or sysfs entries is not applicable, since
> >> they all rely on having a netdev already set up.
> >>
> >> In the past, 'request_firmware' solution for configuration parameters
> >> was suggested by Shannon Nelson from Intel[1]. The idea was rejected by
> >> Greg KH, who claimed it was abusive of the request_firmware mechanism.
> >> Greg suggested using configfs for device configuration instead (as done
> >> by the USB gadget driver).
> >>
> >> As a solution, we introduce a new kernel infrastructure using configfs
> >> to allow the configuration of the device. The goal is to set low-level
> >> device functionality that needs to be sorted out before a module is
> >> loaded.
> > 
> > Ick, really?  drivers should never need to be configured like this, if
> > so, then something is wrong, they should "just work" by default.  What
> > are you needing to "configure" that can't be determined by something
> > like a device tree?
> Ick indeed - but we can't find anything better.
> 
> For example, we have devices that can act as either netdev or as an
> Infiniband device.
> The driver consists of a core to handle the hardware, and higher layer
> drivers - one for Ethernet and one for Infiniband.
> Today the selection is done through a module parameter. according to it
> the relevant higher level driver is loaded, and the device is
> initialized. You don't want to have a default of netdev, and in every
> installation that needs Infiniband, a netdev will be created, removed
> and only then the Infiniband device will appear.

But that really isn't an issue, right?  Who cares if it happens that
way?

Or just don't do anything until that "higher level" driver is loaded,
either ib or network.

> This is only one example to configuration that needs to be known before
> the hardware is initialized, and be persistent across boots.
> 
> We can have a 2 stages loading. First load the core, wait for user
> input, and only then configure the device and load the right upper layer
> driver according to the user input (configfs/sysfs).

Why not just wait for an "upper layer" driver to be loaded?  It doesn't
make it dynamic, and forces you to have a module manually be loaded, but
would solve this problem.

Or just don't do anything until configfs is set up for your devices,
after the module is loaded.  This allows drivers to be built into the
kernel if people really want that type of configuration to work
properly.

> Since other vendors seems to need this capability too, we thought it
> would be better to make it generic - and this is this what this RFC is
> about.

What other vendors have such horribly designed hardware / kernel modules
that need this type of thing?  :)

thanks,

greg k-h

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Erik Grinaker @ 2015-01-08 17:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <1420666401.5947.35.camel@edumazet-glaptop2.roam.corp.google.com>

On 07 Jan 2015, at 21:33, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-01-07 at 20:37 +0000, Erik Grinaker wrote:
> I agree. I have contacted Amazon about this, but am not too hopeful
>> for a quick fix; they have been promising SACK-support on their
>> loadbalancers since 2006, for example.
>> 
>> That said, since this change breaks a service as popular as S3, it
>> might be worth reconsidering.
> 
> Which change are you talking about ? Have you done a bisection to
> clearly identify the patch exposing this sender bug ?

FWIW, I've done a bisection, and it’s triggered by this change:

https://github.com/torvalds/linux/commit/4e4f1fc226816905c937f9b29dabe351075dfe0f

> We are not going to stick TCP stack to 20th century and buggy peers or
> middleboxes, sorry.

That’s fair enough.

^ permalink raw reply

* [PATCH net-next] netfilter: nat_snmp_basic: Use generic dump facility
From: Joe Perches @ 2015-01-08 17:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, LKML

Use the generic facility instead of a static hex_dump routine.
Delete the now unused static routine.

Now emits the hex dump at KERN_DEBUG instead of default.

Shrinks the kernel ever so trivially.

$ size net/ipv4/netfilter/nf_nat_snmp_basic.o*
   text	   data	    bss	    dec	    hex	filename
   5948	    352	      8	   6308	   18a4	net/ipv4/netfilter/nf_nat_snmp_basic.o.defconfig.new
   5985	    352	      8	   6345	   18c9	net/ipv4/netfilter/nf_nat_snmp_basic.o.defconfig.old

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index 7c67667..5c8d3b5 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -998,18 +998,6 @@ err_id_free:
  *
  *****************************************************************************/
 
-static void hex_dump(const unsigned char *buf, size_t len)
-{
-	size_t i;
-
-	for (i = 0; i < len; i++) {
-		if (i && !(i % 16))
-			printk("\n");
-		printk("%02x ", *(buf + i));
-	}
-	printk("\n");
-}
-
 /*
  * Parse and mangle SNMP message according to mapping.
  * (And this is the fucking 'basic' method).
@@ -1026,7 +1014,7 @@ static int snmp_parse_mangle(unsigned char *msg,
 	struct snmp_object *obj;
 
 	if (debug > 1)
-		hex_dump(msg, len);
+		print_hex_dump_bytes("", DUMP_PREFIX_NONE, msg, len);
 
 	asn1_open(&ctx, msg, len);
 

^ permalink raw reply related

* Re: [net-next PATCH v1 00/11] A flow API
From: Jiri Pirko @ 2015-01-08 18:03 UTC (permalink / raw)
  To: John Fastabend; +Cc: tgraf, sfeldma, jhs, simon.horman, netdev, davem, andy
In-Reply-To: <20141231194057.31070.5244.stgit@nitbit.x32>

Wed, Dec 31, 2014 at 08:45:19PM CET, john.fastabend@gmail.com wrote:
>So... I could continue to mull over this and tweak bits and pieces
>here and there but I decided its best to get a wider group of folks
>looking at it and hopefulyl with any luck using it so here it is.
>
>This set creates a new netlink family and set of messages to configure
>flow tables in hardware. I tried to make the commit messages
>reasonably verbose at least in the flow_table patches.
>
>What we get at the end of this series is a working API to get device
>capabilities and program flows using the rocker switch.
>
>I created a user space tool 'flow' that I use to configure and query
>the devices it is posted here,
>
>	https://github.com/jrfastab/iprotue2-flow-tool
>
>For now it is a stand-alone tool but once the kernel bits get sorted
>out (I'm guessing there will need to be a few versions of this series
>to get it right) I would like to port it into the iproute2 package.
>This way we can keep all of our tooling in one package see 'bridge'
>for example.
>
>As far as testing, I've tested various combinations of tables and
>rules on the rocker switch and it seems to work. I have not tested
>100% of the rocker code paths though. It would be great to get some
>sort of automated framework around the API to do this. I don't
>think should gate the inclusion of the API though.
>
>I could use some help reviewing,
>
>  (a) error paths and netlink validation code paths
>
>  (b) Break down of structures vs netlink attributes. I
>      am trying to balance flexibility given by having
>      netlinnk TLV attributes vs conciseness. So some
>      things are passed as structures.
>
>  (c) are there any devices that have pipelines that we
>      can't represent with this API? It would be good to
>      know about these so we can design it in probably
>      in a future series.
>
>For some examples and maybe a bit more illustrative description I
>posted a quickly typed up set of notes on github io pages. Here we
>can show the description along with images produced by the flow tool
>showing the pipeline. Once we settle a bit more on the API we should
>probably do a clean up of this and other threads happening and commit
>something to the Documentation directory.
>
> http://jrfastab.github.io/jekyll/update/2014/12/21/flow-api.html
>
>Finally I have more patches to add support for creating and destroying
>tables. This allows users to define the pipeline at runtime rather
>than statically as rocker does now. After this set gets some traction
>I'll look at pushing them in a next round. However it likely requires
>adding another "world" to rocker. Another piece that I want to add is
>a description of the actions and metadata. This way user space can
>"learn" what an action is and how metadata interacts with the system.
>This work is under development.
>
>Thanks! Any comments/feedback always welcome.
>
>And also thanks to everyone who helped with this flow API so far. All
>the folks at Dusseldorf LPC, OVS summit Santa Clara, P4 authors for
>some inspiration, the collection of IETF FoRCES documents I mulled
>over, Netfilter workshop where I started to realize fixing ethtool
>was most likely not going to work, etc.
>
>---
>
>John Fastabend (11):
>      net: flow_table: create interface for hw match/action tables
>      net: flow_table: add flow, delete flow
>      net: flow_table: add apply action argument to tables
>      rocker: add pipeline model for rocker switch
>      net: rocker: add set flow rules
>      net: rocker: add group_id slices and drop explicit goto
>      net: rocker: add multicast path to bridging
>      net: rocker: add get flow API operation
>      net: rocker: add cookie to group acls and use flow_id to set cookie
>      net: rocker: have flow api calls set cookie value
>      net: rocker: implement delete flow routine

Truly impressive work John (including the "flow" tool, documentation).
Hat's off.

Currently, all is very userspace oriented and I understand the reason.
I also understand why Jamal is a bit nervous from that fact. I am as well..
Correct me if I'm wrong but this amount of "direct hw access" is
unprecedented. There have been kernel here to cover the hw differencies,
I wonder if there is any way to continue in this direction with flows...

What I would love to see in this initial patchset is "the internal user".
For example tc. The tc code could query the capabilities and decide what
"flows" to put into hw tables.

Jiri

^ permalink raw reply

* Re: TCP connection issues against Amazon S3
From: Eric Dumazet @ 2015-01-08 18:15 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <DCC8B9AB-D1D7-4722-9B65-93AB5A559763@bengler.no>

On Thu, 2015-01-08 at 17:47 +0000, Erik Grinaker wrote:

> FWIW, I've done a bisection, and it’s triggered by this change:
> 
> https://github.com/torvalds/linux/commit/4e4f1fc226816905c937f9b29dabe351075dfe0f


This totally makes sense, thanks for doing the bisection !

> 
> > We are not going to stick TCP stack to 20th century and buggy peers or
> > middleboxes, sorry.
> 
> That’s fair enough.

Strange thing is that sender does not misbehave at the beginning when
receiver window is still small. Only after a while.

It would be nice to know more details about sender OS/version.

Thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox