Netdev List
 help / color / mirror / Atom feed
* [PATCH 0/2] Re-submit RDS patches
From: Mike Marciniszyn @ 2012-12-21 18:01 UTC (permalink / raw)
  To: venkat.x.venkatsubra-QHcLZuEGTsvQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	netdev-u79uwXL29TY76Z2rM5mHXA

These fixes were originally submitted in
https://oss.oracle.com/pipermail/rds-devel/2012-April/thread.html.

The first patch fixes a DOA issue with RDS on qib and should
be a stable patch as well.

The second suppresses a warning message that is misleading when 
a version has been correctly determined.

These two patches were originally acked by the upstream maintainer
and never merged.

---

Mike Marciniszyn (2):
      IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
      IB/rds: suppress incompatible protocol when version is known


 net/rds/ib_cm.c   |   11 +++++------
 net/rds/ib_recv.c |    9 ++++++---
 2 files changed, 11 insertions(+), 9 deletions(-)

-- 
Thanks!
Mike Marciniszyn
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 1/2] IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
From: Mike Marciniszyn @ 2012-12-21 18:01 UTC (permalink / raw)
  To: venkat.x.venkatsubra; +Cc: linux-rdma, roland, rds-devel, davem, netdev
In-Reply-To: <20121221175857.23716.46975.stgit@phlsvslse11.ph.intel.com>

0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
added uses of sg_dma_len() and sg_dma_address(). This makes
RDS DOA with the qib driver.

IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
respectively since some HCAs overload ib_sg_dma* operations.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
---
 net/rds/ib_recv.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 8c5bc85..8eb9501 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -339,8 +339,8 @@ static int rds_ib_recv_refill_one(struct rds_connection *conn,
 	sge->length = sizeof(struct rds_header);
 
 	sge = &recv->r_sge[1];
-	sge->addr = sg_dma_address(&recv->r_frag->f_sg);
-	sge->length = sg_dma_len(&recv->r_frag->f_sg);
+	sge->addr = ib_sg_dma_address(ic->i_cm_id->device, &recv->r_frag->f_sg);
+	sge->length = ib_sg_dma_len(ic->i_cm_id->device, &recv->r_frag->f_sg);
 
 	ret = 0;
 out:
@@ -381,7 +381,10 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill)
 		ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
 		rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
 			 recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
-			 (long) sg_dma_address(&recv->r_frag->f_sg), ret);
+			 (long) ib_sg_dma_address(
+				ic->i_cm_id->device,
+				&recv->r_frag->f_sg),
+			ret);
 		if (ret) {
 			rds_ib_conn_error(conn, "recv post on "
 			       "%pI4 returned %d, disconnecting and "

^ permalink raw reply related

* [PATCH 2/2] IB/rds: suppress incompatible protocol when version is known
From: Mike Marciniszyn @ 2012-12-21 18:01 UTC (permalink / raw)
  To: venkat.x.venkatsubra; +Cc: linux-rdma, roland, rds-devel, davem, netdev
In-Reply-To: <20121221175857.23716.46975.stgit@phlsvslse11.ph.intel.com>

Add an else to only print the incompatible protocol message
when version hasn't been established.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
---
 net/rds/ib_cm.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index a1e1162..31b74f5 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -434,12 +434,11 @@ static u32 rds_ib_protocol_compatible(struct rdma_cm_event *event)
 		version = RDS_PROTOCOL_3_0;
 		while ((common >>= 1) != 0)
 			version++;
-	}
-	printk_ratelimited(KERN_NOTICE "RDS: Connection from %pI4 using "
-			"incompatible protocol version %u.%u\n",
-			&dp->dp_saddr,
-			dp->dp_protocol_major,
-			dp->dp_protocol_minor);
+	} else
+		printk_ratelimited(KERN_NOTICE "RDS: Connection from %pI4 using incompatible protocol version %u.%u\n",
+				&dp->dp_saddr,
+				dp->dp_protocol_major,
+				dp->dp_protocol_minor);
 	return version;
 }
 

^ permalink raw reply related

* TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 17:58 UTC (permalink / raw)
  To: netdev

Dear sir or madam,

My name is Zhiyun Qian, a recent PhD graduate from University of
Michigan. As our recent research effort, along with my colleagues, we
identified a vulnerability related to Linux. Details described in our
paper published at this year's ACM Conference on Computer and
Communications Security (CCS): Collaborative TCP Sequence Number
Inference Attack available
http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf

Keywords: TCP, sequence number inference, DelayedAckLost counter,
privilege-escalation attack

The vulnerability would allow an local malicious program to gain write
access to TCP connections of other applications. An example attack
scenario (on android) would be "an attacker uploads a seemingly benign
app to the google play, when run at the background, it can inject
malicious HTML payload into a webpage open by the browser".

The problem is caused by the common TCP stats counters (the specific
counter I found is DelayedACKLost) maintained by the kernel (but
exposed to user space). By reading and reporting such counters to an
external attacker (colluded), the aforementioned attack can be
accomplished.

It is essentially a side-channel attack (using TCP stats counters to
infer TCP sequence numbers), but it is so real and easy to carry out
that I believe it should be considered a vulnerability. From a
difference perspective, this attack can be considered as a privilege
escalation attack since it allows a local non-privileged program to
gain write access to TCP connections made by other processes.

The lines of code in kernel impacted by the attack is:
net/ipv4/tcp_input.c at line 4166 (as in the latest kernel v3.7.1)

4160 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
4161 {
4162        struct tcp_sock *tp = tcp_sk(sk);
4163
4164        if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
4165            before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
4166                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
4167                tcp_enter_quickack_mode(sk);
4168
4169                if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
4170                        u32 end_seq = TCP_SKB_CB(skb)->end_seq;
4171
4172                        if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
4173                                end_seq = tp->rcv_nxt;
4174                        tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, end_seq);
4175                }
4176        }
4177
4178        tcp_send_ack(sk);
4179 }


IMHO, an easy fix to the problem is to disallow unprivileged access to
counters such as DelayedAckLost. However, this solution may not be
ideal. A better way is to always enforce both acknowledgement number
and sequence number check on each incoming TCP packet instead of
checking one at a time. Currently, Linux TCP stack first only
validates the SEQ number and then subsequently the ACK number. If the
sequence number is invalid, the delayedAckLost counter will be
incremented (information about sequence number leaked already
regardless of the ACK number). To make attacker's job much harder (we
should require the attacker to guess both the valid sequence number
and ACK number at the same time). For instance, following RFC 5961:
(SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT, this would require the
attacker to send many times (on the order of 10000) more packets to
conduct the same attack.

Please feel free to ask me any questions regarding this vulnerability.

Best,
-Zhiyun

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: Rick Jones @ 2012-12-21 18:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1356072468.21834.4805.camel@edumazet-glaptop>

On 12/20/2012 10:47 PM, Eric Dumazet wrote:
> Hi David
>
> We have the following definition in net/ipv4/route.c
>
> #define IP_MAX_MTU   0xFFF0
>
> This means that "netperf -t UDP_STREAM", using UDP messages of 65507
> bytes, are fragmented on loopback interface (while its MTU is now 65536
> and should allow those UDP messages being sent without fragments)
>
> I guess Rick chose 65507 bytes in netperf because it was related to the
> max IPv4 datagram length :
>
> 65507 + 28 = 65535

That is correct.  From src/nettest_opmni.c:

/* choosing the default send size is a trifle more complicated than it
    used to be as we have to account for different protocol limits */

#define UDP_LENGTH_MAX (0xFFFF - 28)

static int
choose_send_size(int lss, int protocol) {

   int send_size;

   if (lss > 0) {
     send_size = lss_size;

     /* we will assume that everyone has IPPROTO_UDP and thus avoid an
        issue with Windows using an enum */
     if ((protocol == IPPROTO_UDP) && (send_size > UDP_LENGTH_MAX))
       send_size = UDP_LENGTH_MAX;

   }
   else {
     send_size = 4096;
   }
   return send_size;
}

And I figured that while IPv6 allows even larger sizes, the likelihood 
of it mattering in the then near/medium term was minimal.

> Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
> miss something really obvious ?

If you go beyond the protocol limit of an IPv4 datagram, won't it be 
necessary to  start being a bit more conditional on IPv4 vs IPv6?

> It might be because in old days we reserved 16 bytes for the ethernet
> header, and we wanted to avoid kmalloc() round-up to kmalloc-131072
> slab ?
>
> If so, we certainly can limit skb->head to 32 or 64 KB and complete with
> page fragments the remaining space.
>
> Thanks
>
> [1] performance increase is ~50%

99 times out of 10 I will assert that faster is better, but do we need 
another 50% for UDP over loopback with that large a message size?

happy benchmarking,

rick jones

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 18:31 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <CALvgte86miv450KnOcFRR-oEm_f=qRXarDfQkyU7T3OLqq816A@mail.gmail.com>

On Fri, 2012-12-21 at 12:58 -0500, Zhiyun Qian wrote:
> Dear sir or madam,
> 
> My name is Zhiyun Qian, a recent PhD graduate from University of
> Michigan. As our recent research effort, along with my colleagues, we
> identified a vulnerability related to Linux. Details described in our
> paper published at this year's ACM Conference on Computer and
> Communications Security (CCS): Collaborative TCP Sequence Number
> Inference Attack available
> http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf
> 
> Keywords: TCP, sequence number inference, DelayedAckLost counter,
> privilege-escalation attack
> 
> The vulnerability would allow an local malicious program to gain write
> access to TCP connections of other applications. An example attack
> scenario (on android) would be "an attacker uploads a seemingly benign
> app to the google play, when run at the background, it can inject
> malicious HTML payload into a webpage open by the browser".
> 
> The problem is caused by the common TCP stats counters (the specific
> counter I found is DelayedACKLost) maintained by the kernel (but
> exposed to user space). By reading and reporting such counters to an
> external attacker (colluded), the aforementioned attack can be
> accomplished.
> 
> It is essentially a side-channel attack (using TCP stats counters to
> infer TCP sequence numbers), but it is so real and easy to carry out
> that I believe it should be considered a vulnerability. From a
> difference perspective, this attack can be considered as a privilege
> escalation attack since it allows a local non-privileged program to
> gain write access to TCP connections made by other processes.
> 
> The lines of code in kernel impacted by the attack is:
> net/ipv4/tcp_input.c at line 4166 (as in the latest kernel v3.7.1)
> 
> 4160 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
> 4161 {
> 4162        struct tcp_sock *tp = tcp_sk(sk);
> 4163
> 4164        if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
> 4165            before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
> 4166                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
> 4167                tcp_enter_quickack_mode(sk);
> 4168
> 4169                if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
> 4170                        u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> 4171
> 4172                        if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
> 4173                                end_seq = tp->rcv_nxt;
> 4174                        tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, end_seq);
> 4175                }
> 4176        }
> 4177
> 4178        tcp_send_ack(sk);
> 4179 }
> 
> 
> IMHO, an easy fix to the problem is to disallow unprivileged access to
> counters such as DelayedAckLost. However, this solution may not be
> ideal. A better way is to always enforce both acknowledgement number
> and sequence number check on each incoming TCP packet instead of
> checking one at a time. Currently, Linux TCP stack first only
> validates the SEQ number and then subsequently the ACK number. If the
> sequence number is invalid, the delayedAckLost counter will be
> incremented (information about sequence number leaked already
> regardless of the ACK number). To make attacker's job much harder (we
> should require the attacker to guess both the valid sequence number
> and ACK number at the same time). For instance, following RFC 5961:
> (SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT, this would require the
> attacker to send many times (on the order of 10000) more packets to
> conduct the same attack.
> 
> Please feel free to ask me any questions regarding this vulnerability.


I believe RFC 5961 was implemented in recent linux versions.

Is the described vulnerability still present ?

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Rick Jones @ 2012-12-21 18:33 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Eric Dumazet, Ian Campbell, netdev@vger.kernel.org,
	Konrad Rzeszutek Wilk, annie li, xen-devel@lists.xensource.com
In-Reply-To: <1609010645.20121221122100@eikelenboom.it>

I'm guessing that trusize checks matter more on the "inbound" path than 
the outbound path?  If that is indeed the case, then instead of, or in 
addition to using the -s option to set the local (netperf side) socket 
buffer size, you should use a -S option to set the remote (netserver 
side) socket buffer size.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: Eric Dumazet @ 2012-12-21 18:34 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev
In-Reply-To: <50D4A84D.1010402@hp.com>

On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:

> If you go beyond the protocol limit of an IPv4 datagram, won't it be 
> necessary to  start being a bit more conditional on IPv4 vs IPv6?
> 

This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)


> 
> 99 times out of 10 I will assert that faster is better, but do we need 
> another 50% for UDP over loopback with that large a message size?

Well, I only didnt understand why sending 65507 UDP messages had to use
fragments. I didnt care of performance at this point, only tried to have
an reasonable explanation.

It turns out its a strange limitation.

^ permalink raw reply

* Re: IPv6 over Firewire
From: Stephan Gatzka @ 2012-12-21 18:39 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: netdev, linux1394-devel
In-Reply-To: <50D4A219.7080807@linux-ipv6.org>


> If you are talking about how to build NS/NA/RS/Redirect messages, you
> can just use ndisc_opt_addr_space() and ndisc_fill_addr_option() here.

Thanks, these functions are certainly helpful. But 
ndisc_opt_addr_space() calculates the required space from dev->addr_len 
and ndisc_addr_option_pad(dev->type). The latter is 0 for IEEE1394 
(firewire). So the required option space just comes from dev->addr_len, 
which is 8 for firewire, resulting in an option address space of 16 (2 
octets).

But rfc3146 requires an option address space of 3 octets. So my main 
question is if in such a situation the best is to reserve additional skb 
tail room using needed_tailroom in struct netdevice. This directly 
affects the memory allocated in ndisc_build_skb().

Regards,

Stephan

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: Rick Jones @ 2012-12-21 18:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1356114879.21834.7709.camel@edumazet-glaptop>

On 12/21/2012 10:34 AM, Eric Dumazet wrote:
> On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:
>
>> If you go beyond the protocol limit of an IPv4 datagram, won't it be
>> necessary to  start being a bit more conditional on IPv4 vs IPv6?
>>
>
> This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)

OK. Doesn't this:

         if (mtu > IP_MAX_MTU)
                 mtu = IP_MAX_MTU;

mean it should be OK to go to 0xFFFF but not 0x10000?  Since 65535 is 
the limit of an IPv4 datagram and so I would think would be the maximum 
MTU for an IPv4 interface.

rick

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 18:52 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <1356114663.21834.7697.camel@edumazet-glaptop>

On Fri, 2012-12-21 at 10:31 -0800, Eric Dumazet wrote:

> I believe RFC 5961 was implemented in recent linux versions.
> 
> Is the described vulnerability still present ?
> 

By the way, I believe Chrome browser uses private network namespaces,
and statistics are per network namespace, so it should be safe.

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: Eric Dumazet @ 2012-12-21 18:54 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev
In-Reply-To: <50D4AF72.2020101@hp.com>

On Fri, 2012-12-21 at 10:50 -0800, Rick Jones wrote:
> On 12/21/2012 10:34 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:
> >
> >> If you go beyond the protocol limit of an IPv4 datagram, won't it be
> >> necessary to  start being a bit more conditional on IPv4 vs IPv6?
> >>
> >
> > This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)
> 
> OK. Doesn't this:
> 
>          if (mtu > IP_MAX_MTU)
>                  mtu = IP_MAX_MTU;
> 
> mean it should be OK to go to 0xFFFF but not 0x10000?  Since 65535 is 
> the limit of an IPv4 datagram and so I would think would be the maximum 
> MTU for an IPv4 interface.

Sure, I meant 0xFFFF and it is the value I used in my tests.

65536 is the current MTU of loopback device, and this can be used by
other protocols.

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 19:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356114663.21834.7697.camel@edumazet-glaptop>

That's good to know. However, implementing RFC 5961 alone is not
sufficient. Like I said, since DelayedAckLost counter is incremented
purely upon looking at the sequence number, regardless of the ACK
number. An attacker thus can still infer the sequence number based on
DelayedAckLost counter without knowing the right ACK number.

The next question is how can the attacker eventually know the right
ACK number in order to inject real data. It turns out that this is not
hard either. First, based on the current Linux TCP stack, it accepts
incoming packets without ACK flag. Further, if ACK flag is not set,
ACK number will not be checked at all. See code in
net/ipv4/tcp_input.c, function tcp_rcv_established()

5547        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
5548                goto discard;

Second, even if ACK number is always checked before accepting the
payload, it is still possible to infer the ACK number much like how
sequence number can be inferred. The details is described in Section
3.4 of my paper, paragraph starting with "Client-side sequence number
inference".

I'm looking at the latest kernel v3.7.1 right now. I believe the
problem do still exist in today's Linux.

-Zhiyun

On Fri, Dec 21, 2012 at 1:31 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Fri, 2012-12-21 at 12:58 -0500, Zhiyun Qian wrote:
>> Dear sir or madam,
>>
>> My name is Zhiyun Qian, a recent PhD graduate from University of
>> Michigan. As our recent research effort, along with my colleagues, we
>> identified a vulnerability related to Linux. Details described in our
>> paper published at this year's ACM Conference on Computer and
>> Communications Security (CCS): Collaborative TCP Sequence Number
>> Inference Attack available
>> http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf
>>
>> Keywords: TCP, sequence number inference, DelayedAckLost counter,
>> privilege-escalation attack
>>
>> The vulnerability would allow an local malicious program to gain write
>> access to TCP connections of other applications. An example attack
>> scenario (on android) would be "an attacker uploads a seemingly benign
>> app to the google play, when run at the background, it can inject
>> malicious HTML payload into a webpage open by the browser".
>>
>> The problem is caused by the common TCP stats counters (the specific
>> counter I found is DelayedACKLost) maintained by the kernel (but
>> exposed to user space). By reading and reporting such counters to an
>> external attacker (colluded), the aforementioned attack can be
>> accomplished.
>>
>> It is essentially a side-channel attack (using TCP stats counters to
>> infer TCP sequence numbers), but it is so real and easy to carry out
>> that I believe it should be considered a vulnerability. From a
>> difference perspective, this attack can be considered as a privilege
>> escalation attack since it allows a local non-privileged program to
>> gain write access to TCP connections made by other processes.
>>
>> The lines of code in kernel impacted by the attack is:
>> net/ipv4/tcp_input.c at line 4166 (as in the latest kernel v3.7.1)
>>
>> 4160 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
>> 4161 {
>> 4162        struct tcp_sock *tp = tcp_sk(sk);
>> 4163
>> 4164        if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
>> 4165            before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
>> 4166                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
>> 4167                tcp_enter_quickack_mode(sk);
>> 4168
>> 4169                if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
>> 4170                        u32 end_seq = TCP_SKB_CB(skb)->end_seq;
>> 4171
>> 4172                        if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
>> 4173                                end_seq = tp->rcv_nxt;
>> 4174                        tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, end_seq);
>> 4175                }
>> 4176        }
>> 4177
>> 4178        tcp_send_ack(sk);
>> 4179 }
>>
>>
>> IMHO, an easy fix to the problem is to disallow unprivileged access to
>> counters such as DelayedAckLost. However, this solution may not be
>> ideal. A better way is to always enforce both acknowledgement number
>> and sequence number check on each incoming TCP packet instead of
>> checking one at a time. Currently, Linux TCP stack first only
>> validates the SEQ number and then subsequently the ACK number. If the
>> sequence number is invalid, the delayedAckLost counter will be
>> incremented (information about sequence number leaked already
>> regardless of the ACK number). To make attacker's job much harder (we
>> should require the attacker to guess both the valid sequence number
>> and ACK number at the same time). For instance, following RFC 5961:
>> (SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT, this would require the
>> attacker to send many times (on the order of 10000) more packets to
>> conduct the same attack.
>>
>> Please feel free to ask me any questions regarding this vulnerability.
>
>
> I believe RFC 5961 was implemented in recent linux versions.
>
> Is the described vulnerability still present ?
>
>
>

^ permalink raw reply

* RE: [PATCH 1/3] net: stmmac: change GMAC control register for SGMII
From: Byungho An @ 2012-12-21 19:17 UTC (permalink / raw)
  To: 'Giuseppe CAVALLARO'
  Cc: davem, jeffrey.t.kirsher, netdev, kgene.kim, linux-kernel
In-Reply-To: <50BDEDE8.4030105@st.com>

On 12/04/2012 06:35 AM, Giuseppe CAVALLARO wrote:
>On 11/28/2012 11:57 AM, Byungho An wrote:
>> On 11/26/2012 07:31 PM, Giuseppe CABALLARO wrote:
>>> On 11/23/2012 10:04 AM, Byungho An wrote:
>>>>
>>>> This patch changes GMAC control register (TC(Transmit
>>>> Configuration) and PS(Port Selection) bit for SGMII.
>>>> In case of SGMII, TC bit is '1' and PS bit is 0.
>>>
>>> IMO this new support that should be released for net-next and further
>>> effort is actually needed.
>>>
>> OK, I see but if possible, I want to support the new features which is
>> included in this patch from v3.8
>
>ok I agree and I can support you.
>
>>
>>> The availability of the PCS registers is given by looking at the HW
>>> feature register. In fact, these are optional registers.
>>> I don't want to break the compatibility with old chips.
>>>
>> It means that old chip doesn't have this bit or this register? If that,
how
>> about using compatible in DT blob like snps,dwmac-3.70a and then in just
>> this case trying to read this bit and this register.
>
>The driver also works on mac 10/100 Databook 2.0 that has not these
>registers.
>
>>> I do not see why we have to use Kconfig macro to select ANE etc (as
>>> you do in your patches).
>> OK. I agree with you.
>
>we have to use the HW feature reg.
>
>>
>>> The driver could directly manage the phy device by itself if possible
>>> and the stmmac_init_phy should be reworked.
>>>
>> Could you explain more detail? As I understood, after set ANE bit in MAC
>> side then PHY auto-negotiation can be enabled. If I'm wrong let me know.
>> According to your mention, MAC and PHY auto-negotiation can be managed
>in
>> stmmac_init_phy?
>
>Currently the driver uses the Physical Abstraction Layer (PAL) to dialog
>with a PHY. On all the platforms supported (not only ST) we have always
>used it. Personally, I tested several phy devices with different MII
>interfaces (MII/RMII/GMII/RGMII ... ) but not TBI/RTBI/SGMII interfaces.
>
Currently I'm testing on SGMII interface and so far it's working fine.

>>> There are several things that need to be implemented. For example:
>>>
>>> The ISR (e.g. priv->hw->mac->host_irq_status) should be able to manage
>>> these new interrupts.
>> I think that there would be two additional interrupts."PCS
Auto-Negotiation
>> Complete" and "PCS Link Status Changed". These two interrupts are added
>to
>> "stmmac_interrupt". In my opinion, there are no specific processing for
>> these two irqs. What do you think about it?
>
>if the link changes this has to be logged in the driver.
>For example, depending on the link speed on some platforms we need to
>call dedicated call-back to set sysconfig registers or custom clocks.
>
>>> The code has to be able to maintain the user interface.
>>> For example if you want to enable ANE or manage Advertisement caps.
>>>
>> Does it mean that command line or other network command(e.g. ifconfig...)
>or
>> ioctol? Actually I don't understand exact user interface way. Could you
>> recommend the method for user interface?
>
>Using ethtool or mii-tool the user want to know the link status. So
>these kind of information have to be maintained.
>
As I know, user can set speed, duplex mode and auto-negotiation on/off using
ethtool.
If user wants to set auto-negotiation on, I think GMAC's ANE bit should be
ON as well.
For example, in stmmac_ethtool.c file, I try to add GMAC ANE bit enable
setting.
I think stmmac_set_pauseparam function is suitable for it.

	if (phy->autoneg) {
		if (netif_running(netdev)) {
 			ret = phy_start_aneg(phy);
+			value = readl(priv->ioaddr + GMAC_AN_CTRL);
+			value |= 0x1000;
+			writel(value, priv->ioaddr + GMAC_AN_CTRL);
+		}

What's your opinion?

>Take a look at the stmmac_adjust_link that is called by the PAL.
>
And for speed setting, if user changes speed to 1Gb using ethtool, link
status will be changed and then interrupt is occurred.
In the ISR, If interface is SGMII, I want to add SGMRAL setting and PS(port
selection)for guarantee 1Gb speed.
In the dwmac1000_core.c file.

+	if (unlikely(intr_status & pcs_link_irq)) {
+		readl(ioaddr + GMAC_AN_STATUS);
+		status |= core_irq_pcs_link;

+		if (priv->phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+			value = readl(priv->ioaddr);
+			/* GMAC_CONTROL_PS : Port Selection for GMII */
+			value &= ~GMAC_CONTROL_PS;
+			writel(value, priv->ioaddr);
+			value = readl(priv->ioaddr + GMAC_AN_CTRL);
+			value = |= 0x40000;
+			writel(value, priv->ioaddr + GMAC_AN_CTRL);
+		}		
+	}

Then stmmac_interrupt in stmmac_main.c file call stmmac_adjust_link when
status is core_irq_pcs_link.
 
For auto-negotiation complete interrupt, I try to make patch like below

drivers/net/ethernet/stmicro/stmmac/common.h
@@ -184,6 +184,7 @@ enum core_specific_irq_mask {
 	core_irq_tx_path_exit_lpi_mode = 32,
 	core_irq_rx_path_in_lpi_mode = 64,
 	core_irq_rx_path_exit_lpi_mode = 128,
+	core_irq_rx_pcs_an = 256,
 };
 
 /* DMA HW capabilities */

drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -231,6 +231,11 @@ static int dwmac1000_irq_status(void __iomem *ioaddr)
 		readl(ioaddr + GMAC_PMT);
 		status |= core_irq_receive_pmt_irq;
 	}
+	if (unlikely(intr_status & pcs_ane_irq)) {
+		CHIP_DBG(KERN_INFO "GMAC: PCS Auto-negotiation complete\n");
+		readl(ioaddr + GMAC_AN_STATUS);
+		status |= core_irq_pcs_an;
+	}
 	/* MAC trx/rx EEE LPI entry/exit interrupts */
 	if (intr_status & lpiis_irq) {
 		/* Clean LPI interrupt by reading the Reg 12 */

drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -93,6 +93,7 @@ struct stmmac_priv {
 	int eee_enabled;
 	int eee_active;
 	int tx_lpi_timer;
+	bool core_pcs_an;
 };
 
 extern int phyaddr;

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1658,6 +1658,8 @@ static irqreturn_t stmmac_interrupt(int irq, void
*dev_id)
 				priv->xstats.mmc_rx_csum_offload_irq_n++;
 			if (status & core_irq_receive_pmt_irq)
 				priv->xstats.irq_receive_pmt_irq_n++;
+			if (status & core_irq_pcs_an)
+				priv->core_pcs_an = true;
 
 			/* For LPI we need to save the tx status */
 			if (status & core_irq_tx_path_in_lpi_mode) {

>>>> Signed-off-by: Byungho An <bh74.an@samsung.com>
>>>> ---
>>>
>>> [snip]
>>>
>>>> +	if (priv->phydev->interface == PHY_INTERFACE_MODE_SGMII) {
>>>> +		value = readl(priv->ioaddr);
>>>> +		/* GMAC_CONTROL_TC : transmit config in RGMII/SGMII */
>>>> +		value |= 0x1000000;
>>>> +		/* GMAC_CONTROL_PS : Port Selection for GMII */
>>>> +		value &= ~(0x8000);
>>>> +		writel(value, priv->ioaddr);
>>>> +	}
>>>> +
>>>
>>>
>>> This parts of code have to be moved in
>>> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>
>> OK.
>>
I moved this code todwmac1000_core.c using hw feature register like below

+#include "dwmac_dma.h"
 
 static void dwmac1000_core_init(void __iomem *ioaddr)
 {
 	u32 value = readl(ioaddr + GMAC_CONTROL);
+	u32 hw_feature = readl(ioaddr + DMA_HW_FEATURE);
 	value |= GMAC_CORE_INIT;
 	writel(value, ioaddr + GMAC_CONTROL);
+	hw_feature &= DMA_HW_FEAT_ACTPHYIF;
+
+	if ((hw_feature >> 28 < 2) && (hw_feature != 0) {
+		/* transmit config in RGMII/SGMII */
+		value |= GMAC_CONTROL_TC;
+	}
+	writel(value, ioaddr + GMAC_CONTROL);

>>> Pls, do not use value |= 0x1000000 but provide the appropriate defines.
>>>
>> OK.
>>
>>>>    	/* Request the IRQ lines */
>>>>    	ret = request_irq(dev->irq, stmmac_interrupt,
>>>>    			 IRQF_SHARED, dev->name, dev);
>>>>
>> Thank you.
>
>you are welcome
>Peppe
>
>> Byungho An.
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
Thank you.
Byungho An.

 --
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org More majordomo
info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 19:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356115931.21834.7745.camel@edumazet-glaptop>

That seems like a good idea. I am not sure how it is implemented
though. Is it a new feature of Linux? Would you mind sending some
pointers for this?

Thanks.
-Zhiyun

On Fri, Dec 21, 2012 at 1:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-12-21 at 10:31 -0800, Eric Dumazet wrote:
>
>> I believe RFC 5961 was implemented in recent linux versions.
>>
>> Is the described vulnerability still present ?
>>
>
> By the way, I believe Chrome browser uses private network namespaces,
> and statistics are per network namespace, so it should be safe.
>
>
>

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 19:27 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <CALvgte9x4LWV8rsquKzNoVcXM14xZjvyuTgpHr-ynZXfGtzzpw@mail.gmail.com>

On Fri, 2012-12-21 at 14:10 -0500, Zhiyun Qian wrote:
> That's good to know. However, implementing RFC 5961 alone is not
> sufficient. Like I said, since DelayedAckLost counter is incremented
> purely upon looking at the sequence number, regardless of the ACK
> number. An attacker thus can still infer the sequence number based on
> DelayedAckLost counter without knowing the right ACK number.
> 



> The next question is how can the attacker eventually know the right
> ACK number in order to inject real data. It turns out that this is not
> hard either. First, based on the current Linux TCP stack, it accepts
> incoming packets without ACK flag. 

I dont really think so.

We must discard frame is th->ack is not set. (Step 5, line 6142)



> Further, if ACK flag is not set,
> ACK number will not be checked at all. See code in
> net/ipv4/tcp_input.c, function tcp_rcv_established()
> 
> 5547        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
> 5548                goto discard;
> 
> Second, even if ACK number is always checked before accepting the
> payload, it is still possible to infer the ACK number much like how
> sequence number can be inferred. The details is described in Section
> 3.4 of my paper, paragraph starting with "Client-side sequence number
> inference".
> 
> I'm looking at the latest kernel v3.7.1 right now. I believe the
> problem do still exist in today's Linux.
> 

It seems you know pretty well this code, I wonder why you dont send
patches to fix the bugs...

Its not like it has to be buggy forever.

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 19:30 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <CALvgte-=v=bRnQ2HPDxyqxBOjD6DXp_BUySW6Hm7h+81M+RYMA@mail.gmail.com>

On Fri, 2012-12-21 at 14:13 -0500, Zhiyun Qian wrote:
> That seems like a good idea. I am not sure how it is implemented
> though. Is it a new feature of Linux? Would you mind sending some
> pointers for this?

I dont have details, I only remember having to fix a memory allocation
error that was hitting Chrome users.


http://comments.gmane.org/gmane.linux.network/249535


Julien said at that time :

<quote>
It happens when users start Chrome. Chrome will create one new network
NS (for the sandbox).

This has been used for a few years now, but we had our first report in
January of this year and we've been getting a few reports very
recently at a rate that is starting to worry me (crbug.com/110756).

Thanks a lot for helping with this!

Julien
</quote>

^ permalink raw reply

* Re: IPv6 over Firewire
From: YOSHIFUJI Hideaki @ 2012-12-21 19:49 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: netdev, linux1394-devel, YOSHIFUJI Hideaki
In-Reply-To: <50D4ACFA.6040901@gmail.com>

Stephan Gatzka wrote:
> 
>> If you are talking about how to build NS/NA/RS/Redirect messages, you
>> can just use ndisc_opt_addr_space() and ndisc_fill_addr_option() here.
> 
> Thanks, these functions are certainly helpful. But ndisc_opt_addr_space() calculates the required space from dev->addr_len and ndisc_addr_option_pad(dev->type). The latter is 0 for IEEE1394 (firewire). So the required option space just comes from dev->addr_len, which is 8 for firewire, resulting in an option address space of 16 (2 octets).
> 
> But rfc3146 requires an option address space of 3 octets. So my main question is if in such a situation the best is to reserve additional skb tail room using needed_tailroom in struct netdevice. This directly affects the memory allocated in ndisc_build_skb().

Something like this:

 static inline int ndisc_opt_addr_space(struct net_device *dev)
 {
-       return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
+       switch (dev->type) {
+       case ARPHRD_IEEE1394:
+               return sizeof(struct ndisc_opt_ieee1394_llinfo);
+       default:
+               return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
+       }
 }

--yoshfuji

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 19:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356118052.21834.7793.camel@edumazet-glaptop>

On Fri, Dec 21, 2012 at 2:27 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Fri, 2012-12-21 at 14:10 -0500, Zhiyun Qian wrote:
>> That's good to know. However, implementing RFC 5961 alone is not
>> sufficient. Like I said, since DelayedAckLost counter is incremented
>> purely upon looking at the sequence number, regardless of the ACK
>> number. An attacker thus can still infer the sequence number based on
>> DelayedAckLost counter without knowing the right ACK number.
>>
>
>
>
>> The next question is how can the attacker eventually know the right
>> ACK number in order to inject real data. It turns out that this is not
>> hard either. First, based on the current Linux TCP stack, it accepts
>> incoming packets without ACK flag.
>
> I dont really think so.
>
> We must discard frame is th->ack is not set. (Step 5, line 6142)
>

If I am not mistaken, line 6142 in kernel v3.7.1 corresponds to
tcp_rcv_state_process(). According to the comments, "This function
implements the receiving procedure of RFC 793 for all states except
ESTABLISHED and TIME_WAIT." Are you referring to a different kernel
version?

>
>
>> Further, if ACK flag is not set,
>> ACK number will not be checked at all. See code in
>> net/ipv4/tcp_input.c, function tcp_rcv_established()
>>
>> 5547        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
>> 5548                goto discard;
>>
>> Second, even if ACK number is always checked before accepting the
>> payload, it is still possible to infer the ACK number much like how
>> sequence number can be inferred. The details is described in Section
>> 3.4 of my paper, paragraph starting with "Client-side sequence number
>> inference".
>>
>> I'm looking at the latest kernel v3.7.1 right now. I believe the
>> problem do still exist in today's Linux.
>>
>
> It seems you know pretty well this code, I wonder why you dont send
> patches to fix the bugs...
>
> Its not like it has to be buggy forever.
>

I have never submitted any patch before...I would do it if no one else
wants to :)

>
>

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: David Miller @ 2012-12-21 19:59 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, rick.jones2, kuznet
In-Reply-To: <1356072468.21834.4805.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Thu, 20 Dec 2012 22:47:48 -0800

> We have the following definition in net/ipv4/route.c
> 
> #define IP_MAX_MTU   0xFFF0
> 
> This means that "netperf -t UDP_STREAM", using UDP messages of 65507
> bytes, are fragmented on loopback interface (while its MTU is now 65536
> and should allow those UDP messages being sent without fragments)
> 
> I guess Rick chose 65507 bytes in netperf because it was related to the
> max IPv4 datagram length :
> 
> 65507 + 28 = 65535
> 
> Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
> miss something really obvious ?
> 
> It might be because in old days we reserved 16 bytes for the ethernet
> header, and we wanted to avoid kmalloc() round-up to kmalloc-131072
> slab ?
> 
> If so, we certainly can limit skb->head to 32 or 64 KB and complete with
> page fragments the remaining space.

I don't think it has to do with kmalloc() at all.

Maybe something strange to do with the fact that each frag has to
be an 8-byte multiple, or something like that?

Alexey choose this value back in 1998, maybe he remembers the
reason for the strange value.

Alexey?

^ permalink raw reply

* [PATCH] net: sched: integer overflow fix
From: Stefan Hasko @ 2012-12-21 20:39 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, netdev; +Cc: linux-kernel, Stefan Hasko

Fixed integer overflow in function htb_dequeue

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
---
 net/sched/sch_htb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index d2922c0..1bd3faa 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -919,7 +919,7 @@ ok:
 	q->now = ktime_to_ns(ktime_get());
 	start_at = jiffies;
 
-	next_event = q->now + 5 * NSEC_PER_SEC;
+	next_event = q->now + (u32)5 * NSEC_PER_SEC;
 
 	for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
 		/* common case optimization - skip event handler quickly */
-- 
1.7.10.4

^ permalink raw reply related

* Re: sock_ioctl sleeping while atomic warning during boot.
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: brian.haley; +Cc: eric.dumazet, davej, netdev
In-Reply-To: <50D3D7F7.2030200@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Thu, 20 Dec 2012 22:31:03 -0500

> On 12/20/2012 10:25 PM, Eric Dumazet wrote:
>> OK, thanks for the report.
>> 
>> We need a seqcount, not a seqlock, as RTNL already protects multiple
>> writers.
>> 
>> Please try following fix :
>> 
>> 
>> [PATCH] net: devnet_rename_seq should be a seqcount
>> 
>> Using a seqlock for devnet_rename_seq is not a good idea,
>> as device_rename() can sleep.
>> 
>> As we hold RTNL, we dont need a protection for writers,
>> and only need a seqcount so that readers can catch a change done
>> by a writer.
>> 
>> Bug added in commit c91f6df2db4972d3 (sockopt: Change getsockopt() of
>> SO_BINDTODEVICE to return an interface name)
>> 
>> Reported-by: Dave Jones <davej@redhat.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Brian Haley <brian.haley@hp.com>
> 
> Sorry about that, thanks for the quick fix Eric.

Applied.

^ permalink raw reply

* Re: [PATCH] ipv4/ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: yamahata, netdev
In-Reply-To: <1356054521.21834.4043.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 Dec 2012 17:48:41 -0800

> On Fri, 2012-12-21 at 10:12 +0900, Isaku Yamahata wrote:
>> ipgre_tunnel_xmit() parses network header as IP unconditionally.
>> But transmitting packets are not always IP packet. For example such packet
>> can be sent by packet socket with sockaddr_ll.sll_protocol set.
>> So make the function check if skb->protocol is IP.
>> 
>> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
>> ---
>>  net/ipv4/ip_gre.c |    5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
>> index a85ae2f..8fcf0ed 100644
>> --- a/net/ipv4/ip_gre.c
>> +++ b/net/ipv4/ip_gre.c
>> @@ -760,7 +760,10 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>>  
>>  	if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
>>  		gre_hlen = 0;
>> -		tiph = (const struct iphdr *)skb->data;
>> +		if (skb->protocol == htons(ETH_P_IP))
>> +			tiph = (const struct iphdr *)skb->data;
>> +		else
>> +			tiph = &tunnel->parms.iph;
>>  	} else {
>>  		gre_hlen = tunnel->hlen;
>>  		tiph = &tunnel->parms.iph;
> 
> Seems good to me thanks !
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ip_gre: fix possible use after free
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, yamahata
In-Reply-To: <1356055227.21834.4097.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Thu, 20 Dec 2012 18:00:27 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Once skb_realloc_headroom() is called, tiph might point to freed memory.
> 
> Cache tiph->ttl value before the reallocation, to avoid unexpected
> behavior.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Isaku Yamahata <yamahata@valinux.co.jp>

Applied.

^ permalink raw reply

* Re: TUN problems (regression?)
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: jasowang; +Cc: shemminger, eric.dumazet, pmoore, netdev
In-Reply-To: <50D3D85B.1070605@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 21 Dec 2012 11:32:43 +0800

> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>> On Thu, 20 Dec 2012 15:38:17 -0800
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>
>>>> Hi Jason,
>>>>
>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>> expected, but running it a second time results in failure and a 
>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>
>>>> Have you seen this before?
>>>>
>>> Obviously code in tun_flow_init() is wrong...
>>>
>>> static int tun_flow_init(struct tun_struct *tun)
>>> {
>>>         int i;
>>>
>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>                                             NULL);
>>>         if (!tun->flow_cache)
>>>                 return -ENOMEM;
>>> ...
>>> }
>>>
>>>
>>> I have no idea why we would need a kmem_cache per tun_struct,
>>> and why we even need a kmem_cache.
>> Normally flow malloc/free should be good enough.
>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>
>>
>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Should be at least a global cache, I thought I can get some speed-up by
> using kmem_cache.
> 
> Acked-by: Jason Wang <jasowang@redhat.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox