Netdev List
 help / color / mirror / Atom feed
* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 18:52 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <1356114663.21834.7697.camel@edumazet-glaptop>

On Fri, 2012-12-21 at 10:31 -0800, Eric Dumazet wrote:

> I believe RFC 5961 was implemented in recent linux versions.
> 
> Is the described vulnerability still present ?
> 

By the way, I believe Chrome browser uses private network namespaces,
and statistics are per network namespace, so it should be safe.

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: Eric Dumazet @ 2012-12-21 18:54 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev
In-Reply-To: <50D4AF72.2020101@hp.com>

On Fri, 2012-12-21 at 10:50 -0800, Rick Jones wrote:
> On 12/21/2012 10:34 AM, Eric Dumazet wrote:
> > On Fri, 2012-12-21 at 10:19 -0800, Rick Jones wrote:
> >
> >> If you go beyond the protocol limit of an IPv4 datagram, won't it be
> >> necessary to  start being a bit more conditional on IPv4 vs IPv6?
> >>
> >
> > This IP_MAX_MTU is really an IPv4 thing (static to net/ipv4/route.c)
> 
> OK. Doesn't this:
> 
>          if (mtu > IP_MAX_MTU)
>                  mtu = IP_MAX_MTU;
> 
> mean it should be OK to go to 0xFFFF but not 0x10000?  Since 65535 is 
> the limit of an IPv4 datagram and so I would think would be the maximum 
> MTU for an IPv4 interface.

Sure, I meant 0xFFFF and it is the value I used in my tests.

65536 is the current MTU of loopback device, and this can be used by
other protocols.

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 19:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356114663.21834.7697.camel@edumazet-glaptop>

That's good to know. However, implementing RFC 5961 alone is not
sufficient. Like I said, since DelayedAckLost counter is incremented
purely upon looking at the sequence number, regardless of the ACK
number. An attacker thus can still infer the sequence number based on
DelayedAckLost counter without knowing the right ACK number.

The next question is how can the attacker eventually know the right
ACK number in order to inject real data. It turns out that this is not
hard either. First, based on the current Linux TCP stack, it accepts
incoming packets without ACK flag. Further, if ACK flag is not set,
ACK number will not be checked at all. See code in
net/ipv4/tcp_input.c, function tcp_rcv_established()

5547        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
5548                goto discard;

Second, even if ACK number is always checked before accepting the
payload, it is still possible to infer the ACK number much like how
sequence number can be inferred. The details is described in Section
3.4 of my paper, paragraph starting with "Client-side sequence number
inference".

I'm looking at the latest kernel v3.7.1 right now. I believe the
problem do still exist in today's Linux.

-Zhiyun

On Fri, Dec 21, 2012 at 1:31 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Fri, 2012-12-21 at 12:58 -0500, Zhiyun Qian wrote:
>> Dear sir or madam,
>>
>> My name is Zhiyun Qian, a recent PhD graduate from University of
>> Michigan. As our recent research effort, along with my colleagues, we
>> identified a vulnerability related to Linux. Details described in our
>> paper published at this year's ACM Conference on Computer and
>> Communications Security (CCS): Collaborative TCP Sequence Number
>> Inference Attack available
>> http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf
>>
>> Keywords: TCP, sequence number inference, DelayedAckLost counter,
>> privilege-escalation attack
>>
>> The vulnerability would allow an local malicious program to gain write
>> access to TCP connections of other applications. An example attack
>> scenario (on android) would be "an attacker uploads a seemingly benign
>> app to the google play, when run at the background, it can inject
>> malicious HTML payload into a webpage open by the browser".
>>
>> The problem is caused by the common TCP stats counters (the specific
>> counter I found is DelayedACKLost) maintained by the kernel (but
>> exposed to user space). By reading and reporting such counters to an
>> external attacker (colluded), the aforementioned attack can be
>> accomplished.
>>
>> It is essentially a side-channel attack (using TCP stats counters to
>> infer TCP sequence numbers), but it is so real and easy to carry out
>> that I believe it should be considered a vulnerability. From a
>> difference perspective, this attack can be considered as a privilege
>> escalation attack since it allows a local non-privileged program to
>> gain write access to TCP connections made by other processes.
>>
>> The lines of code in kernel impacted by the attack is:
>> net/ipv4/tcp_input.c at line 4166 (as in the latest kernel v3.7.1)
>>
>> 4160 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
>> 4161 {
>> 4162        struct tcp_sock *tp = tcp_sk(sk);
>> 4163
>> 4164        if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
>> 4165            before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
>> 4166                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
>> 4167                tcp_enter_quickack_mode(sk);
>> 4168
>> 4169                if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
>> 4170                        u32 end_seq = TCP_SKB_CB(skb)->end_seq;
>> 4171
>> 4172                        if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
>> 4173                                end_seq = tp->rcv_nxt;
>> 4174                        tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, end_seq);
>> 4175                }
>> 4176        }
>> 4177
>> 4178        tcp_send_ack(sk);
>> 4179 }
>>
>>
>> IMHO, an easy fix to the problem is to disallow unprivileged access to
>> counters such as DelayedAckLost. However, this solution may not be
>> ideal. A better way is to always enforce both acknowledgement number
>> and sequence number check on each incoming TCP packet instead of
>> checking one at a time. Currently, Linux TCP stack first only
>> validates the SEQ number and then subsequently the ACK number. If the
>> sequence number is invalid, the delayedAckLost counter will be
>> incremented (information about sequence number leaked already
>> regardless of the ACK number). To make attacker's job much harder (we
>> should require the attacker to guess both the valid sequence number
>> and ACK number at the same time). For instance, following RFC 5961:
>> (SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT, this would require the
>> attacker to send many times (on the order of 10000) more packets to
>> conduct the same attack.
>>
>> Please feel free to ask me any questions regarding this vulnerability.
>
>
> I believe RFC 5961 was implemented in recent linux versions.
>
> Is the described vulnerability still present ?
>
>
>

^ permalink raw reply

* RE: [PATCH 1/3] net: stmmac: change GMAC control register for SGMII
From: Byungho An @ 2012-12-21 19:17 UTC (permalink / raw)
  To: 'Giuseppe CAVALLARO'
  Cc: davem, jeffrey.t.kirsher, netdev, kgene.kim, linux-kernel
In-Reply-To: <50BDEDE8.4030105@st.com>

On 12/04/2012 06:35 AM, Giuseppe CAVALLARO wrote:
>On 11/28/2012 11:57 AM, Byungho An wrote:
>> On 11/26/2012 07:31 PM, Giuseppe CABALLARO wrote:
>>> On 11/23/2012 10:04 AM, Byungho An wrote:
>>>>
>>>> This patch changes GMAC control register (TC(Transmit
>>>> Configuration) and PS(Port Selection) bit for SGMII.
>>>> In case of SGMII, TC bit is '1' and PS bit is 0.
>>>
>>> IMO this new support that should be released for net-next and further
>>> effort is actually needed.
>>>
>> OK, I see but if possible, I want to support the new features which is
>> included in this patch from v3.8
>
>ok I agree and I can support you.
>
>>
>>> The availability of the PCS registers is given by looking at the HW
>>> feature register. In fact, these are optional registers.
>>> I don't want to break the compatibility with old chips.
>>>
>> It means that old chip doesn't have this bit or this register? If that,
how
>> about using compatible in DT blob like snps,dwmac-3.70a and then in just
>> this case trying to read this bit and this register.
>
>The driver also works on mac 10/100 Databook 2.0 that has not these
>registers.
>
>>> I do not see why we have to use Kconfig macro to select ANE etc (as
>>> you do in your patches).
>> OK. I agree with you.
>
>we have to use the HW feature reg.
>
>>
>>> The driver could directly manage the phy device by itself if possible
>>> and the stmmac_init_phy should be reworked.
>>>
>> Could you explain more detail? As I understood, after set ANE bit in MAC
>> side then PHY auto-negotiation can be enabled. If I'm wrong let me know.
>> According to your mention, MAC and PHY auto-negotiation can be managed
>in
>> stmmac_init_phy?
>
>Currently the driver uses the Physical Abstraction Layer (PAL) to dialog
>with a PHY. On all the platforms supported (not only ST) we have always
>used it. Personally, I tested several phy devices with different MII
>interfaces (MII/RMII/GMII/RGMII ... ) but not TBI/RTBI/SGMII interfaces.
>
Currently I'm testing on SGMII interface and so far it's working fine.

>>> There are several things that need to be implemented. For example:
>>>
>>> The ISR (e.g. priv->hw->mac->host_irq_status) should be able to manage
>>> these new interrupts.
>> I think that there would be two additional interrupts."PCS
Auto-Negotiation
>> Complete" and "PCS Link Status Changed". These two interrupts are added
>to
>> "stmmac_interrupt". In my opinion, there are no specific processing for
>> these two irqs. What do you think about it?
>
>if the link changes this has to be logged in the driver.
>For example, depending on the link speed on some platforms we need to
>call dedicated call-back to set sysconfig registers or custom clocks.
>
>>> The code has to be able to maintain the user interface.
>>> For example if you want to enable ANE or manage Advertisement caps.
>>>
>> Does it mean that command line or other network command(e.g. ifconfig...)
>or
>> ioctol? Actually I don't understand exact user interface way. Could you
>> recommend the method for user interface?
>
>Using ethtool or mii-tool the user want to know the link status. So
>these kind of information have to be maintained.
>
As I know, user can set speed, duplex mode and auto-negotiation on/off using
ethtool.
If user wants to set auto-negotiation on, I think GMAC's ANE bit should be
ON as well.
For example, in stmmac_ethtool.c file, I try to add GMAC ANE bit enable
setting.
I think stmmac_set_pauseparam function is suitable for it.

	if (phy->autoneg) {
		if (netif_running(netdev)) {
 			ret = phy_start_aneg(phy);
+			value = readl(priv->ioaddr + GMAC_AN_CTRL);
+			value |= 0x1000;
+			writel(value, priv->ioaddr + GMAC_AN_CTRL);
+		}

What's your opinion?

>Take a look at the stmmac_adjust_link that is called by the PAL.
>
And for speed setting, if user changes speed to 1Gb using ethtool, link
status will be changed and then interrupt is occurred.
In the ISR, If interface is SGMII, I want to add SGMRAL setting and PS(port
selection)for guarantee 1Gb speed.
In the dwmac1000_core.c file.

+	if (unlikely(intr_status & pcs_link_irq)) {
+		readl(ioaddr + GMAC_AN_STATUS);
+		status |= core_irq_pcs_link;

+		if (priv->phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+			value = readl(priv->ioaddr);
+			/* GMAC_CONTROL_PS : Port Selection for GMII */
+			value &= ~GMAC_CONTROL_PS;
+			writel(value, priv->ioaddr);
+			value = readl(priv->ioaddr + GMAC_AN_CTRL);
+			value = |= 0x40000;
+			writel(value, priv->ioaddr + GMAC_AN_CTRL);
+		}		
+	}

Then stmmac_interrupt in stmmac_main.c file call stmmac_adjust_link when
status is core_irq_pcs_link.
 
For auto-negotiation complete interrupt, I try to make patch like below

drivers/net/ethernet/stmicro/stmmac/common.h
@@ -184,6 +184,7 @@ enum core_specific_irq_mask {
 	core_irq_tx_path_exit_lpi_mode = 32,
 	core_irq_rx_path_in_lpi_mode = 64,
 	core_irq_rx_path_exit_lpi_mode = 128,
+	core_irq_rx_pcs_an = 256,
 };
 
 /* DMA HW capabilities */

drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -231,6 +231,11 @@ static int dwmac1000_irq_status(void __iomem *ioaddr)
 		readl(ioaddr + GMAC_PMT);
 		status |= core_irq_receive_pmt_irq;
 	}
+	if (unlikely(intr_status & pcs_ane_irq)) {
+		CHIP_DBG(KERN_INFO "GMAC: PCS Auto-negotiation complete\n");
+		readl(ioaddr + GMAC_AN_STATUS);
+		status |= core_irq_pcs_an;
+	}
 	/* MAC trx/rx EEE LPI entry/exit interrupts */
 	if (intr_status & lpiis_irq) {
 		/* Clean LPI interrupt by reading the Reg 12 */

drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -93,6 +93,7 @@ struct stmmac_priv {
 	int eee_enabled;
 	int eee_active;
 	int tx_lpi_timer;
+	bool core_pcs_an;
 };
 
 extern int phyaddr;

drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1658,6 +1658,8 @@ static irqreturn_t stmmac_interrupt(int irq, void
*dev_id)
 				priv->xstats.mmc_rx_csum_offload_irq_n++;
 			if (status & core_irq_receive_pmt_irq)
 				priv->xstats.irq_receive_pmt_irq_n++;
+			if (status & core_irq_pcs_an)
+				priv->core_pcs_an = true;
 
 			/* For LPI we need to save the tx status */
 			if (status & core_irq_tx_path_in_lpi_mode) {

>>>> Signed-off-by: Byungho An <bh74.an@samsung.com>
>>>> ---
>>>
>>> [snip]
>>>
>>>> +	if (priv->phydev->interface == PHY_INTERFACE_MODE_SGMII) {
>>>> +		value = readl(priv->ioaddr);
>>>> +		/* GMAC_CONTROL_TC : transmit config in RGMII/SGMII */
>>>> +		value |= 0x1000000;
>>>> +		/* GMAC_CONTROL_PS : Port Selection for GMII */
>>>> +		value &= ~(0x8000);
>>>> +		writel(value, priv->ioaddr);
>>>> +	}
>>>> +
>>>
>>>
>>> This parts of code have to be moved in
>>> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>
>> OK.
>>
I moved this code todwmac1000_core.c using hw feature register like below

+#include "dwmac_dma.h"
 
 static void dwmac1000_core_init(void __iomem *ioaddr)
 {
 	u32 value = readl(ioaddr + GMAC_CONTROL);
+	u32 hw_feature = readl(ioaddr + DMA_HW_FEATURE);
 	value |= GMAC_CORE_INIT;
 	writel(value, ioaddr + GMAC_CONTROL);
+	hw_feature &= DMA_HW_FEAT_ACTPHYIF;
+
+	if ((hw_feature >> 28 < 2) && (hw_feature != 0) {
+		/* transmit config in RGMII/SGMII */
+		value |= GMAC_CONTROL_TC;
+	}
+	writel(value, ioaddr + GMAC_CONTROL);

>>> Pls, do not use value |= 0x1000000 but provide the appropriate defines.
>>>
>> OK.
>>
>>>>    	/* Request the IRQ lines */
>>>>    	ret = request_irq(dev->irq, stmmac_interrupt,
>>>>    			 IRQF_SHARED, dev->name, dev);
>>>>
>> Thank you.
>
>you are welcome
>Peppe
>
>> Byungho An.
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
Thank you.
Byungho An.

 --
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org More majordomo
info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 19:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356115931.21834.7745.camel@edumazet-glaptop>

That seems like a good idea. I am not sure how it is implemented
though. Is it a new feature of Linux? Would you mind sending some
pointers for this?

Thanks.
-Zhiyun

On Fri, Dec 21, 2012 at 1:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-12-21 at 10:31 -0800, Eric Dumazet wrote:
>
>> I believe RFC 5961 was implemented in recent linux versions.
>>
>> Is the described vulnerability still present ?
>>
>
> By the way, I believe Chrome browser uses private network namespaces,
> and statistics are per network namespace, so it should be safe.
>
>
>

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 19:27 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <CALvgte9x4LWV8rsquKzNoVcXM14xZjvyuTgpHr-ynZXfGtzzpw@mail.gmail.com>

On Fri, 2012-12-21 at 14:10 -0500, Zhiyun Qian wrote:
> That's good to know. However, implementing RFC 5961 alone is not
> sufficient. Like I said, since DelayedAckLost counter is incremented
> purely upon looking at the sequence number, regardless of the ACK
> number. An attacker thus can still infer the sequence number based on
> DelayedAckLost counter without knowing the right ACK number.
> 



> The next question is how can the attacker eventually know the right
> ACK number in order to inject real data. It turns out that this is not
> hard either. First, based on the current Linux TCP stack, it accepts
> incoming packets without ACK flag. 

I dont really think so.

We must discard frame is th->ack is not set. (Step 5, line 6142)



> Further, if ACK flag is not set,
> ACK number will not be checked at all. See code in
> net/ipv4/tcp_input.c, function tcp_rcv_established()
> 
> 5547        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
> 5548                goto discard;
> 
> Second, even if ACK number is always checked before accepting the
> payload, it is still possible to infer the ACK number much like how
> sequence number can be inferred. The details is described in Section
> 3.4 of my paper, paragraph starting with "Client-side sequence number
> inference".
> 
> I'm looking at the latest kernel v3.7.1 right now. I believe the
> problem do still exist in today's Linux.
> 

It seems you know pretty well this code, I wonder why you dont send
patches to fix the bugs...

Its not like it has to be buggy forever.

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 19:30 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <CALvgte-=v=bRnQ2HPDxyqxBOjD6DXp_BUySW6Hm7h+81M+RYMA@mail.gmail.com>

On Fri, 2012-12-21 at 14:13 -0500, Zhiyun Qian wrote:
> That seems like a good idea. I am not sure how it is implemented
> though. Is it a new feature of Linux? Would you mind sending some
> pointers for this?

I dont have details, I only remember having to fix a memory allocation
error that was hitting Chrome users.


http://comments.gmane.org/gmane.linux.network/249535


Julien said at that time :

<quote>
It happens when users start Chrome. Chrome will create one new network
NS (for the sandbox).

This has been used for a few years now, but we had our first report in
January of this year and we've been getting a few reports very
recently at a rate that is starting to worry me (crbug.com/110756).

Thanks a lot for helping with this!

Julien
</quote>

^ permalink raw reply

* Re: IPv6 over Firewire
From: YOSHIFUJI Hideaki @ 2012-12-21 19:49 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: netdev, linux1394-devel, YOSHIFUJI Hideaki
In-Reply-To: <50D4ACFA.6040901@gmail.com>

Stephan Gatzka wrote:
> 
>> If you are talking about how to build NS/NA/RS/Redirect messages, you
>> can just use ndisc_opt_addr_space() and ndisc_fill_addr_option() here.
> 
> Thanks, these functions are certainly helpful. But ndisc_opt_addr_space() calculates the required space from dev->addr_len and ndisc_addr_option_pad(dev->type). The latter is 0 for IEEE1394 (firewire). So the required option space just comes from dev->addr_len, which is 8 for firewire, resulting in an option address space of 16 (2 octets).
> 
> But rfc3146 requires an option address space of 3 octets. So my main question is if in such a situation the best is to reserve additional skb tail room using needed_tailroom in struct netdevice. This directly affects the memory allocated in ndisc_build_skb().

Something like this:

 static inline int ndisc_opt_addr_space(struct net_device *dev)
 {
-       return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
+       switch (dev->type) {
+       case ARPHRD_IEEE1394:
+               return sizeof(struct ndisc_opt_ieee1394_llinfo);
+       default:
+               return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
+       }
 }

--yoshfuji

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 19:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356118052.21834.7793.camel@edumazet-glaptop>

On Fri, Dec 21, 2012 at 2:27 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Fri, 2012-12-21 at 14:10 -0500, Zhiyun Qian wrote:
>> That's good to know. However, implementing RFC 5961 alone is not
>> sufficient. Like I said, since DelayedAckLost counter is incremented
>> purely upon looking at the sequence number, regardless of the ACK
>> number. An attacker thus can still infer the sequence number based on
>> DelayedAckLost counter without knowing the right ACK number.
>>
>
>
>
>> The next question is how can the attacker eventually know the right
>> ACK number in order to inject real data. It turns out that this is not
>> hard either. First, based on the current Linux TCP stack, it accepts
>> incoming packets without ACK flag.
>
> I dont really think so.
>
> We must discard frame is th->ack is not set. (Step 5, line 6142)
>

If I am not mistaken, line 6142 in kernel v3.7.1 corresponds to
tcp_rcv_state_process(). According to the comments, "This function
implements the receiving procedure of RFC 793 for all states except
ESTABLISHED and TIME_WAIT." Are you referring to a different kernel
version?

>
>
>> Further, if ACK flag is not set,
>> ACK number will not be checked at all. See code in
>> net/ipv4/tcp_input.c, function tcp_rcv_established()
>>
>> 5547        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
>> 5548                goto discard;
>>
>> Second, even if ACK number is always checked before accepting the
>> payload, it is still possible to infer the ACK number much like how
>> sequence number can be inferred. The details is described in Section
>> 3.4 of my paper, paragraph starting with "Client-side sequence number
>> inference".
>>
>> I'm looking at the latest kernel v3.7.1 right now. I believe the
>> problem do still exist in today's Linux.
>>
>
> It seems you know pretty well this code, I wonder why you dont send
> patches to fix the bugs...
>
> Its not like it has to be buggy forever.
>

I have never submitted any patch before...I would do it if no one else
wants to :)

>
>

^ permalink raw reply

* Re: [RFC] IP_MAX_MTU value
From: David Miller @ 2012-12-21 19:59 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, rick.jones2, kuznet
In-Reply-To: <1356072468.21834.4805.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Thu, 20 Dec 2012 22:47:48 -0800

> We have the following definition in net/ipv4/route.c
> 
> #define IP_MAX_MTU   0xFFF0
> 
> This means that "netperf -t UDP_STREAM", using UDP messages of 65507
> bytes, are fragmented on loopback interface (while its MTU is now 65536
> and should allow those UDP messages being sent without fragments)
> 
> I guess Rick chose 65507 bytes in netperf because it was related to the
> max IPv4 datagram length :
> 
> 65507 + 28 = 65535
> 
> Changing IP_MAX_MTU from 0xFFF0 to 0x10000 seems safe [1], but I might
> miss something really obvious ?
> 
> It might be because in old days we reserved 16 bytes for the ethernet
> header, and we wanted to avoid kmalloc() round-up to kmalloc-131072
> slab ?
> 
> If so, we certainly can limit skb->head to 32 or 64 KB and complete with
> page fragments the remaining space.

I don't think it has to do with kmalloc() at all.

Maybe something strange to do with the fact that each frag has to
be an 8-byte multiple, or something like that?

Alexey choose this value back in 1998, maybe he remembers the
reason for the strange value.

Alexey?

^ permalink raw reply

* [PATCH] net: sched: integer overflow fix
From: Stefan Hasko @ 2012-12-21 20:39 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, netdev; +Cc: linux-kernel, Stefan Hasko

Fixed integer overflow in function htb_dequeue

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
---
 net/sched/sch_htb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index d2922c0..1bd3faa 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -919,7 +919,7 @@ ok:
 	q->now = ktime_to_ns(ktime_get());
 	start_at = jiffies;
 
-	next_event = q->now + 5 * NSEC_PER_SEC;
+	next_event = q->now + (u32)5 * NSEC_PER_SEC;
 
 	for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
 		/* common case optimization - skip event handler quickly */
-- 
1.7.10.4

^ permalink raw reply related

* Re: sock_ioctl sleeping while atomic warning during boot.
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: brian.haley; +Cc: eric.dumazet, davej, netdev
In-Reply-To: <50D3D7F7.2030200@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Thu, 20 Dec 2012 22:31:03 -0500

> On 12/20/2012 10:25 PM, Eric Dumazet wrote:
>> OK, thanks for the report.
>> 
>> We need a seqcount, not a seqlock, as RTNL already protects multiple
>> writers.
>> 
>> Please try following fix :
>> 
>> 
>> [PATCH] net: devnet_rename_seq should be a seqcount
>> 
>> Using a seqlock for devnet_rename_seq is not a good idea,
>> as device_rename() can sleep.
>> 
>> As we hold RTNL, we dont need a protection for writers,
>> and only need a seqcount so that readers can catch a change done
>> by a writer.
>> 
>> Bug added in commit c91f6df2db4972d3 (sockopt: Change getsockopt() of
>> SO_BINDTODEVICE to return an interface name)
>> 
>> Reported-by: Dave Jones <davej@redhat.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Brian Haley <brian.haley@hp.com>
> 
> Sorry about that, thanks for the quick fix Eric.

Applied.

^ permalink raw reply

* Re: [PATCH] ipv4/ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: yamahata, netdev
In-Reply-To: <1356054521.21834.4043.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 Dec 2012 17:48:41 -0800

> On Fri, 2012-12-21 at 10:12 +0900, Isaku Yamahata wrote:
>> ipgre_tunnel_xmit() parses network header as IP unconditionally.
>> But transmitting packets are not always IP packet. For example such packet
>> can be sent by packet socket with sockaddr_ll.sll_protocol set.
>> So make the function check if skb->protocol is IP.
>> 
>> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
>> ---
>>  net/ipv4/ip_gre.c |    5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
>> index a85ae2f..8fcf0ed 100644
>> --- a/net/ipv4/ip_gre.c
>> +++ b/net/ipv4/ip_gre.c
>> @@ -760,7 +760,10 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>>  
>>  	if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
>>  		gre_hlen = 0;
>> -		tiph = (const struct iphdr *)skb->data;
>> +		if (skb->protocol == htons(ETH_P_IP))
>> +			tiph = (const struct iphdr *)skb->data;
>> +		else
>> +			tiph = &tunnel->parms.iph;
>>  	} else {
>>  		gre_hlen = tunnel->hlen;
>>  		tiph = &tunnel->parms.iph;
> 
> Seems good to me thanks !
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ip_gre: fix possible use after free
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, yamahata
In-Reply-To: <1356055227.21834.4097.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Thu, 20 Dec 2012 18:00:27 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Once skb_realloc_headroom() is called, tiph might point to freed memory.
> 
> Cache tiph->ttl value before the reallocation, to avoid unexpected
> behavior.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Isaku Yamahata <yamahata@valinux.co.jp>

Applied.

^ permalink raw reply

* Re: TUN problems (regression?)
From: David Miller @ 2012-12-21 21:15 UTC (permalink / raw)
  To: jasowang; +Cc: shemminger, eric.dumazet, pmoore, netdev
In-Reply-To: <50D3D85B.1070605@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 21 Dec 2012 11:32:43 +0800

> On 12/21/2012 07:50 AM, Stephen Hemminger wrote:
>> On Thu, 20 Dec 2012 15:38:17 -0800
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
>>>> [CC'ing netdev in case this is a known problem I just missed ...]
>>>>
>>>> Hi Jason,
>>>>
>>>> I started doing some more testing with the multiqueue TUN changes and I ran 
>>>> into a problem when running tunctl: running it once w/o arguments works as 
>>>> expected, but running it a second time results in failure and a 
>>>> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
>>>> on my test VM and happens independent of the LSM/SELinux fixup patches.
>>>>
>>>> Have you seen this before?
>>>>
>>> Obviously code in tun_flow_init() is wrong...
>>>
>>> static int tun_flow_init(struct tun_struct *tun)
>>> {
>>>         int i;
>>>
>>>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>>>                                             sizeof(struct tun_flow_entry), 0, 0,
>>>                                             NULL);
>>>         if (!tun->flow_cache)
>>>                 return -ENOMEM;
>>> ...
>>> }
>>>
>>>
>>> I have no idea why we would need a kmem_cache per tun_struct,
>>> and why we even need a kmem_cache.
>> Normally flow malloc/free should be good enough.
>> It might make sense to use private kmem_cache if doing hlist_nulls.
>>
>>
>> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Should be at least a global cache, I thought I can get some speed-up by
> using kmem_cache.
> 
> Acked-by: Jason Wang <jasowang@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ipv4: arp: fix a lockdep splat in arp_solicit()
From: David Miller @ 2012-12-21 21:16 UTC (permalink / raw)
  To: erdnetdev; +Cc: netdev, yanb, shemminger
In-Reply-To: <1356111130.21834.7565.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Fri, 21 Dec 2012 09:32:10 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Yan Burman reported following lockdep warning :
 ...
> Bug is from arp_solicit(), releasing the neigh lock after arp_send()
> In case of vxlan, we eventually need to write lock a neigh lock later.
> 
> Its a false positive, but we can get rid of it without lockdep
> annotations.
> 
> We can instead use neigh_ha_snapshot() helper.
> 
> Reported-by: Yan Burman <yanb@mellanox.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: [patch] solos-pci: double lock in geos_gpio_store()
From: David Miller @ 2012-12-21 21:16 UTC (permalink / raw)
  To: david.woodhouse
  Cc: dan.carpenter, chas, nathan, linux-atm-general, netdev, kbuild
In-Reply-To: <1355995348.18919.104.camel@shinybook.infradead.org>

From: "Woodhouse, David" <david.woodhouse@intel.com>
Date: Thu, 20 Dec 2012 09:22:26 +0000

> On Thu, 2012-12-20 at 10:48 +0300, Dan Carpenter wrote:
>> There is a typo here so we do a double lock instead of an unlock.
>> 
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>> ---
>> Only needed in linux-next.  Introduced in f9baad02e7411d9 [14/17]
>> solos-pci: add GPIO support for newer versions on Geos board
> 
> Ah, crap. That was my fault; it was part of my "improvement" to Nathan's
> patch. Sorry.
> 
> Acked-by: David Woodhouse <David.Woodhouse@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bridge: call br_netpoll_disable in br_add_if
From: David Miller @ 2012-12-21 21:17 UTC (permalink / raw)
  To: amwang; +Cc: gaofeng, netdev, shemminger
In-Reply-To: <1355999613.25310.11.camel@cr0>

From: Cong Wang <amwang@redhat.com>
Date: Thu, 20 Dec 2012 18:33:33 +0800

> On Thu, 2012-12-20 at 17:41 +0800, Gao feng wrote:
>> When netdev_set_master faild in br_add_if, we should
>> call br_netpoll_disable to do some cleanup jobs,such
>> as free the memory of struct netpoll which allocated
>> in br_netpoll_enable.
>> 
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> 
> Looks good!
> 
> Acked-by: Cong Wang <amwang@redhat.com>

Applied.

^ permalink raw reply

* Is keepalive behaving as expected in 3.7.0+/net-next?
From: Rick Jones @ 2012-12-21 22:05 UTC (permalink / raw)
  To: netdev

I was looking to do a bit more documentation clean-up and thought I 
would work on the descriptions of the "keepalive" sysctls, but first I 
wanted to see if they behaved as the existing descriptions suggested:

> tcp_keepalive_time - INTEGER
>         How often TCP sends out keepalive messages when keepalive is enabled.
>         Default: 2hours.
>
> tcp_keepalive_probes - INTEGER
>         How many keepalive probes TCP sends out, until it decides that the
>         connection is broken. Default value: 9.
>
> tcp_keepalive_intvl - INTEGER
>         How frequently the probes are send out. Multiplied by
>         tcp_keepalive_probes it is time to kill not responding connection,
>         after probes started. Default value: 75sec i.e. connection
>         will be aborted after ~11 minutes of retries.

I interpreted all that that as:  When a connection is idle, TCP will 
send a keepalive probe every tcp_keepalive_time seconds.  If a response 
to a keepalive probe is not received, TCP will resend (retransmit) it 
every tcp_keepalive_intvl seconds.

However, what I see is that on a connection where the remote is indeed 
still there, only the first keepalive probe is sent after 
tcp_keepalive_time, and thereafter it is sent every tcp_keepalive_intvl 
seconds.

Now, some of this may relate to my being impatient - rather than wait 
two hours for the first probe, I set tcp_keepalive_time to 3 seconds, 
and tcp_keepalive_intvl to 7 seconds.  I then kicked-off a ./configure 
--intervals-enable netperf TCP_RR test with a burst of one and a wait 
time of 90 seconds and got the following (trimmed) trace:

13:43:46.879133 IP netnextraj.43054 > netnextraj2.srvr: Flags [S], seq 
807869796, win 14600, options [mss 1460,sackOK,TS val 133470 ecr 
0,nop,wscale 7], length 0
13:43:46.880091 IP netnextraj2.srvr > netnextraj.43054: Flags [S.], seq 
1522345902, ack 807869797, win 14480, options [mss 1460,sackOK,TS val 
136186 ecr 133470,nop,wscale 4], length 0
13:43:46.880114 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
1, win 115, options [nop,nop,TS val 133470 ecr 136186], length 0
13:43:46.880306 IP netnextraj.43054 > netnextraj2.srvr: Flags [P.], seq 
1:11, ack 1, win 115, options [nop,nop,TS val 133470 ecr 136186], length 10
13:43:46.880948 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
11, win 905, options [nop,nop,TS val 136187 ecr 133470], length 0
13:43:46.880964 IP netnextraj2.srvr > netnextraj.43054: Flags [P.], seq 
1:11, ack 11, win 905, options [nop,nop,TS val 136187 ecr 133470], length 10
13:43:46.881161 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
11, win 115, options [nop,nop,TS val 133470 ecr 136187], length 0

The first probe above comes after 3 seconds - tcp_keepalive_time - at 
13:43:49

13:43:49.886752 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
11, win 115, options [nop,nop,TS val 134222 ecr 136187], length 0

And it does seem to elicit a response:

13:43:49.887530 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
11, win 905, options [nop,nop,TS val 136938 ecr 133470], length 0

Now it starts sending probes every 7 seconds (tcp_keepalive_intvl):

13:43:56.903576 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
11, win 115, options [nop,nop,TS val 135976 ecr 136938], length 0
13:43:56.904480 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
11, win 905, options [nop,nop,TS val 138693 ecr 133470], length 0
13:44:03.910744 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
11, win 115, options [nop,nop,TS val 137728 ecr 138693], length 0
13:44:03.911623 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
11, win 905, options [nop,nop,TS val 140444 ecr 133470], length 0

I;ve deleted the next 9 or so probes...  It continues, and doesn't 
terminate the connection, so I assume it was happy with the responses to 
the probes.

13:45:13.990746 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
11, win 115, options [nop,nop,TS val 155248 ecr 156213], length 0
13:45:13.991578 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
11, win 905, options [nop,nop,TS val 157965 ecr 133470], length 0

Now the next netperf transaction happens:

13:45:16.879222 IP netnextraj.43054 > netnextraj2.srvr: Flags [P.], seq 
11:21, ack 11, win 115, options [nop,nop,TS val 155970 ecr 157965], 
length 10
13:45:16.880033 IP netnextraj2.srvr > netnextraj.43054: Flags [P.], seq 
11:21, ack 21, win 905, options [nop,nop,TS val 158687 ecr 155970], 
length 10
13:45:16.880220 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
21, win 115, options [nop,nop,TS val 155970 ecr 158687], length 0

But the next keepalive probe is tcp_keepalive_intvl seconds after the 
last one, rather than that many, or tcp_keepalive_time seconds after the 
connection was last "active."

13:45:20.998739 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
21, win 115, options [nop,nop,TS val 157000 ecr 158687], length 0
13:45:20.999754 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
21, win 905, options [nop,nop,TS val 159717 ecr 155970], length 0
13:45:28.006747 IP netnextraj.43054 > netnextraj2.srvr: Flags [.], ack 
21, win 115, options [nop,nop,TS val 158752 ecr 159717], length 0
13:45:28.007624 IP netnextraj2.srvr > netnextraj.43054: Flags [.], ack 
21, win 905, options [nop,nop,TS val 161469 ecr 155970], length 0

Is this the expected behaviour?  If I reverse the values - make 
tcp_keepalive_time 7 and tcp_keepalive_intvl 3, it seems that all the 
probes are after 7 seconds.

rick jones

^ permalink raw reply

* Re: TCP sequence number inference attack on Linux
From: Eric Dumazet @ 2012-12-21 22:45 UTC (permalink / raw)
  To: Zhiyun Qian; +Cc: netdev
In-Reply-To: <CALvgte9WJRR95B5WCTavF0w3Kvf+MksZtyY4OQ=ioAe71BasCw@mail.gmail.com>

On Fri, 2012-12-21 at 14:49 -0500, Zhiyun Qian wrote:

> If I am not mistaken, line 6142 in kernel v3.7.1 corresponds to
> tcp_rcv_state_process(). According to the comments, "This function
> implements the receiving procedure of RFC 793 for all states except
> ESTABLISHED and TIME_WAIT." Are you referring to a different kernel
> version?

You are not mistaken, it seems code is too permissive.

We should reject a frame without ACK flag while in ESTABLISHED state.

Thats explicitly stated in RFC 973.

Then we should make all possible safety checks before even sending a
frame or changing socket variables.

(For instance the tests done in tcp_ack() should be done before calling
tcp_validate_incoming())

John Dykstra in commit 96e0bf4b5193d0 (tcp: Discard segments that ack
data not yet sent) did a step into right direction, but missed this.

Current code assumes the incoming frame is mostly legitimate.

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a136925..2ea4937 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5551,7 +5551,7 @@ slow_path:
 		return 0;
 
 step5:
-	if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
+	if (!th->ack || tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
 		goto discard;
 
 	/* ts_recent update must be made after we are sure that the packet

^ permalink raw reply related

* Re: [PATCH] net: sched: integer overflow fix
From: Eric Dumazet @ 2012-12-21 22:51 UTC (permalink / raw)
  To: Stefan Hasko
  Cc: Jamal Hadi Salim, David S. Miller, netdev, linux-kernel,
	Vimalkumar
In-Reply-To: <1356122364-1418-1-git-send-email-hasko.stevo@gmail.com>

On Fri, 2012-12-21 at 21:39 +0100, Stefan Hasko wrote:
> Fixed integer overflow in function htb_dequeue
> 
> Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
> ---
>  net/sched/sch_htb.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index d2922c0..1bd3faa 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -919,7 +919,7 @@ ok:
>  	q->now = ktime_to_ns(ktime_get());
>  	start_at = jiffies;
>  
> -	next_event = q->now + 5 * NSEC_PER_SEC;
> +	next_event = q->now + (u32)5 * NSEC_PER_SEC;
>  
>  	for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
>  		/* common case optimization - skip event handler quickly */

But this patch is wrong !

I am a bit surprised, as I remember spotting this error in one patch
submission from Vimalkumar.

Apparently it got lost.

Please fix the bug for good, not adding another one.

Thanks

^ permalink raw reply

* Re: IPv6 over Firewire
From: Stefan Richter @ 2012-12-21 23:12 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, stephan.gatzka; +Cc: netdev, linux1394-devel
In-Reply-To: <50D4BD2F.7060006@linux-ipv6.org>

On Dec 22 YOSHIFUJI Hideaki wrote:
> Stephan Gatzka wrote:
> > 
> >> If you are talking about how to build NS/NA/RS/Redirect messages, you
> >> can just use ndisc_opt_addr_space() and ndisc_fill_addr_option() here.
> > 
> > Thanks, these functions are certainly helpful. But ndisc_opt_addr_space() calculates the required space from dev->addr_len and ndisc_addr_option_pad(dev->type). The latter is 0 for IEEE1394 (firewire). So the required option space just comes from dev->addr_len, which is 8 for firewire, resulting in an option address space of 16 (2 octets).
> > 
> > But rfc3146 requires an option address space of 3 octets. So my main question is if in such a situation the best is to reserve additional skb tail room using needed_tailroom in struct netdevice. This directly affects the memory allocated in ndisc_build_skb().
> 
> Something like this:
> 
>  static inline int ndisc_opt_addr_space(struct net_device *dev)
>  {
> -       return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
> +       switch (dev->type) {
> +       case ARPHRD_IEEE1394:
> +               return sizeof(struct ndisc_opt_ieee1394_llinfo);
> +       default:
> +               return NDISC_OPT_SPACE(dev->addr_len + ndisc_addr_option_pad(dev->type));
> +       }
>  }

Can't we increase dev->addr_len for RFC 3146 interfaces?
Can't we add another dev->type besides ARPHRD_IEEE1394 (RFC 2734)?

Is a single dev instance transporting both IPv4 and IPv6 or will there be
separate instances for those?
-- 
Stefan Richter
-=====-===-- ==-- =-==-
http://arcgraph.de/sr/

^ permalink raw reply

* [PATCH] CONFIG_HOTPLUG removal from networking core
From: Greg Kroah-Hartman @ 2012-12-21 23:44 UTC (permalink / raw)
  To: netdev; +Cc: Bill Pemberton

CONFIG_HOTPLUG is always enabled now, so remove the unused code that was
trying to be compiled out when this option was disabled, in the
networking core.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 334efd5..28c5f5a 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1334,7 +1334,6 @@ struct kobj_ns_type_operations net_ns_type_operations = {
 };
 EXPORT_SYMBOL_GPL(net_ns_type_operations);
 
-#ifdef CONFIG_HOTPLUG
 static int netdev_uevent(struct device *d, struct kobj_uevent_env *env)
 {
 	struct net_device *dev = to_net_dev(d);
@@ -1353,7 +1352,6 @@ static int netdev_uevent(struct device *d, struct kobj_uevent_env *env)
 exit:
 	return retval;
 }
-#endif
 
 /*
  *	netdev_release -- destroy and free a dead device.
@@ -1382,9 +1380,7 @@ static struct class net_class = {
 #ifdef CONFIG_SYSFS
 	.dev_attrs = net_class_attributes,
 #endif /* CONFIG_SYSFS */
-#ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
-#endif
 	.ns_type = &net_ns_type_operations,
 	.namespace = net_namespace,
 };
diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 6e53089..82c4fc7 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -2365,7 +2365,6 @@ int set_regdom(const struct ieee80211_regdomain *rd)
 	return r;
 }
 
-#ifdef CONFIG_HOTPLUG
 int reg_device_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
 	if (last_request && !last_request->processed) {
@@ -2377,12 +2376,6 @@ int reg_device_uevent(struct device *dev, struct kobj_uevent_env *env)
 
 	return 0;
 }
-#else
-int reg_device_uevent(struct device *dev, struct kobj_uevent_env *env)
-{
-	return -ENODEV;
-}
-#endif /* CONFIG_HOTPLUG */
 
 void wiphy_regulatory_register(struct wiphy *wiphy)
 {
diff --git a/net/wireless/sysfs.c b/net/wireless/sysfs.c
index 9bf6d5e..1f6f01e 100644
--- a/net/wireless/sysfs.c
+++ b/net/wireless/sysfs.c
@@ -77,13 +77,11 @@ static void wiphy_dev_release(struct device *dev)
 	cfg80211_dev_free(rdev);
 }
 
-#ifdef CONFIG_HOTPLUG
 static int wiphy_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
 	/* TODO, we probably need stuff here */
 	return 0;
 }
-#endif
 
 static int wiphy_suspend(struct device *dev, pm_message_t state)
 {
@@ -134,9 +132,7 @@ struct class ieee80211_class = {
 	.owner = THIS_MODULE,
 	.dev_release = wiphy_dev_release,
 	.dev_attrs = ieee80211_dev_attrs,
-#ifdef CONFIG_HOTPLUG
 	.dev_uevent = wiphy_uevent,
-#endif
 	.suspend = wiphy_suspend,
 	.resume = wiphy_resume,
 	.ns_type = &net_ns_type_operations,

^ permalink raw reply related

* [PATCH] Drivers: network: more __dev* removal
From: Greg Kroah-Hartman @ 2012-12-21 23:42 UTC (permalink / raw)
  To: netdev; +Cc: Bill Pemberton

Remove some __dev* markings that snuck in the 3.8-rc1 merge window in
the drivers/net/* directory.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c
index 6d6002b..74f1c15 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -141,7 +141,7 @@ static int orion_mdio_reset(struct mii_bus *bus)
 	return 0;
 }
 
-static int __devinit orion_mdio_probe(struct platform_device *pdev)
+static int orion_mdio_probe(struct platform_device *pdev)
 {
 	struct device_node *np = pdev->dev.of_node;
 	struct mii_bus *bus;
@@ -197,7 +197,7 @@ static int __devinit orion_mdio_probe(struct platform_device *pdev)
 	return 0;
 }
 
-static int __devexit orion_mdio_remove(struct platform_device *pdev)
+static int orion_mdio_remove(struct platform_device *pdev)
 {
 	struct mii_bus *bus = platform_get_drvdata(pdev);
 	mdiobus_unregister(bus);
@@ -214,7 +214,7 @@ MODULE_DEVICE_TABLE(of, orion_mdio_match);
 
 static struct platform_driver orion_mdio_driver = {
 	.probe = orion_mdio_probe,
-	.remove = __devexit_p(orion_mdio_remove),
+	.remove = orion_mdio_remove,
 	.driver = {
 		.name = "orion-mdio",
 		.of_match_table = orion_mdio_match,
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 3f8086b..b6025c3 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -635,7 +635,7 @@ static void mvneta_rxq_bm_disable(struct mvneta_port *pp,
 
 
 /* Sets the RGMII Enable bit (RGMIIEn) in port MAC control register */
-static void __devinit mvneta_gmac_rgmii_set(struct mvneta_port *pp, int enable)
+static void mvneta_gmac_rgmii_set(struct mvneta_port *pp, int enable)
 {
 	u32  val;
 
@@ -650,7 +650,7 @@ static void __devinit mvneta_gmac_rgmii_set(struct mvneta_port *pp, int enable)
 }
 
 /* Config SGMII port */
-static void __devinit mvneta_port_sgmii_config(struct mvneta_port *pp)
+static void mvneta_port_sgmii_config(struct mvneta_port *pp)
 {
 	u32 val;
 
@@ -2564,7 +2564,7 @@ const struct ethtool_ops mvneta_eth_tool_ops = {
 };
 
 /* Initialize hw */
-static int __devinit mvneta_init(struct mvneta_port *pp, int phy_addr)
+static int mvneta_init(struct mvneta_port *pp, int phy_addr)
 {
 	int queue;
 
@@ -2613,9 +2613,8 @@ static void mvneta_deinit(struct mvneta_port *pp)
 }
 
 /* platform glue : initialize decoding windows */
-static void __devinit
-mvneta_conf_mbus_windows(struct mvneta_port *pp,
-			 const struct mbus_dram_target_info *dram)
+static void mvneta_conf_mbus_windows(struct mvneta_port *pp,
+				     const struct mbus_dram_target_info *dram)
 {
 	u32 win_enable;
 	u32 win_protect;
@@ -2648,7 +2647,7 @@ mvneta_conf_mbus_windows(struct mvneta_port *pp,
 }
 
 /* Power up the port */
-static void __devinit mvneta_port_power_up(struct mvneta_port *pp, int phy_mode)
+static void mvneta_port_power_up(struct mvneta_port *pp, int phy_mode)
 {
 	u32 val;
 
@@ -2671,7 +2670,7 @@ static void __devinit mvneta_port_power_up(struct mvneta_port *pp, int phy_mode)
 }
 
 /* Device initialization routine */
-static int __devinit mvneta_probe(struct platform_device *pdev)
+static int mvneta_probe(struct platform_device *pdev)
 {
 	const struct mbus_dram_target_info *dram_target_info;
 	struct device_node *dn = pdev->dev.of_node;
@@ -2803,7 +2802,7 @@ err_free_netdev:
 }
 
 /* Device removal routine */
-static int __devexit mvneta_remove(struct platform_device *pdev)
+static int mvneta_remove(struct platform_device *pdev)
 {
 	struct net_device  *dev = platform_get_drvdata(pdev);
 	struct mvneta_port *pp = netdev_priv(dev);
@@ -2828,7 +2827,7 @@ MODULE_DEVICE_TABLE(of, mvneta_match);
 
 static struct platform_driver mvneta_driver = {
 	.probe = mvneta_probe,
-	.remove = __devexit_p(mvneta_remove),
+	.remove = mvneta_remove,
 	.driver = {
 		.name = MVNETA_DRIVER_NAME,
 		.of_match_table = mvneta_match,
diff --git a/drivers/net/wireless/rtlwifi/rtl8723ae/sw.c b/drivers/net/wireless/rtlwifi/rtl8723ae/sw.c
index 18b0bc5..bb7cc90 100644
--- a/drivers/net/wireless/rtlwifi/rtl8723ae/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8723ae/sw.c
@@ -341,7 +341,7 @@ static struct rtl_hal_cfg rtl8723ae_hal_cfg = {
 	.maps[RTL_RC_HT_RATEMCS15] = DESC92_RATEMCS15,
 };
 
-static struct pci_device_id rtl8723ae_pci_ids[] __devinitdata = {
+static struct pci_device_id rtl8723ae_pci_ids[] = {
 	{RTL_PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8723, rtl8723ae_hal_cfg)},
 	{},
 };

^ permalink raw reply related

* Re: TCP sequence number inference attack on Linux
From: Zhiyun Qian @ 2012-12-21 23:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1356129948.21834.8002.camel@edumazet-glaptop>

On Fri, Dec 21, 2012 at 5:45 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Fri, 2012-12-21 at 14:49 -0500, Zhiyun Qian wrote:
>
>> If I am not mistaken, line 6142 in kernel v3.7.1 corresponds to
>> tcp_rcv_state_process(). According to the comments, "This function
>> implements the receiving procedure of RFC 793 for all states except
>> ESTABLISHED and TIME_WAIT." Are you referring to a different kernel
>> version?
>
> You are not mistaken, it seems code is too permissive.
>
> We should reject a frame without ACK flag while in ESTABLISHED state.
>
> Thats explicitly stated in RFC 973.
>
> Then we should make all possible safety checks before even sending a
> frame or changing socket variables.

I completely agree!

>
> (For instance the tests done in tcp_ack() should be done before calling
> tcp_validate_incoming())
>

It seems that it is not straightforward to simply move tcp_ack()
before tcp_validate_incoming() as tcp_ack() currently assumes the tcp
sequence number is already validated and it may adjust certain states
purely depending on the ACK number. I guess the solution is to extract
all safety checks out and put at the very beginning. The rest of the
code in tcp_validate_incoming() and tcp_ack() may still need to
perform the redundant checks since if some state changes are dependent
on the sequence number or ACK number (e.g., window update).

I'm willing to help on this. Perhaps I can draft an initial patch and
you can help take a look before I submit it?

> John Dykstra in commit 96e0bf4b5193d0 (tcp: Discard segments that ack
> data not yet sent) did a step into right direction, but missed this.
>
> Current code assumes the incoming frame is mostly legitimate.
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index a136925..2ea4937 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5551,7 +5551,7 @@ slow_path:
>                 return 0;
>
>  step5:
> -       if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
> +       if (!th->ack || tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
>                 goto discard;
>
>         /* ts_recent update must be made after we are sure that the packet
>
>

Neat change. This should enforce the ACK flag and ACK number check for
every packet received in established state.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox