* Question regarding IP_TOS and SO_PRIORITY
From: Holger Freyther @ 2010-05-25 21:19 UTC (permalink / raw)
To: netdev
Hi all,
I have a minor question regarding the IP_TOS and SO_PRIORITY socket
options. I was reading the man page for ip(7) and it pointed me to
socket(7) to look at the SO_PRIORITY. My man page claims that setting
the SO_PRIORITY will also set the tos on the outgoing IP header.
The problem is that I couldn't see the TOS set in my outgoing packets
and I was not able to find the code propagating the sk_priority into the
tos. On the other hand I could find that setting the IP_TOS will also
set the sk_priority.
Is this a documentation bug? Or did it work once? Or is it working and
my code was just broken and I'm too blind to find it?
z.
^ permalink raw reply
* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Changli Gao @ 2010-05-25 22:52 UTC (permalink / raw)
To: Jozsef Kadlecsik
Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.00.1005252100500.21791@blackhole.kfki.hu>
On Wed, May 26, 2010 at 3:03 AM, Jozsef Kadlecsik
<kadlec@blackhole.kfki.hu> wrote:
>>
>> Yea. Only MSS option is supported. But it is better than being DoSed.
>> And you can set a threshold for SYNPROXY with limit match, then there
>> isn't any difference if there isn't any SYN-flood attack.
>
> If I (have to) limit SYNPROXY, why shouldn't I better limit the SYN
> packets directly instead?
>
Without SYNPROXY, you have to drop the over-limit SYN packets, and
legitimate SYN packets may be dropped along with them.
--
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net] Phonet: fix potential use-after-free in pep_sock_close()
From: David Miller @ 2010-05-25 23:08 UTC (permalink / raw)
To: remi.denis-courmont; +Cc: netdev
In-Reply-To: <1274796709-29988-1-git-send-email-remi.denis-courmont@nokia.com>
From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Date: Tue, 25 May 2010 17:11:49 +0300
> sk_common_release() might destroy our last reference to the socket.
> So an extra temporary reference is needed during cleanup.
>
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Patch applied, thank you.
^ permalink raw reply
* Re: [PATCH] hso: add support for new products
From: David Miller @ 2010-05-25 23:09 UTC (permalink / raw)
To: f.aben; +Cc: gregkh, linux-usb, netdev, j.dumon
In-Reply-To: <alpine.DEB.2.00.1005251402310.2667@filip-linux>
From: f.aben@option.com
Date: Tue, 25 May 2010 14:33:01 +0200 (CEST)
> This patch adds a few new product id's for the hso driver.
>
> Signed-off-by: Filip Aben <f.aben@option.com>
Applied, thanks.
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 23:13 UTC (permalink / raw)
To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274793216.3878.947.camel@bigi>
On Tue, May 25, 2010 at 09:13:36AM -0400, jamal wrote:
> On Tue, 2010-05-25 at 22:46 +1000, Herbert Xu wrote:
>
> > That's not very surprising as you're not checking whether the
> > skb is cloned in act_pedit.c:
>
> I meant the test "if (skb_cloned(skb))" failed in such cases;-> So you
> couldn't reliably use it.
> If it turns out it is unnecessary, what you describe is what i had in
> mind as well.
If it did happen like you said then it would be a serious bug
in our stack as everything else (including the TCP stack) relies
on this.
You can't just make up your own rules :)
> It is not the responsibility of the action to drop packets in a pipeline
> rather the responsibility is that of the caller (ref: rule #3 in
> Documentation/networking/tc-actions-env-rules.txt). What to do on a
> failure such as above is programmable by the action user/admin.
But how can the caller make that decision when you return exactly
the same value in the error case as the normal case?
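The point can be sketched in a few lines: if the action reports failure with a code that differs from the normal one, the caller can apply whatever policy the admin configured. The enum and function names below are hypothetical and are not the kernel's TC_ACT_* values or any existing API.

```c
/* Hypothetical result codes and verdicts, for illustration only. */
enum act_result { ACT_DONE = 0, ACT_FAILED = 1 };
enum verdict { VERDICT_PASS = 0, VERDICT_DROP = 1 };

/* Caller-side decision: the action only reports what happened; the
 * admin-configured on_failure policy decides what to do about it. */
enum verdict caller_decide(enum act_result res, enum verdict on_failure)
{
    if (res == ACT_FAILED)
        return on_failure;
    return VERDICT_PASS;
}
```

If ACT_FAILED and ACT_DONE were the same value, caller_decide() would have nothing to branch on, which is the objection raised above.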
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH] r8169: Add counters tx_bytes and rx_bytes for ethtool
From: David Miller @ 2010-05-25 23:15 UTC (permalink / raw)
To: junchangwang; +Cc: romieu, netdev
In-Reply-To: <AANLkTimRwNqPYh1MgXPhh3hHep7Koc3OJCCroEj_scqg@mail.gmail.com>
From: Junchang Wang <junchangwang@gmail.com>
Date: Tue, 25 May 2010 22:19:46 +0800
> Traffic stats counters (rx_bytes and tx_bytes) in net_device are
> "unsigned long". On 32-bit systems, they wrap around every few
> minutes, giving out wrong answers to the amount of traffic. To get the
> right message, another available approach is "ethtool -S". However,
> r8169 didn't support those two counters so far.
>
> Add traffic counters tx_bytes and rx_bytes with 64-bit width for
> ethtool. On 32-bit systems, gcc treats each one as two 32-bit
> variables, making the increment not "atomic". But there is no sync
> issue since the updates to the counters are serialized by driver logic
> in any case. Results provided by ethtool may be slightly biased if the
> read and update operations are interleaved. But the results are much
> better than the original ones that always fall into the range from 0
> to 4GiB.
>
> Signed-off-by: Junchang Wang <junchangwang@gmail.com>
I absolutely do not want to see drivers start doing this, so right
off the bat I am not going to apply this patch.
If the problem is that people want 64-bit counters available for core
statistics on 32-bit systems, we do not fix that problem by hacking
every single driver to provide them side-band via ethtool.
First of all, we now have "struct rtnl_link_stats64" in
linux/if_link.h, it's there to start combating this problem
generically, for every device, rather than the way you are trying to
handle it only for one specific driver at a time.
So that's the area where you should start looking to solve these kinds
of problems.
^ permalink raw reply
* Re: [PATCH net-2.6] be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
From: David Miller @ 2010-05-25 23:16 UTC (permalink / raw)
To: sarveshwarb; +Cc: netdev
In-Reply-To: <20100525081514.GA5695@serverengines.com>
From: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Date: Tue, 25 May 2010 13:45:24 +0530
> Certain firmware commands/operations to upgrade firmware could take several
> seconds to complete. The code presently disables bottom half during these
> operations which could lead to unpredictable behaviour in certain cases. This
> patch now does all firmware upgrade operations asynchronously using a
> completion variable.
>
> Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Applied, thanks.
^ permalink raw reply
* Re: linux-next: build warning in Linus' tree
From: David Miller @ 2010-05-25 23:19 UTC (permalink / raw)
To: sfr; +Cc: netdev, linux-next, linux-kernel, NeilJay
In-Reply-To: <20100524.215822.13254177.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Mon, 24 May 2010 21:58:22 -0700 (PDT)
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 25 May 2010 11:46:14 +1000
>
>> Hi Dave,
>>
>> Today's linux-next build (x86_64 allmodconfig) produced this warning:
>>
>> drivers/net/usb/asix.c: In function 'asix_rx_fixup':
>> drivers/net/usb/asix.c:325: warning: cast from pointer to integer of different size
>> drivers/net/usb/asix.c:354: warning: cast from pointer to integer of different size
>>
>> Introduced by commit 3f78d1f210ff89af77f042ab7f4a8fee39feb1c9
>> ("drivers/net/usb/asix.c: Fix unaligned accesses"). This commit casts
>> skb->data to u32.
>
> Thanks I'll look into this.
Here is how I fixed this:
--------------------
drivers/net/usb/asix.c: Fix pointer cast.
Stephen Rothwell reports the following new warning:
drivers/net/usb/asix.c: In function 'asix_rx_fixup':
drivers/net/usb/asix.c:325: warning: cast from pointer to integer of different size
drivers/net/usb/asix.c:354: warning: cast from pointer to integer of different size
The code just cares about the low alignment bits, so use
an "unsigned long" cast instead of one to "u32".
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/usb/asix.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 31b7331..ea75f47 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -322,7 +322,7 @@ static int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
size = (u16) (header & 0x0000ffff);
if ((skb->len) - ((size + 1) & 0xfffe) == 0) {
- u8 alignment = (u32)skb->data & 0x3;
+ u8 alignment = (unsigned long)skb->data & 0x3;
if (alignment != 0x2) {
/*
* not 16bit aligned so use the room provided by
--
1.7.0.4
^ permalink raw reply related
* Re: linux-next: build warning in Linus' tree
From: David Miller @ 2010-05-25 23:24 UTC (permalink / raw)
To: sfr; +Cc: netdev, linux-next, linux-kernel, NeilJay
In-Reply-To: <20100525.161929.112591425.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Tue, 25 May 2010 16:19:29 -0700 (PDT)
> Here is how I fixed this:
>
> --------------------
> drivers/net/usb/asix.c: Fix pointer cast.
Sorry, that only took care of one of the two warnings :-)
This patch is better.
--------------------
drivers/net/usb/asix.c: Fix pointer cast.
Stephen Rothwell reports the following new warning:
drivers/net/usb/asix.c: In function 'asix_rx_fixup':
drivers/net/usb/asix.c:325: warning: cast from pointer to integer of different size
drivers/net/usb/asix.c:354: warning: cast from pointer to integer of different size
The code just cares about the low alignment bits, so use
an "unsigned long" cast instead of one to "u32".
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/usb/asix.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 31b7331..1f802e9 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -322,7 +322,7 @@ static int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
size = (u16) (header & 0x0000ffff);
if ((skb->len) - ((size + 1) & 0xfffe) == 0) {
- u8 alignment = (u32)skb->data & 0x3;
+ u8 alignment = (unsigned long)skb->data & 0x3;
if (alignment != 0x2) {
/*
* not 16bit aligned so use the room provided by
@@ -351,7 +351,7 @@ static int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
}
ax_skb = skb_clone(skb, GFP_ATOMIC);
if (ax_skb) {
- u8 alignment = (u32)packet & 0x3;
+ u8 alignment = (unsigned long)packet & 0x3;
ax_skb->len = size;
if (alignment != 0x2) {
--
1.7.0.4
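
The reasoning in the commit message can be tried out in user space. This is a sketch only: it uses uintptr_t from <stdint.h> as the user-space analogue of the kernel's "unsigned long" cast (an integer type wide enough to hold a pointer), and the function name low_align_bits is made up for illustration.

```c
#include <stdint.h>

/* Low two address bits, i.e. the offset within a 4-byte word.
 * Casting through a pointer-sized integer avoids the
 * "cast from pointer to integer of different size" warning that a
 * 32-bit cast triggers on 64-bit builds, and the mask result is the same. */
unsigned int low_align_bits(const void *p)
{
    return (unsigned int)((uintptr_t)p & 0x3);
}
```

Only the two lowest bits survive the mask, so the width of the intermediate integer type does not change the result, as long as the cast itself is well-formed.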
^ permalink raw reply related
* [GIT] Networking
From: David Miller @ 2010-05-25 23:59 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) dev_get_valid_name can create conditions wherein it becomes impossible
to rename a device, fixed by Daniel Lezcano.
2) CAIF protocol bug fixes from Sjur Braendeland:
a) wait_ev*_timeout return value accidentally stored in 'int' instead
of 'long'
b) By-hand list implementation buggy, use standard kernel lists.
c) Memory allocation failures not handled correctly.
d) poll() erroneously returns POLLHUP when connecting
e) missing spin_unlock in cfmuxl_remove_uplayer()
f) receive needs to set MSG_TRUNC when user buf size is insufficient
3) Clean up accidentally introduced uninitialized variable in pppoe_flush_dev().
Fix from Dan Carpenter.
4) ISDN fixes from Tilman Schmidt.
a) ->reset_ctr() op is marked optional, but code doesn't actually
check for NULL
b) Dummy stubs for ->reset_ctr() and ->load_firmware() are erroneous
and cause hangs, delete.
5) Blackfin SIR IRDA device is not harmed by the UART unit bug, so don't
adjust clock values using the bug workaround adjustment. From
Graf Yang.
6) Networking control group is severely limited because it tries to sample
the task control group at the time of packet transmission. This really
can't be done reliably, especially when packets are sent async from
another context.
Fix by storing the cgroupid in the socket, sampling it at I/O call
points (sendmsg, recvmsg, splice_read, sendpage), then use this
value when we classify via the skb->sk socket.
Handle special cases like TUN (which call netif_rx() to inject packets)
directly.
All of this work done by Herbert Xu.
7) DCCP stores error codes in a u8, which is insufficient for MIPS which
needs a u16, fix from Yoichi Yuasa.
8) BE2NET bug fixes from Sarveshwar Bandi:
a) Must PHY reset after FW init.
b) FW upgrade can take a while, use completions instead of polling
with softirqs disabled.
9) proc_dointvec change that came in via the networking tree for port
range specifications introduced a regression for procfs files, accidentally
disallowing write("1\n"). Fix from J. R. Okajima.
10) Use after free in Phonet socket close, fix from Rémi Denis-Courmont.
11) Memory leak in macvlan, from Jiri Pirko.
12) Initial GRO fraglist element's ->gso_size can be bogus, if packet
hits device not TSO capable but fraglist capable. Fix from Herbert
Xu.
13) Add support for some new IXGBE device IDs.
14) Memory leak and alloc failure fixes from Denis Kirjanov.
a) sh_mdio_release() leaks memory
b) check for kzalloc() return value in ieee802154.
c) bfin_mac leaks miibus->irq memory
15) sja1000.c missing spin_lock_init() on priv->cmdreg_lock, fix from
Oliver Hartkopp.
16) The batched dequeuing of input_pkt_queue introduced in this merge window
added some problems. Inaccurate packet counting can occur, which defeats
the schemes that RFS uses to prevent out-of-order packet processing when
we want to modify the target RX cpu for a flow.
Fix from Tom Herbert.
17) Several wireless fixes via John Linville and the wireless crew.
18) ENIC bug fixes from Scott Feldman.
a) Fix UUID fmt'ing array type.
b) port-profile association happens before we have a MAC address
assigned, use a random one instead of garbage
19) qdisc_notify() can OOPS, because tc_fill_qdisc() can accidentally be
called for a builtin qdisc. Fix from Eric Dumazet.
20) do_setlink() had some error handling bugs introduced this cycle
(failure to initialize 'err' in some error paths), fix from David
Howells.
21) ethoc_probe can deref a NULL pointer, fix from Thomas Chou.
22) netif_vdbg() definition is garbage when VERBOSE_DEBUG is not defined.
Fix from Ben Hutchings.
23) Netfilter fixes via Patrick McHardy:
a) nf_ct_sip doesn't handle nonlinear packets, but it needs to
b) __nf_conntrack_confirm() races with nf_ct_get_next_corpse()
24) Fix race in i2400m_rx_edata() resulting in an OOPS on i2400m->rx_roq,
from Inaky Perez-Gonzalez.
Please pull, thanks a lot!
The following changes since commit ec96e2fe954c23a54bfdf2673437a39e193a1822:
Linus Torvalds (1):
Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
are available in the git repository at:
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master
Baruch Siach (1):
fec: add support for PHY interface platform data
Ben Hutchings (1):
net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is not defined
Bruno Randolf (1):
ath5k: consistently use rx_bufsize for RX DMA
Dan Carpenter (4):
pppoe: uninitialized variable in pppoe_flush_dev()
ath9k_htc: dereferencing before check in hif_usb_tx_cb()
ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
iwlwifi: testing the wrong variable in iwl_add_bssid_station()
Daniel Lezcano (1):
net-2.6 : V2 - fix dev_get_valid_name
David Howells (1):
rtnetlink: Fix error handling in do_setlink()
David S. Miller (4):
Merge branch 'master' of git://git.kernel.org/.../kaber/nf-next-2.6
Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6
Merge branch 'wimax-2.6.35.y' of git://git.kernel.org/.../inaky/wimax
drivers/net/usb/asix.c: Fix pointer cast.
Denis Kirjanov (3):
sh_eth: Fix memleak in sh_mdio_release
ieee802154: Fix possible NULL pointer dereference in wpan_phy_alloc
bfin_mac: fix memleak in mii_bus{probe|remove}
Eric Dumazet (1):
net_sched: Fix qdisc_notify()
Felix Fietkau (3):
cfg80211: fix crash in cfg80211_set_freq()
ath9k: change beacon allocation to prefer the first beacon slot
ath9k: remove VEOL support for ad-hoc
Filip Aben (1):
hso: add support for new products
Gertjan van Wingerde (2):
rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
rt2x00: Fix rt2800usb TX descriptor writing.
Graf Yang (1):
net/irda: bfin_sir: IRDA is not affected by anomaly 05000230
Helmut Schaa (1):
rt2x00: don't use to_pci_dev in rt2x00pci_uninitialize
Herbert Xu (4):
gro: Fix bogus gso_size on the first fraglist entry
cls_cgroup: Store classid in struct sock
tun: Update classid on packet injection
cls_cgroup: Fix build error when built-in
Inaky Perez-Gonzalez (1):
wimax/i2400m: fix bad race condition check in RX path
J. R. Okajima (1):
proc_dointvec: write a single value
Jiri Pirko (1):
macvlan: do proper cleanup in macvlan_common_newlink() V2
Joerg Marx (1):
netfilter: nf_conntrack: fix a race in __nf_conntrack_confirm against nf_ct_get_next_corpse()
Johannes Berg (1):
cfg80211: add missing braces
John W. Linville (1):
Revert "ath9k: Group Key fix for VAPs"
Jussi Kivilinna (1):
rndis_wlan: replace wireless_send_event with cfg80211_disconnected
Juuso Oikarinen (1):
wl1271: Fix RX data path frame lengths
Luciano Coelho (1):
netfilter: fix description of expected checkentry return code on xt_target
Luis R. Rodriguez (1):
ath9k: remove AR9003 from PCI IDs for now
Mallikarjuna R Chilakala (1):
ixgbe:add support for a new 82599 10G Base-T device
Mike Frysinger (1):
net-caif: drop redundant Kconfig entries
Oliver Hartkopp (1):
can: SJA1000 add missing spin_lock_init()
Patrick McHardy (1):
netfilter: nf_ct_sip: handle non-linear skbs
Randy Dunlap (3):
wireless: fix mac80211.h kernel-doc warnings
wireless: fix sta_info.h kernel-doc warnings
sock.h: fix kernel-doc warning
Reinette Chatre (1):
iwlwifi: fix internal scan race
Rémi Denis-Courmont (1):
Phonet: fix potential use-after-free in pep_sock_close()
Sarveshwar Bandi (2):
be2net: Bug fix in init code in probe
be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
Scott Feldman (2):
enic: bug fix: sprintf UUID to string as u8[] rather than u16[] array
enic: Use random mac addr when associating port-profile
Sjur Braendeland (6):
caif: Bugfix - wait_ev*_timeout returns long.
caif: Bugfix - use standard Linux lists
caif: Bugfix - handle mem-allocation failures
caif: Bugfix - Poll can't return POLLHUP while connecting.
caif: Bugfix - missing spin_unlock
caif: Bugfix - use MSG_TRUNC in receive
Sujith (1):
cfg80211: Fix signal_type comparison
Tejun Heo (1):
wireless: update gfp/slab.h includes
Thomas Chou (1):
ethoc: fix null dereference in ethoc_probe
Tilman Schmidt (2):
isdn/capi: make reset_ctr op truly optional
isdn/gigaset: remove dummy CAPI method implementations
Tom Herbert (1):
net: fix problem in dequeuing from input_pkt_queue
Vasanthakumar Thiagarajan (1):
ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
Yoichi Yuasa (1):
net/dccp: expansion of error code size
drivers/isdn/capi/kcapi.c | 6 ++
drivers/isdn/gigaset/capi.c | 28 +--------
drivers/net/benet/be.h | 2 +
drivers/net/benet/be_cmds.c | 19 +++++-
drivers/net/benet/be_main.c | 11 ++-
drivers/net/bfin_mac.c | 2 +
drivers/net/can/sja1000/sja1000.c | 2 +
drivers/net/enic/enic_main.c | 29 ++++++---
drivers/net/ethoc.c | 34 +++++++++-
drivers/net/fec.c | 22 +++++++
drivers/net/fec.h | 2 +
drivers/net/irda/bfin_sir.c | 8 ++-
drivers/net/ixgbe/ixgbe.h | 3 +
drivers/net/ixgbe/ixgbe_82598.c | 1 +
drivers/net/ixgbe/ixgbe_82599.c | 1 +
drivers/net/ixgbe/ixgbe_main.c | 69 +++++++++++++++++++++
drivers/net/ixgbe/ixgbe_phy.c | 30 +++++++++
drivers/net/ixgbe/ixgbe_phy.h | 3 +
drivers/net/ixgbe/ixgbe_type.h | 4 +
drivers/net/macvlan.c | 9 +++-
drivers/net/pppoe.c | 1 +
drivers/net/sh_eth.c | 3 +
drivers/net/tun.c | 2 +
drivers/net/usb/asix.c | 4 +-
drivers/net/usb/hso.c | 3 +
drivers/net/wimax/i2400m/rx.c | 4 +-
drivers/net/wireless/ath/ath5k/base.c | 7 +-
drivers/net/wireless/ath/ath9k/beacon.c | 75 ++++-------------------
drivers/net/wireless/ath/ath9k/hif_usb.c | 10 +++-
drivers/net/wireless/ath/ath9k/htc.h | 1 +
drivers/net/wireless/ath/ath9k/main.c | 28 +--------
drivers/net/wireless/ath/ath9k/pci.c | 1 -
drivers/net/wireless/ath/ath9k/recv.c | 17 ++++--
drivers/net/wireless/iwlwifi/iwl-agn-ict.c | 1 +
drivers/net/wireless/iwlwifi/iwl-scan.c | 21 ++++++-
drivers/net/wireless/iwlwifi/iwl-sta.c | 2 +-
drivers/net/wireless/rndis_wlan.c | 16 +++--
drivers/net/wireless/rt2x00/rt2400pci.c | 9 ++-
drivers/net/wireless/rt2x00/rt2500pci.c | 9 ++-
drivers/net/wireless/rt2x00/rt2800usb.c | 2 +-
drivers/net/wireless/rt2x00/rt2x00pci.c | 2 +-
drivers/net/wireless/rt2x00/rt61pci.c | 7 +-
drivers/net/wireless/rt2x00/rt73usb.c | 7 +-
drivers/net/wireless/wl12xx/wl1271_rx.c | 2 +
include/linux/fec.h | 21 ++++++
include/linux/netdevice.h | 16 ++++-
include/linux/netfilter/x_tables.h | 2 +-
include/net/caif/cfctrl.h | 4 +-
include/net/cls_cgroup.h | 63 +++++++++++++++++++
include/net/mac80211.h | 4 +-
include/net/netfilter/nf_conntrack_core.h | 2 +-
include/net/sock.h | 12 +++-
kernel/sysctl.c | 4 +-
net/caif/Kconfig | 5 +-
net/caif/caif_socket.c | 91 +++++++++++-----------------
net/caif/cfctrl.c | 92 ++++++++--------------------
net/caif/cfmuxl.c | 3 +-
net/caif/cfpkt_skbuff.c | 25 +++++---
net/caif/cfserl.c | 3 +-
net/caif/cfsrvl.c | 6 ++
net/core/dev.c | 48 ++++++++------
net/core/rtnetlink.c | 4 +-
net/core/skbuff.c | 1 +
net/core/sock.c | 19 ++++++
net/dccp/input.c | 6 +-
net/ieee802154/wpan-class.c | 7 ++-
net/mac80211/key.c | 1 -
net/mac80211/sta_info.h | 2 +-
net/netfilter/nf_conntrack_core.c | 10 +++
net/netfilter/nf_conntrack_sip.c | 12 +---
net/phonet/pep.c | 2 +
net/sched/cls_cgroup.c | 50 ++++++++++-----
net/sched/sch_api.c | 14 ++--
net/socket.c | 9 +++
net/wireless/chan.c | 2 +-
net/wireless/nl80211.c | 6 +-
net/wireless/scan.c | 4 +-
77 files changed, 682 insertions(+), 387 deletions(-)
create mode 100644 include/linux/fec.h
create mode 100644 include/net/cls_cgroup.h
^ permalink raw reply
* ethtool 2.6.34 released
From: Jeff Garzik @ 2010-05-26 0:07 UTC (permalink / raw)
To: NetDev
ethtool version 2.6.34 has been released.
Home page: https://sourceforge.net/projects/gkernel/
Download link:
https://sourceforge.net/projects/gkernel/files/ethtool/2.6.34/ethtool-2.6.34.tar.gz/download
Release notes:
* Feature: Support n-tuple filter programming
* Feature: Support rx hashing, v2 (targeted for 2.6.35)
* Feature: Add names of newer Marvell chips
^ permalink raw reply
* Re: [PATCH] r8169: Add counters tx_bytes and rx_bytes for ethtool
From: Junchang Wang @ 2010-05-26 0:51 UTC (permalink / raw)
To: David Miller; +Cc: romieu, netdev
In-Reply-To: <20100525.161539.104072714.davem@davemloft.net>
Hi David,
Thanks for your advice.
>
> If the problem is that people want 64-bit counters available for core
> statistics on 32-bit systems, we do not fix that problem by hacking
> every single driver to provide them side-band via ethtool.
Most NICs have provided those two 64-bit counters in hardware. They
work fine even in 32-bit systems and don't need new 64-bit counters
any more. Frankly, r8169 is the first Gbps NIC I have seen that does
not support those two counters. So I thought changing the upper layer
was excessive and tried to provide a cheap but useful alternative.
>
> First of all, we now have "struct rtnl_link_stats64" in
> linux/if_link.h, it's there to start combating this problem
> generically, for every device, rather than the way you are trying to
> handle it only for one specific driver at a time.
Thanks for your advice. I'll go deep into it and see how we can solve
this problem.
--
--Junchang
^ permalink raw reply
* Re: [PATCH] r8169: Add counters tx_bytes and rx_bytes for ethtool
From: Junchang Wang @ 2010-05-26 1:01 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev
In-Reply-To: <20100525195612.GA3344@electric-eye.fr.zoreil.com>
Hi Francois,
>
> If the packets are short enough, replace "_bytes" by "_packets", "_minutes"
> by "_hours" or "_every_day" and the same kind of problem appears.
r8169 has provided 64-bit hardware counters for #packets,
#error_packets, etc. They work fine even on 32-bit systems. What we
really need is just the counters rx_bytes and tx_bytes.
>
> You can fix the application at zero cost in the kernel: poll < 34 s and
> update the application counters with the kernel counters increment.
Thanks for your advice.
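
The polling approach quoted above can be sketched in user space as follows. The 34 s figure is the time a 32-bit byte counter takes to wrap at 1 Gbit/s; the struct and function names are hypothetical, and how the raw 32-bit sample is obtained (for example by parsing /proc/net/dev) is left out.

```c
#include <stdint.h>

/* Seconds until a 32-bit byte counter wraps at a given line rate. */
double wrap_seconds(double bits_per_second)
{
    return 4294967296.0 * 8.0 / bits_per_second;
}

/* Application-side 64-bit total built from 32-bit kernel samples. */
struct wide_counter {
    uint64_t total;  /* bytes accumulated since init */
    uint32_t last;   /* most recent raw 32-bit sample */
};

void wide_counter_init(struct wide_counter *c, uint32_t first_sample)
{
    c->total = 0;
    c->last = first_sample;
}

/* Add the increment since the previous sample. Correct as long as the
 * counter wrapped at most once between two calls, hence the need to
 * poll more often than wrap_seconds() for the link speed. */
uint64_t wide_counter_update(struct wide_counter *c, uint32_t sample)
{
    uint32_t delta = sample - c->last;  /* unsigned, wraps mod 2^32 */

    c->total += delta;
    c->last = sample;
    return c->total;
}
```

For a 1 Gbit/s link, wrap_seconds(1e9) is about 34.4, which matches the polling interval suggested above.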
--
--Junchang
^ permalink raw reply
* [PATCH] ethtool: Fix list of hash options in manual page
From: Ben Hutchings @ 2010-05-26 1:15 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Aníbal Monsalve Salazar, netdev
'p' is not a valid option.
The 'm' option was missing a preceding 'B' for bold.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- ethtool-2.6.34.orig/ethtool.8
+++ ethtool-2.6.34/ethtool.8
@@ -47,7 +47,7 @@
.\"
.\" \(*HO - hash options
.\"
-.ds HO \fBp\fP|\fm\fP|\fBv\fP|\fBt\fP|\fBs\fP|\fBd\fP|\fBf\fP|\fBn\fP|\fBr\fP...
+.ds HO \fBm\fP|\fBv\fP|\fBt\fP|\fBs\fP|\fBd\fP|\fBf\fP|\fBn\fP|\fBr\fP...
.TH ETHTOOL 8 "July 2007" "Ethtool version 6"
.SH NAME
ethtool \- Display or change ethernet card settings
--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
^ permalink raw reply
* linux-next: build warning in Linus' tree
From: Stephen Rothwell @ 2010-05-26 1:43 UTC (permalink / raw)
To: David Miller, netdev; +Cc: linux-next, linux-kernel, Linus, Herbert Xu
Hi Dave,
Today's linux-next build (x86_64 allmodconfig) produced this warning:
net/core/sock.c: In function 'sock_update_classid':
include/net/cls_cgroup.h:42: warning: 'classid' may be used uninitialized in this function
include/net/cls_cgroup.h:42: note: 'classid' was declared here
In the case that rcu_dereference() returns a value < 0, classid will not
be assigned in task_cls_classid(). I don't know if this is possible - if
not, then why is the test there?
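
The pattern behind this kind of warning, and one conventional way to make every path return a defined value, can be sketched as follows. This is not the kernel function: classid_or_zero and its parameters are hypothetical stand-ins, with a negative id playing the role of the "nothing to look up" case.

```c
#include <stdint.h>

/* Hypothetical stand-in for the lookup: a negative id means there is
 * nothing to read, so the function must still return a defined value. */
uint32_t classid_or_zero(int id, uint32_t stored_classid)
{
    uint32_t classid = 0;   /* defined default for the id < 0 path */

    if (id >= 0)
        classid = stored_classid;
    return classid;
}
```

Without the initializer, the id < 0 path would return whatever happened to be on the stack, which is what the compiler is warning about.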
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
^ permalink raw reply
* RE: NULL Pointer Deference: NFS & Telnet
From: Arce, Abraham @ 2010-05-26 1:48 UTC (permalink / raw)
To: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, David Miller
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tony Lindgren,
Shilimkar, Santosh
In-Reply-To: <27F9C60D11D683428E133F85D2BB4A53043E33A997-lTKHBJngVwKIQmiDNMet8wC/G2K4zDHf@public.gmane.org>
Hi,
I am able to avoid the NULL pointer dereference, but I am not sure whether
the handling is correct. The patch is below.
> I have 2 scenarios in which I am getting a NULL pointer dereference:
>
> 1) root filesystem over nfs
> 2) telnet connection
>
> The issue appeared on this commit
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f8965467f366fd18f01feafb5db10512d7b4422c
>
> The driver I am working with is drivers/net/ks8851.c
> Any help will be highly appreciated...
>
> ---
>
> Scenario 1 | root filesystem over nfs
>
> Looking up port of RPC 100005/1 on 10.87.231.229
> VFS: Mounted root (nfs filesystem) on device 0:10.
> Freeing init memory: 128K
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [..]
> PC is at put_page+0xc/0x120
> LR is at skb_release_data+0x74/0xb8
> [..]
> Backtrace:
> [<c0086dd0>] (put_page+0x0/0x120)
> [<c01d0a48>] (skb_release_data+0x0/0xb8)
> [<c01d1044>] (skb_release_all+0x0/0x20)
> [<c01d075c>] (__kfree_skb+0x0/0xbc)
> [<c01d0818>] (consume_skb+0x0/0x58)
> [<c01d39cc>] (skb_free_datagram+0x0/0x40)
> [<c023ff74>] (xs_udp_data_ready+0x0/0x1e8)
> [<c01ce034>] (sock_queue_rcv_skb+0x0/0x1c0)
> [<c01fbba8>] (ip_queue_rcv_skb+0x0/0x58)
> [<c02176c0>] (__udp_queue_rcv_skb+0x0/0x18c)
> [<c0218e28>] (udp_queue_rcv_skb+0x0/0x348)
> [<c02195a4>] (__udp4_lib_rcv+0x0/0x564)
> [<c0219b08>] (udp_rcv+0x0/0x20)
> [<c01f5f34>] (ip_local_deliver+0x0/0x264)
> [<c01f586c>] (ip_rcv+0x0/0x6c8)
> [<c01d7ec0>] (__netif_receive_skb+0x0/0x2d0)
> [<c01d8190>] (process_backlog+0x0/0x16c)
> [<c01d8e14>] (net_rx_action+0x0/0x18c)
> [<c00521a0>] (__do_softirq+0x0/0x12c)
> [<c00522cc>] (irq_exit+0x0/0x70)
> [<c0028000>] (asm_do_IRQ+0x0/0xc8)
>
> Complete log at http://pastebin.mozilla.org/728027
>
> ---
>
> Scenario 2
>
> 1. Root filesystem booted in ram
> 2. eth0 brought up
> 3. telnetd daemon started
> 4. tried to connect through telnet
>
> # Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = d98e8000
> [..]
> PC is at put_page+0xc/0x120
> LR is at skb_release_data+0x74/0xb8
> [..]
> Backtrace:
> [<c0086dd0>] (put_page+0x0/0x120)
> [<c01d0a48>] (skb_release_data+0x0/0xb8)
> [<c01d1044>] (skb_release_all+0x0/0x20)
> [<c01d075c>] (__kfree_skb+0x0/0xbc)
> [<c0202444>] (tcp_recvmsg+0x0/0x93c)
> [<c02201e8>] (inet_recvmsg+0x0/0xec)
> [<c01c7fd0>] (sock_aio_read+0x0/0xf8)
> [<c00ab3ac>] (do_sync_read+0x0/0xec)
> [<c00abfbc>] (vfs_read+0x0/0x164)
> [<c00ac1a0>] (sys_read+0x0/0x70)
> [<c0029100>] (ret_fast_syscall+0x0/0x30)
>
> Complete log at http://pastebin.mozilla.org/728028
>
Check for NULL data in sk_buff before sending to put_page
Signed-off-by: Abraham Arce <x0066660-l0cyMroinI0@public.gmane.org>
---
net/core/skbuff.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f8abf68..eb81f76 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -334,7 +334,7 @@ static void skb_release_data(struct sk_buff *skb)
if (!skb->cloned ||
!atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
&skb_shinfo(skb)->dataref)) {
- if (skb_shinfo(skb)->nr_frags) {
+ if (skb_shinfo(skb)->nr_frags && skb_has_frags(skb)) {
int i;
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
put_page(skb_shinfo(skb)->frags[i].page);
--
1.5.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: linux-next: build warning in Linus' tree
From: Herbert Xu @ 2010-05-26 1:51 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: David Miller, netdev, linux-next, linux-kernel, Linus
In-Reply-To: <20100526114306.5fa6fa0e.sfr@canb.auug.org.au>
On Wed, May 26, 2010 at 11:43:06AM +1000, Stephen Rothwell wrote:
> Hi Dave,
>
> Today's linux-next build (x86_64 allmodconfig) produced this warning:
>
> net/core/sock.c: In function 'sock_update_classid':
> include/net/cls_cgroup.h:42: warning: 'classid' may be used uninitialized in this function
> include/net/cls_cgroup.h:42: note: 'classid' was declared here
>
> In the case that rcu_dereference() returns a value < 0, classid will not
> be assigned in task_cls_classid(). I don't know if this is possible - if
> not, then why is the test there?
This is a genuine bug. I don't know why my gcc didn't warn about
it.
cls_cgroup: Initialise classid when module is absent
When the cls_cgroup module is not loaded, task_cls_classid will
return an uninitialised classid instead of zero.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index 6cf4486..726cc35 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -39,7 +39,7 @@ extern int net_cls_subsys_id;
static inline u32 task_cls_classid(struct task_struct *p)
{
int id;
- u32 classid;
+ u32 classid = 0;
if (in_interrupt())
return 0;
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply related
* Re: NULL Pointer Deference: NFS & Telnet
From: David Miller @ 2010-05-26 1:52 UTC (permalink / raw)
To: x0066660-l0cyMroinI0
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
linux-omap-u79uwXL29TY76Z2rM5mHXA, tony-4v6yS6AI5VpBDgjK7y7TUQ,
santosh.shilimkar-l0cyMroinI0
In-Reply-To: <27F9C60D11D683428E133F85D2BB4A53043E3EDFE6-lTKHBJngVwKIQmiDNMet8wC/G2K4zDHf@public.gmane.org>
From: "Arce, Abraham" <x0066660-l0cyMroinI0@public.gmane.org>
Date: Tue, 25 May 2010 20:48:02 -0500
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f8abf68..eb81f76 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -334,7 +334,7 @@ static void skb_release_data(struct sk_buff *skb)
> if (!skb->cloned ||
> !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
> &skb_shinfo(skb)->dataref)) {
> - if (skb_shinfo(skb)->nr_frags) {
> + if (skb_shinfo(skb)->nr_frags && skb_has_frags(skb)) {
> int i;
> for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> put_page(skb_shinfo(skb)->frags[i].page);
skb_shinfo(skb)->nr_frags counts the number of entries contained
in the skb_shinfo(skb)->frags[] array.
This has nothing to do with the frag list pointer,
skb_shinfo(skb)->frag_list, which is what skb_has_frags()
tests.
You've got some kind of memory corruption going on and it
appears to have nothing to do with the code paths you're
playing with here.
^ permalink raw reply
* Re: linux-next: build warning in Linus' tree
From: David Miller @ 2010-05-26 1:54 UTC (permalink / raw)
To: herbert; +Cc: sfr, netdev, linux-next, linux-kernel, torvalds
In-Reply-To: <20100526015110.GA21587@gondor.apana.org.au>
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 26 May 2010 11:51:10 +1000
> cls_cgroup: Initialise classid when module is absent
>
> When the cls_cgroup module is not loaded, task_cls_classid will
> return an uninitialised classid instead of zero.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Applied, thanks Herbert.
^ permalink raw reply
* RE: NULL Pointer Deference: NFS & Telnet
From: Arce, Abraham @ 2010-05-26 2:02 UTC (permalink / raw)
To: David Miller
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-omap-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
tony-4v6yS6AI5VpBDgjK7y7TUQ@public.gmane.org, Shilimkar, Santosh
In-Reply-To: <20100525.185236.193707791.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Thanks David,
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index f8abf68..eb81f76 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -334,7 +334,7 @@ static void skb_release_data(struct sk_buff *skb)
> > if (!skb->cloned ||
> > !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
> > &skb_shinfo(skb)->dataref)) {
> > - if (skb_shinfo(skb)->nr_frags) {
> > + if (skb_shinfo(skb)->nr_frags && skb_has_frags(skb)) {
> > int i;
> > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> > put_page(skb_shinfo(skb)->frags[i].page);
>
> skb_shinfo(skb)->nr_frags counts the number of entries contained
> in the skb_shinfo(skb)->frags[] array.
>
> This has nothing to do with the frag list pointer,
> skb_shinfo(skb)->frag_list, which is what skb_has_frags()
> tests.
>
> You've got some kind of memory corruption going on and it
> appears to have nothing to do with the code paths you're
> playing with here.
Do you have any recommendations for debugging techniques or tools for
this memory corruption issue?
Best Regards
Abraham
^ permalink raw reply
* Re: linux-next: build warning in Linus' tree
From: Stephen Rothwell @ 2010-05-26 2:04 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-next, linux-kernel, NeilJay
In-Reply-To: <20100525.162446.183058933.davem@davemloft.net>
Hi Dave,
On Tue, 25 May 2010 16:24:46 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
> From: David Miller <davem@davemloft.net>
> Date: Tue, 25 May 2010 16:19:29 -0700 (PDT)
>
> > Here is how I fixed this:
> >
> > --------------------
> > drivers/net/usb/asix.c: Fix pointer cast.
>
> Sorry, that only took care of one of the two warnings :-)
>
> This patch is better.
Thanks, looks good.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
^ permalink raw reply
* [RFC] IFLA_PORT_* iproute2 cmd line
From: Scott Feldman @ 2010-05-26 3:19 UTC (permalink / raw)
To: netdev; +Cc: Chris Wright, Stephen Hemminger, Arnd Bergmann
I need to provide an iproute2 patch for IFLA_PORT_* and I wanted to hash out
the cmd line before I submit it. Here's what I think would work based on
previous input from Arnd:
Usage: ip port associate DEVICE [ vf NUM ] {PROFILE|VSI}
ip port pre-associate DEVICE [ vf NUM ] VSI
ip port pre-associate-rr DEVICE [ vf NUM ] VSI
ip port dis-associate DEVICE [ vf NUM ]
ip port show [ DEVICE [ vf NUM ] ]
PROFILE := port-profile PORT-PROFILE
[ instance-uuid INSTANCE-UUID ]
[ host-uuid HOST-UUID ]
VSI := vsi managerid MGR typeid VTID typeidversion VER
[ instance-uuid INSTANCE-UUID ]
Comments?
-scott
^ permalink raw reply
* Re: Warning in net/ipv4/af_inet.c:154
From: Anton Blanchard @ 2010-05-26 3:19 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1274801229.5020.80.camel@edumazet-laptop>
Hi,
> > Which is:
> >
> > WARN_ON(sk->sk_forward_alloc);
> >
>
> Yes, the infamous one :)
>
> Is it reproducible? What kind of workload is it?
> What is the NIC involved ?
It was running sysbench against a postgresql database over localhost. In
each case I checked, sk_forward_alloc was less than one page.
I notice we update sk_forward_alloc in sk_mem_charge and sk_mem_uncharge.
Since it isn't an atomic variable I went looking for a lock somewhere in
the call chain (first thought was the socket lock). I couldn't find
anything, but I could easily be missing something.
Anton
^ permalink raw reply
* [PATCH] tcp: Socket option to set congestion window
From: Tom Herbert @ 2010-05-26 5:01 UTC (permalink / raw)
To: davem; +Cc: netdev, ycheng
This patch allows an application to set the TCP congestion window
for a connection through a socket option. The maximum value that
may be set is specified in a sysctl value. When the sysctl is set to
zero, the default value, the socket option is disabled.
The socket option is most useful to set the initial congestion
window for a connection to a larger value than the default in
order to improve latency. This socket option would typically be
used by an "intelligent" application which might have better knowledge
than the kernel as to what an appropriate initial congestion window is.
One use of this might be with an application which maintains per
client path characteristics. This could allow setting the congestion
window more precisely than could be achieved through the
route command.
A second use of this might be to reduce the number of simultaneous
connections that a client might open to the server; for instance
when a web browser opens multiple connections to a server. With multiple
connections the aggregate congestion window is larger than that of a
single connection (num_conns * cwnd); this can effectively be used to
circumvent slowstart and improve latency. With this socket option, a
single connection with a large initial congestion window could be used,
which retains the latency properties of multiple connections while
reducing the number of connections (load) on the network.
The sysctl to enable and control this feature is
net.ipv4.tcp_user_cwnd_max
The socket option call would be:
setsockopt(fd, IPPROTO_TCP, TCP_CWND, &val, sizeof (val))
where val is the congestion window in # MSS.
Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..9e9692f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -105,6 +105,7 @@ enum {
#define TCP_COOKIE_TRANSACTIONS 15 /* TCP Cookie Transactions */
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
#define TCP_THIN_DUPACK 17 /* Fast retrans. after 1 dupack */
+#define TCP_CWND 18 /* Set congestion window */
/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a144914..3d1f934 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -246,6 +246,7 @@ extern int sysctl_tcp_max_ssthresh;
extern int sysctl_tcp_cookie_size;
extern int sysctl_tcp_thin_linear_timeouts;
extern int sysctl_tcp_thin_dupack;
+extern int sysctl_tcp_user_cwnd_max;
extern atomic_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d96c1da..b35d18f 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -597,6 +597,13 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "tcp_user_cwnd_max",
+ .data = &sysctl_tcp_user_cwnd_max,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{
.procname = "udp_mem",
.data = &sysctl_udp_mem,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6596b4f..0ca9832 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2370,6 +2370,24 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
}
break;
+ case TCP_CWND:
+ if (sysctl_tcp_user_cwnd_max <= 0)
+ err = -EPERM;
+ else if (val > 0 && sk->sk_state == TCP_ESTABLISHED &&
+ icsk->icsk_ca_state == TCP_CA_Open) {
+ u32 cwnd = val;
+ cwnd = min(cwnd, (u32)sysctl_tcp_user_cwnd_max);
+ cwnd = min(cwnd, tp->snd_cwnd_clamp);
+
+ if (tp->snd_cwnd != cwnd) {
+ tp->snd_cwnd = cwnd;
+ tp->snd_cwnd_stamp = tcp_time_stamp;
+ tp->snd_cwnd_cnt = 0;
+ }
+ } else
+ err = -EINVAL;
+ break;
+
#ifdef CONFIG_TCP_MD5SIG
case TCP_MD5SIG:
/* Read the IP->Key mappings from userspace */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b4ed957..2d10a44 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -60,6 +60,8 @@ int sysctl_tcp_base_mss __read_mostly = 512;
/* By default, RFC2861 behavior. */
int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
+int sysctl_tcp_user_cwnd_max __read_mostly;
+
int sysctl_tcp_cookie_size __read_mostly = 0; /* TCP_COOKIE_MAX */
EXPORT_SYMBOL_GPL(sysctl_tcp_cookie_size);
^ permalink raw reply related
* Re: [PATCH] tcp: Socket option to set congestion window
From: Stephen Hemminger @ 2010-05-26 5:08 UTC (permalink / raw)
To: Tom Herbert; +Cc: davem, netdev, ycheng
In-Reply-To: <alpine.DEB.1.00.1005252157150.27170@pokey.mtv.corp.google.com>
On Tue, 25 May 2010 22:01:13 -0700 (PDT)
Tom Herbert <therbert@google.com> wrote:
> This patch allows an application to set the TCP congestion window
> for a connection through a socket option. The maximum value that
> may be set is specified in a sysctl value. When the sysctl is set to
> zero, the default value, the socket option is disabled.
>
> The socket option is most useful to set the initial congestion
> window for a connection to a larger value than the default in
> order to improve latency. This socket option would typically be
> used by an "intelligent" application which might have better knowledge
> than the kernel as to what an appropriate initial congestion window is.
>
> One use of this might be with an application which maintains per
> client path characteristics. This could allow setting the congestion
> window more precisely than could be achieved through the
> route command.
>
> A second use of this might be to reduce the number of simultaneous
> connections that a client might open to the server; for instance
> when a web browser opens multiple connections to a server. With multiple
> connections the aggregate congestion window is larger than that of a
> single connection (num_conns * cwnd); this can effectively be used to
> circumvent slowstart and improve latency. With this socket option, a
> single connection with a large initial congestion window could be used,
> which retains the latency properties of multiple connections while
> reducing the number of connections (load) on the network.
>
> The sysctl to enable and control this feature is
>
> net.ipv4.tcp_user_cwnd_max
>
> The socket option call would be:
>
> setsockopt(fd, IPPROTO_TCP, TCP_CWND, &val, sizeof (val))
>
> where val is the congestion window in # MSS.
>
The IETF TCP maintainers already think Linux TCP allows unsafe
operation; this will just allow more possible misuse and prove
their argument. Unless and until this behavior is backed by
wider research, I don't think it should be accepted.
--
^ permalink raw reply