* Re: pull request: wireless-next 2012-07-11
From: John W. Linville @ 2012-07-11 18:33 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120711181721.GC1906-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 18284 bytes --]
Ugh, forgot the signature again...
On Wed, Jul 11, 2012 at 02:17:22PM -0400, John W. Linville wrote:
> commit c3e7be41a27f507047d40a31462a5ea6d7f52f10
>
> Dave,
>
> Here is another batch of updates intended for 3.6.
>
> Several drivers see updates: mwifiex, ath9k, iwlwifi, and a handful
> of others. The bcma bus got a lot of attention from Hauke Mehrtens.
> The cfg80211 component gets a flurry of patches for multi-channel
> support, and the mac80211 component gets the first few VHT (11ac)
> and 60GHz (11ad) patches.
>
> Additionally, the NFC subsystem gets a series of updates. According to
> Samuel, "Here are the interesting bits:
>
> - A better error management for the HCI stack.
> - An LLCP "late" binding implementation for a better NFC SAP usage. SAPs are
> now reserved only when there's a client for it.
> - Support for Sony RC-S360 (a.k.a. PaSoRi) pn533 based dongle. We can read and
> write NFC tags and also establish a p2p link with this dongle now.
> - A few LLCP fixes."
>
> Finally, this includes another pull of the fixes from the wireless
> tree in order to resolve some merge issues.
>
> Please let me know if there are problems!
>
> Thanks,
>
> John
>
> ---
>
> The following changes since commit 061a5c316b6526dbc729049a16243ec27937cc31:
>
> Merge branch 'davem-next.r8169' of git://violet.fr.zoreil.com/romieu/linux (2012-07-09 16:09:47 -0700)
>
> are available in the git repository at:
>
>
> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next.git for-davem
>
> for you to fetch changes up to c3e7be41a27f507047d40a31462a5ea6d7f52f10:
>
> Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem (2012-07-10 12:29:56 -0400)
>
> ----------------------------------------------------------------
>
> Amitkumar Karwar (1):
> mwifiex: add set_antenna handler support
>
> Arend van Spriel (1):
> brcmsmac: fix brcms_c_regd_init() which crashed after 11ad patch
>
> Avinash Patil (6):
> mwifiex: pass cfg80211_beacon_data to mwifiex_set_mgmt_ie()
> mwifiex: separate IE parsing for Head/Tail IEs and beacon_ies etc
> mwifiex: overwrite earlier IE buffers for new set IE request
> mwifiex: add change_beacon cfg80211 handler
> mwifiex: advertise WPS probe response offload support to cfg80211
> mwifiex: parse WPS IEs from beacon_data
>
> Bing Zhao (1):
> mwifiex: fix Coverity SCAN CID 709078: Resource leak (RESOURCE_LEAK)
>
> Eliad Peller (5):
> mac80211: flush queues before deauth/disassoc
> mac80211: don't require associated->beacon_ies for ps
> mac80211: allow calling ieee80211_ap_probereq_get() during auth/assoc
> mac80211: always set in_reconfig=false on wakeup
> mac80211: destroy assoc_data correctly if assoc fails
>
> Emmanuel Grumbach (3):
> iwlwifi: configure the queues from the op_mode
> iwlwifi: disable the watchdog for queues by default
> iwlegacy: don't mess up the SCD when removing a key
>
> Eric Lapuyade (15):
> NFC: Prepare asynchronous error management for driver and shdlc
> NFC: Removed addressed shdlc TODOs
> NFC: Handle SHDLC RSET frames from an SHDLC connected chip
> NFC: Remove an impossible HCI error case
> NFC: Implement HCP reaggregation allocation error case
> NFC: Changed HCI cmd execution completion result to std linux errno
> NFC: Driver failure API
> NFC: Factorize HCI cmd completion
> NFC: Implement HCI driver or internal error management
> NFC: Core must test the device polling state inside the device lock
> NFC: nfc_targets_found() should accept zero target found
> NFC: nfc_driver_failure() implementation
> NFC: Error management documentation
> NFC: update PN544 HCI driver state when opened/closed
> NFC: Allow HCI driver to pre-open pipes to some gates
>
> Gabor Juhos (20):
> ath9k: define DEVID for QCA955x
> ath9k: define MAC version for AR9550
> ath9k: set MAC version for AR9550
> ath9k: add platform_device_id for AR9550
> ath9k: add BB name string for AR9550
> ath9k: clear pciexpress flag for AR9550
> ath9k: enable TX/RX data byte swap for AR9550
> ath9k: add initvals for AR9550
> ath9k: add mode register initialization code for AR9550
> ath9k: read spur frequency information from eeprom for AR9550
> ath9k: fix XPABIASLEVEL settings for AR9550
> ath9k: fix antenna control configuration for AR9550
> ath9k: fix PAPRD settings for AR9550
> ath9k: fix RF channel frequency configuration for AR9550
> ath9k: disable SYNC_HOST1_FATAL interrupts for AR9550
> ath9k: skip internal regulator configuration for AR9550
> ath9k: fix PLL initialization for AR9550
> ath9k: enable PLL workaround for AR9550
> ath9k: set 4ADDRESS bit in RX filter for AR9550
> ath9k: enable support for AR9550
>
> Hauke Mehrtens (9):
> bcma: extend workaround for bcm4331
> bcma: add constants for chip ids
> bcma: remove fix for 4329b0 bad LPOM is detection
> bcma: add PCI ID for BCM43224
> bcma: complete workaround for BCMA43224 and BCM4313
> bcma: remove bcma_pmu_{pll,swreg}_init()
> bcma: remove chip ids doing nothing from PMU initialization.
> bcma: add bcma_pmu_spuravoid_pllupdate()
> bcma: add mdelay bcma_pmu_resources_init()
>
> Johannes Berg (17):
> iwlwifi: bump trace message limit
> iwlwifi: use __get_str in tracing
> iwlwifi: limit dwell time more strictly
> mac80211: make __ieee80211_recalc_idle static
> cfg80211: don't allow WoWLAN support without CONFIG_PM
> mac80211: don't expose ieee80211_add_srates_ie()
> Merge remote-tracking branch 'wireless-next/master' into mac80211-next
> iwlwifi: add trailing newline to some messages
> iwlwifi: fix debug message level
> mac80211: remove tx_frags driver callback
> mac80211_hwsim: fix NUM_BANDS usage
> mac80211: add TX prepare API
> iwlwifi: remove unneeded NULL check
> cfg80211: fix locking regression in monitor channel tracking
> mac80211: fix debugfs default key links
> mac80211: fix crash with single-queue drivers
> mac80211_hwsim: add testmode code to stop/wake queues
>
> John W. Linville (4):
> Merge branch 'master' of git://git.kernel.org/.../linville/wireless
> Merge branch 'for-john' of git://git.sipsolutions.net/mac80211-next
> Merge branch 'for-john' of git://git.kernel.org/.../iwlwifi/iwlwifi-next
> Merge branch 'master' of git://git.kernel.org/.../linville/wireless-next into for-davem
>
> Mahesh Palivela (3):
> wireless: add VHT (802.11ac) definitions
> cfg80211: allow advertising VHT capabilities
> mac80211: include VHT capability IE in probe requests
>
> Michal Kazior (13):
> cfg80211: introduce cfg80211_stop_ap
> cfg80211: .stop_ap when interface is going down
> cfg80211: add channel tracking for AP and mesh
> cfg80211: track ibss fixed channel
> cfg80211: introduce cfg80211_get_chan_state
> cfg80211: track monitor interfaces count
> mac80211: refactor virtual monitor code
> cfg80211: refuse to .set_monitor_channel when non-monitors are present
> cfg80211: track monitor channel
> cfg80211: set initial monitor channel
> cfg80211/mac80211: remove .get_channel
> cfg80211: add channel checking for iface combinations
> cfg80211: respect iface combinations when starting operation
>
> Mohammed Shafi Shajakhan (3):
> ath9k: Fix clearing of BTCOEX flags
> ath9k: Fix MCI cleanup
> ath9k: Stop the BTCOEX timers before disabling BTCOEX
>
> Oskar Schirmer (1):
> net/wireless: remove macro defined twice with same value
>
> Rafał Miłecki (2):
> b43: N-PHY: fix RSSI calibration
> bcma: use custom printing functions
>
> Rajkumar Manoharan (5):
> ath9k_hw: start noisefloor calibration after MCI reset
> ath9k_hw: do not load noise floor readings when it is running
> ath9k: fix fullsleep power consumption when BTCOEX is enabled
> ath9k: fix power consumption on network sleep when BTCOEX is enabled
> ath9k_hw: fix AR9462 2g5g switch on full reset
>
> Richard A. Griffiths (1):
> iwlwifi: disallow log_event access if interface down
>
> Samuel Ortiz (18):
> NFC: Add modules alias for NFC sockets
> NFC: Add netlink module alias for NFC
> NFC: Update LLCP socket target index when getting a connection
> NFC: Fix LLCP getname socket op
> NFC: Build LLCP general bytes upon request
> NFC: Close listening LLCP sockets when the device is gone
> NFC: Release LLCP SAP when the owner is released
> NFC: Forbid LLCP service name reusing
> NFC: Forbid SSAP binding to a not well known LLCP service
> NFC: LLCP late binding
> NFC: Handle LLCP Disconnected Mode frames
> NFC: Remove warning from nfc_llcp_local_put
> NFC: Do not return EBUSY when stopping a poll that's already stopped
> NFC: Dereference LLCP bind socket address after checking for it to be NULL
> NFC: Add initial Sony RC-S360 support to pn533
> NFC: Use communicate thru only for PaSoRi when trying to read Felica tags
> NFC: Add ISO 14443 type B protocol
> NFC: Check for llcp_sock and its device from llcp_sock_getname
>
> Sasha Levin (1):
> NFC: Prevent NULL deref when getting socket name
>
> Stanislaw Gruszka (2):
> rt2x00usb: fix indexes ordering on RX queue kick
> iwlegacy: always monitor for stuck queues
>
> Thomas Huehn (3):
> mac80211: reduce IEEE80211_TX_MAX_RATES
> mac80211: correct size the argument to kzalloc in minstrel_ht
> ath9k: fixing register bit shift values of control packets to support TPC
>
> Vladimir Kondratiev (5):
> cfg80211: add 802.11ad (60gHz band) support
> wireless: regulatory for 60g
> wireless: 60g protocol constants
> {nl,cfg}80211: support high bitrates
> cfg80211: bitrate calculation for 60g
>
> Documentation/nfc/nfc-hci.txt | 33 +
> drivers/bcma/bcma_private.h | 9 +
> drivers/bcma/core.c | 10 +-
> drivers/bcma/driver_chipcommon.c | 5 +-
> drivers/bcma/driver_chipcommon_pmu.c | 331 +++--
> drivers/bcma/driver_mips.c | 24 +-
> drivers/bcma/driver_pci_host.c | 18 +-
> drivers/bcma/host_pci.c | 5 +-
> drivers/bcma/main.c | 19 +-
> drivers/bcma/scan.c | 24 +-
> drivers/bcma/sprom.c | 26 +-
> drivers/net/wireless/ath/ath6kl/cfg80211.c | 2 +
> drivers/net/wireless/ath/ath9k/ahb.c | 4 +
> drivers/net/wireless/ath/ath9k/ar9003_calib.c | 13 +-
> drivers/net/wireless/ath/ath9k/ar9003_eeprom.c | 7 +-
> drivers/net/wireless/ath/ath9k/ar9003_hw.c | 87 +-
> drivers/net/wireless/ath/ath9k/ar9003_mci.c | 18 +-
> drivers/net/wireless/ath/ath9k/ar9003_paprd.c | 2 +-
> drivers/net/wireless/ath/ath9k/ar9003_phy.c | 72 +-
> drivers/net/wireless/ath/ath9k/ar9003_phy.h | 6 +-
> .../net/wireless/ath/ath9k/ar955x_1p0_initvals.h | 1284 ++++++++++++++++++++
> drivers/net/wireless/ath/ath9k/ath9k.h | 4 +
> drivers/net/wireless/ath/ath9k/gpio.c | 25 +-
> drivers/net/wireless/ath/ath9k/hw.c | 46 +-
> drivers/net/wireless/ath/ath9k/hw.h | 2 +
> drivers/net/wireless/ath/ath9k/mac.c | 2 +-
> drivers/net/wireless/ath/ath9k/mac.h | 1 +
> drivers/net/wireless/ath/ath9k/main.c | 16 +-
> drivers/net/wireless/ath/ath9k/mci.c | 2 +-
> drivers/net/wireless/ath/ath9k/recv.c | 3 +
> drivers/net/wireless/ath/ath9k/reg.h | 10 +-
> drivers/net/wireless/ath/carl9170/tx.c | 6 +-
> drivers/net/wireless/b43/phy_n.c | 17 +-
> drivers/net/wireless/brcm80211/brcmsmac/channel.c | 17 +-
> drivers/net/wireless/iwlegacy/3945-rs.c | 2 +-
> drivers/net/wireless/iwlegacy/4965-mac.c | 4 +-
> drivers/net/wireless/iwlegacy/common.c | 14 +-
> drivers/net/wireless/iwlwifi/dvm/commands.h | 40 +-
> drivers/net/wireless/iwlwifi/dvm/debugfs.c | 6 +-
> drivers/net/wireless/iwlwifi/dvm/dev.h | 16 -
> drivers/net/wireless/iwlwifi/dvm/main.c | 56 +-
> drivers/net/wireless/iwlwifi/dvm/scan.c | 37 +-
> drivers/net/wireless/iwlwifi/dvm/ucode.c | 37 +
> drivers/net/wireless/iwlwifi/iwl-debug.c | 5 +-
> drivers/net/wireless/iwlwifi/iwl-devtrace.h | 4 +-
> drivers/net/wireless/iwlwifi/iwl-drv.c | 1 +
> drivers/net/wireless/iwlwifi/iwl-trans.h | 27 +-
> drivers/net/wireless/iwlwifi/pcie/internal.h | 3 +-
> drivers/net/wireless/iwlwifi/pcie/rx.c | 3 -
> drivers/net/wireless/iwlwifi/pcie/trans.c | 36 +-
> drivers/net/wireless/libertas/host.h | 1 -
> drivers/net/wireless/mac80211_hwsim.c | 12 +-
> drivers/net/wireless/mwifiex/cfg80211.c | 77 +-
> drivers/net/wireless/mwifiex/fw.h | 21 +
> drivers/net/wireless/mwifiex/ie.c | 185 +--
> drivers/net/wireless/mwifiex/ioctl.h | 5 +
> drivers/net/wireless/mwifiex/main.h | 2 +-
> drivers/net/wireless/mwifiex/sta_cmd.c | 37 +
> drivers/net/wireless/mwifiex/sta_cmdresp.c | 30 +
> drivers/net/wireless/p54/txrx.c | 6 +-
> drivers/net/wireless/rt2x00/rt2x00usb.c | 2 +-
> drivers/net/wireless/ti/wlcore/main.c | 2 +
> drivers/nfc/nfcwilink.c | 7 +-
> drivers/nfc/pn533.c | 224 +++-
> drivers/nfc/pn544_hci.c | 37 +-
> include/linux/bcma/bcma.h | 30 +
> include/linux/bcma/bcma_driver_chipcommon.h | 23 +
> include/linux/ieee80211.h | 160 ++-
> include/linux/nfc.h | 14 +-
> include/linux/nl80211.h | 17 +
> include/net/cfg80211.h | 40 +-
> include/net/mac80211.h | 44 +-
> include/net/nfc/hci.h | 19 +-
> include/net/nfc/nfc.h | 2 +
> net/mac80211/cfg.c | 24 +-
> net/mac80211/debugfs_key.c | 16 +-
> net/mac80211/driver-ops.h | 22 +-
> net/mac80211/ieee80211_i.h | 11 +-
> net/mac80211/iface.c | 258 ++--
> net/mac80211/main.c | 17 +-
> net/mac80211/mesh_plink.c | 4 +-
> net/mac80211/mlme.c | 34 +-
> net/mac80211/rc80211_minstrel_ht.c | 2 +-
> net/mac80211/trace.h | 7 +
> net/mac80211/tx.c | 16 +-
> net/mac80211/util.c | 49 +-
> net/nfc/core.c | 38 +-
> net/nfc/hci/command.c | 26 +-
> net/nfc/hci/core.c | 104 +-
> net/nfc/hci/hci.h | 12 +-
> net/nfc/hci/shdlc.c | 38 +-
> net/nfc/llcp/llcp.c | 342 ++++--
> net/nfc/llcp/llcp.h | 5 +
> net/nfc/llcp/sock.c | 33 +-
> net/nfc/nci/core.c | 5 +-
> net/nfc/nci/ntf.c | 5 +-
> net/nfc/netlink.c | 9 +
> net/wireless/Makefile | 2 +-
> net/wireless/ap.c | 46 +
> net/wireless/chan.c | 62 +-
> net/wireless/core.c | 84 +-
> net/wireless/core.h | 64 +-
> net/wireless/ibss.c | 11 +
> net/wireless/mesh.c | 30 +-
> net/wireless/mlme.c | 17 +
> net/wireless/nl80211.c | 65 +-
> net/wireless/reg.c | 5 +-
> net/wireless/util.c | 156 ++-
> net/wireless/wext-compat.c | 9 +-
> 109 files changed, 4013 insertions(+), 1053 deletions(-)
> create mode 100644 drivers/net/wireless/ath/ath9k/ar955x_1p0_initvals.h
> create mode 100644 net/wireless/ap.c
> --
> John W. Linville Someday the world will need a hero, and you
> linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org might be all we have. Be ready.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
John W. Linville Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org might be all we have. Be ready.
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* Add missing license tag to nci.
From: Dave Jones @ 2012-07-11 18:31 UTC (permalink / raw)
To: netdev; +Cc: ilane
nci: module license 'unspecified' taints kernel.
Signed-off-by: Dave Jones <davej@redhat.com>
diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
index d560e6f..f18f207 100644
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -27,6 +27,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": %s: " fmt, __func__
+#include <linux/module.h>
#include <linux/types.h>
#include <linux/workqueue.h>
#include <linux/completion.h>
@@ -878,3 +879,5 @@ static void nci_cmd_work(struct work_struct *work)
jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
}
}
+
+MODULE_LICENSE("GPL");
^ permalink raw reply related
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Rick Jones @ 2012-07-11 18:23 UTC (permalink / raw)
To: Eric Dumazet; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <1342019518.3265.8116.camel@edumazet-glaptop>
On 07/11/2012 08:11 AM, Eric Dumazet wrote:
>
>
> Tests using a single TCP flow.
>
> Tests on 10Gbit links :
>
>
> echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
> OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
> tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
> tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 79
> tcpi_reordering 53 tcpi_total_retrans 0
I take it you hacked your local copy of netperf to emit those? Or did I
leave some cruft behind in something I committed to the repository?
What was the ultimate limiter on throughput? I notice it didn't achieve
link-rate on either 10 GbE or 1 GbE.
> That's the plan: limiting the number of bytes in Qdisc, not the number of bytes
> in socket write queue.
So the SO_SNDBUF can still grow rather larger than necessary? It is
just that TCP will be nice to the other flows by not dumping all of it
into the qdisc at once. Latency seen by the application itself is then
unchanged since there will still be (potentially) as much stuff queued
in the SO_SNDBUF as before right?
rick
^ permalink raw reply
* Re: [RFC PATCH 09/10] ixgbe: Add support for displaying the number of Tx/Rx channels
From: Ben Hutchings @ 2012-07-11 18:21 UTC (permalink / raw)
To: Alexander Duyck
Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
alexander.duyck
In-Reply-To: <20120630001659.29939.61276.stgit@gitlad.jf.intel.com>
On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
> This patch adds support for the ethtool get_channels operation.
>
> Since the ixgbe driver has to support DCB as well as the other modes the
> assumption I made here is that the number of channels in DCB modes refers
> to the number of queues per traffic class, not the number of queues total.
[...]
When MSI-X is enabled, a 'channel' is an MSI-X vector and the associated
queues, i.e. total number of channels reported should be the total
number of MSI-X vectors in use. (That was my intended interpretation,
anyway. It may be that there is too much variation in the way queues
and interrupts are associated for these operations to be defined in a
general way.)
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [RFC PATCH 07/10] ixgbe: Add function for setting XPS queue mapping
From: Ben Hutchings @ 2012-07-11 18:15 UTC (permalink / raw)
To: Alexander Duyck
Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
alexander.duyck
In-Reply-To: <20120630001649.29939.725.stgit@gitlad.jf.intel.com>
On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
> This change adds support for ixgbe to configure the XPS queue mapping on
> load. The result of this change is that on open we will now be resetting
> the number of Tx queues, and then setting the default configuration for XPS
> based on if ATR is enabled or disabled.
[...]
I didn't see where you're resetting the number of TX queues; was that
actually added in an earlier patch?
It seems strange to be resetting XPS configuration on open; normally net
device configuration persists as long as the device is registered.
Maybe only do this if the number of TX queues has to change?
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: UDP ordering when using multiple rx queue
From: Rick Jones @ 2012-07-11 17:50 UTC (permalink / raw)
To: Jean-Michel Hautbois; +Cc: Merav Sicron, netdev
In-Reply-To: <CAL8zT=hi9_Y4oGw=cVSnYE=km6MZBAAie-A5RWLy=47FR8aTag@mail.gmail.com>
On 07/11/2012 06:41 AM, Jean-Michel Hautbois wrote:
> I confirm that using ethtool -L eth1 combined 1 solves my issue.
My being pedantic or not, you have kludged around your issue, which is a
broken application.
Can you actually ass-u-me that this application is deployed with just a
single back-to-back link between two systems? I'm guessing that isn't
the way it is deployed in production or there would be zero call for
multicast. There is *zero* guarantee of ordering with UDP, multicast
or otherwise - certainly not between sends involving different port
numbers, nor for that matter even between sends involving the same port
numbers. Once you leave the NIC (and perhaps even before) all bets are off.
Have you tested using bonded links? Or through switches which
themselves are joined by bonded links? Various bonding modes can even
re-order traffic of a single flow (eg mode-rr). As I understand it, the
moves to "break the bottlenecks" imposed by spanning tree will mean that
meshes of switches, even without bonded links, will send traffic of
different flows through different paths through the switch fabric. In
those cases they might send traffic to the same multicast address along
the same path each time, but you probably cannot count on that, nor them
sending traffic to different multicast addresses along the same path.
Some clever meshed-switch folks may go ahead and look up at the
transport-layer port numbers when deciding on their splits - just like
some bonding modes can.
Until you get the application re-written to handle out-of-order traffic,
it "works" only by chance.
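As a rough illustration of what "handling out-of-order traffic" can look like on the receive side, here is a minimal sketch. It assumes the sender puts a 32-bit sequence number in every datagram, which is an assumption about the application protocol and not something stated in this thread; the names reorder_init(), reorder_receive() and WINDOW are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WINDOW 8 /* how many datagrams past the expected one we will hold */

struct reorder {
	uint32_t next;            /* next sequence number to deliver */
	bool     present[WINDOW]; /* arrival flags, indexed by seq % WINDOW */
};

static void reorder_init(struct reorder *r, uint32_t first_seq)
{
	memset(r, 0, sizeof(*r));
	r->next = first_seq;
}

/*
 * Record the arrival of datagram 'seq' and return how many datagrams
 * can now be handed to the application in order. A real receiver
 * would keep the payloads in a parallel array of WINDOW buffers.
 */
static int reorder_receive(struct reorder *r, uint32_t seq)
{
	uint32_t ahead = seq - r->next; /* unsigned distance, survives wrap */
	int released = 0;

	if (ahead >= WINDOW)
		return 0; /* late duplicate, or too far ahead to buffer */
	if (r->present[seq % WINDOW])
		return 0; /* duplicate of a datagram already held */
	r->present[seq % WINDOW] = true;

	/* Release the contiguous run starting at the expected number. */
	while (r->present[r->next % WINDOW]) {
		r->present[r->next % WINDOW] = false;
		r->next++;
		released++;
	}
	return released;
}
```

With this, datagrams arriving as 1, 2, 0 are held until 0 shows up and then released together. Anything that falls outside the window is dropped, so a real application would also need a policy for loss (timeout, retransmit request, or skip ahead).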
> Unicast traffic seems ok (I used netperf in order to check this assumption).
Netperf does nothing to check the order of datagrams. It is perfectly
content receiving datagrams in any order. So while you can use it to see that
a single flow of UDP unicast is not split up by the NIC (by looking at
the per-queue stats), you can assume nothing about the final ordering of
those UDP datagrams from a "successful" netperf UDP_STREAM test.
rick jones
^ permalink raw reply
* [PATCH net-next v3 3/3] 6lowpan: Change byte order when storing/accessing to len field
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1342025476-20949-1-git-send-email-tony.cheneau@amnesiak.org>
The length field should be encoded in big-endian byte order, as intended in the specs.
As currently written, the len field would not be decoded properly by an implementation using the correct byte ordering. Hence, it could lead to interoperability issues.
Also, I rewrote the code so that the iphc0 argument of lowpan_alloc_new_frame could be removed.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
---
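For readers who want to check the bit layout, here is a minimal user-space sketch of the 11-bit size encoding, assuming the RFC 4944 FRAG1 layout (5-bit dispatch 11000 followed by an 11-bit datagram size). It is not the kernel code; FRAG1_DISPATCH, frag1_put_size() and frag1_get_size() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG1_DISPATCH 0xc0 /* 11000xxx; illustrative name */

/* Store an 11-bit datagram size in the first two header bytes. */
static void frag1_put_size(uint8_t head[2], uint16_t size)
{
	assert(size < 2048); /* must fit in 11 bits */
	head[0] = FRAG1_DISPATCH | ((size >> 8) & 0x7); /* 3 most significant bits */
	head[1] = size & 0xff;                          /* 8 least significant bits */
}

/* Recover the 11-bit datagram size from the first two header bytes. */
static uint16_t frag1_get_size(const uint8_t head[2])
{
	return (uint16_t)(((head[0] & 0x7) << 8) | head[1]);
}
```

For a 1280-byte datagram (0x500) this yields the bytes c5 00, whereas the previous encoding would have produced c0 a0.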
net/ieee802154/6lowpan.c | 20 ++++++++++++--------
1 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 9de1ece..75d91bb 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -649,7 +649,7 @@ static void lowpan_fragment_timer_expired(unsigned long entry_addr)
}
static struct lowpan_fragment *
-lowpan_alloc_new_frame(struct sk_buff *skb, u8 iphc0, u8 len, u8 tag)
+lowpan_alloc_new_frame(struct sk_buff *skb, u8 len, u8 tag)
{
struct lowpan_fragment *frame;
@@ -660,7 +660,7 @@ lowpan_alloc_new_frame(struct sk_buff *skb, u8 iphc0, u8 len, u8 tag)
INIT_LIST_HEAD(&frame->list);
- frame->length = (iphc0 & 7) | (len << 3);
+ frame->length = len;
frame->tag = tag;
/* allocate buffer for frame assembling */
@@ -718,14 +718,18 @@ lowpan_process_data(struct sk_buff *skb)
case LOWPAN_DISPATCH_FRAGN:
{
struct lowpan_fragment *frame;
- u8 len, offset;
- u16 tag;
+ /* slen stores the rightmost 8 bits of the 11 bits length */
+ u8 slen, offset;
+ u16 len, tag;
bool found = false;
- if (lowpan_fetch_skb_u8(skb, &len) || /* frame length */
+ if (lowpan_fetch_skb_u8(skb, &slen) || /* frame length */
lowpan_fetch_skb_u16(skb, &tag)) /* fragment tag */
goto drop;
+ /* adds the 3 MSB to the 8 LSB to retrieve the 11 bits length */
+ len = ((iphc0 & 7) << 8) | slen;
+
/*
* check if frame assembling with the same tag is
* already in progress
@@ -740,7 +744,7 @@ lowpan_process_data(struct sk_buff *skb)
/* alloc new frame structure */
if (!found) {
- frame = lowpan_alloc_new_frame(skb, iphc0, len, tag);
+ frame = lowpan_alloc_new_frame(skb, len, tag);
if (!frame)
goto unlock_and_drop;
}
@@ -1008,8 +1012,8 @@ lowpan_skb_fragmentation(struct sk_buff *skb)
tag = fragment_tag++;
/* first fragment header */
- head[0] = LOWPAN_DISPATCH_FRAG1 | (payload_length & 0x7);
- head[1] = (payload_length >> 3) & 0xff;
+ head[0] = LOWPAN_DISPATCH_FRAG1 | ((payload_length >> 8) & 0x7);
+ head[1] = payload_length & 0xff;
head[2] = tag >> 8;
head[3] = tag & 0xff;
--
1.7.3.4
^ permalink raw reply related
* [PATCH net-next v3 2/3] 6lowpan: Change byte order when storing/accessing u16 tag
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1342025476-20949-1-git-send-email-tony.cheneau@amnesiak.org>
The tag field should be stored and accessed using big-endian byte order (as
intended in the specs). Otherwise, when displayed with a traffic analyser such as
Wireshark, the field is not displayed properly (e.g. 0x01 00 instead of 0x00 01,
and so on).
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
---
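The byte order in question can be illustrated with a small user-space sketch. The helpers put_be16() and get_be16() are hypothetical names used only for illustration; they are not functions from the 6lowpan code.

```c
#include <stdint.h>

/* Write a 16-bit value most significant byte first (network order). */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Read a 16-bit value stored most significant byte first. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

A tag of 1 is then laid out on the wire as 00 01, which is what a traffic analyser expects to see.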
net/ieee802154/6lowpan.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 0c9f6d1..9de1ece 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -303,7 +303,7 @@ static inline int lowpan_fetch_skb_u16(struct sk_buff *skb, u16 *val)
if (unlikely(!pskb_may_pull(skb, 2)))
return -EINVAL;
- *val = skb->data[0] | (skb->data[1] << 8);
+ *val = (skb->data[0] << 8) | skb->data[1];
skb_pull(skb, 2);
return 0;
@@ -1010,8 +1010,8 @@ lowpan_skb_fragmentation(struct sk_buff *skb)
/* first fragment header */
head[0] = LOWPAN_DISPATCH_FRAG1 | (payload_length & 0x7);
head[1] = (payload_length >> 3) & 0xff;
- head[2] = tag & 0xff;
- head[3] = tag >> 8;
+ head[2] = tag >> 8;
+ head[3] = tag & 0xff;
err = lowpan_fragment_xmit(skb, head, header_length, 0, 0);
--
1.7.3.4
^ permalink raw reply related
* [PATCH net-next v3 0/3] 6lowpan: Various bug fixes
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Alexander Smirnov
Hello,
After reading and playing with the 6lowpan code, I found a few issues. This
patchset fixes them. It should apply cleanly against the current
net-next. It contains only bug fixes; I'll send another patchset later that
will contain new functionality.
Changes since version 2:
- remove the patch for the issue that prevented fragmentation from working
after a few packets had been sent: Alexander included that patch in his patchset
- fix the title of the git commit to include the "6lowpan" tag
Regards,
Tony Cheneau
Tony Cheneau (3):
Fix null pointer dereference in UDP uncompression function
Change byte order when storing/accessing u16 tag
Change byte order when storing/accessing to len field
net/ieee802154/6lowpan.c | 29 ++++++++++++++++++-----------
1 files changed, 18 insertions(+), 11 deletions(-)
--
1.7.3.4
^ permalink raw reply
* [PATCH net-next v3 1/3] 6lowpan: Fix null pointer dereference in UDP uncompression function
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1342025476-20949-1-git-send-email-tony.cheneau@amnesiak.org>
When a UDP packet gets fragmented, a crash occurs at reassembly time.
This is because skb->transport_header is not set during the earlier stages of fragment reassembly.
As a consequence, the call to udp_hdr() returns NULL, and uh (which is NULL) gets
dereferenced without any check.
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
---
net/ieee802154/6lowpan.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index f4070e5..0c9f6d1 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -315,6 +315,9 @@ lowpan_uncompress_udp_header(struct sk_buff *skb)
struct udphdr *uh = udp_hdr(skb);
u8 tmp;
+ if (!uh)
+ goto err;
+
if (lowpan_fetch_skb_u8(skb, &tmp))
goto err;
--
1.7.3.4
^ permalink raw reply related
* RE: [RFC 1/2] i2400m: remove SDIO device support
From: Perez-Gonzalez, Inaky @ 2012-07-11 16:50 UTC (permalink / raw)
To: John W. Linville, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Ortiz, Samuel, linux-wimax,
wimax-BPSAo7wm5JOHVYUYWc+uSQ@public.gmane.org
In-Reply-To: <1341952049-32193-1-git-send-email-linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> From: John W. Linville [mailto:linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org]
>
> From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
>
> SDIO support in this driver was intended to support the iwmc3200
> device. This hardware never became available to normal humans.
> Leaving this driver imposes unwelcome maintenance costs for no clear
> benefit.
>
> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: John W. Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> ---
> If there are no objections, I'll push this series through the
> wireless-next tree along with the iwmc3200wifi removal.
Acked-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
^ permalink raw reply
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Ben Greear @ 2012-07-11 16:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, ycheng, dave.taht, netdev, codel, therbert,
mattmathis, nanditad, ncardwell, andrewmcgr, Rick Jones
In-Reply-To: <1342022043.3265.8179.camel@edumazet-glaptop>
On 07/11/2012 08:54 AM, Eric Dumazet wrote:
> On Wed, 2012-07-11 at 08:43 -0700, Ben Greear wrote:
>> On 07/11/2012 08:25 AM, Eric Dumazet wrote:
>>> On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
>>>
>>>> I haven't read your patch in detail, but I was wondering if this feature
>>>> would cause trouble for applications that are servicing many sockets at once
>>>> and so might take several ms between handling each individual socket.
>>>>
>>>
>>> Well, this patch has no impact for such applications. In fact their
>>> send()/write() will return to userland faster than before (for very
>>> large send())
>>
>> Maybe I'm just confused. Is your patch just mucking with
>> the queues below the tcp xmit queues? From the patch description
>> I was thinking you were somehow directly limiting the TCP xmit
>> queues...
>>
>
> I don't limit tcp xmit queues. I might avoid excessive autotuning.
>
>
>
>> If you are just draining the tcp xmit queues on a new/faster
>> trigger, then I see no problem with that, and no need for
>> a per-socket control.
>
> That's the plan: limiting the number of bytes in Qdisc, not the number of bytes
> in socket write queue.
Thanks for the explanation.
Out of curiosity, have you tried running multiple TCP streams
with different processes driving each stream, where each is trying
to drive, say, 700Mbps bi-directional traffic over a 1Gbps link?
Perhaps with 50ms of latency generated by a network emulator.
This used to cause some extremely high latency
due to excessive TCP xmit queues (from what I could tell),
but maybe this new patch will cure that.
I'll re-run my tests with your patch eventually..but too bogged
down to do so soon.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:54 UTC (permalink / raw)
To: Ben Greear; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <4FFD9F18.6030401@candelatech.com>
On Wed, 2012-07-11 at 08:43 -0700, Ben Greear wrote:
> On 07/11/2012 08:25 AM, Eric Dumazet wrote:
> > On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
> >
> >> I haven't read your patch in detail, but I was wondering if this feature
> >> would cause trouble for applications that are servicing many sockets at once
> >> and so might take several ms between handling each individual socket.
> >>
> >
> > Well, this patch has no impact for such applications. In fact their
> > send()/write() will return to userland faster than before (for very
> > large send())
>
> Maybe I'm just confused. Is your patch just mucking with
> the queues below the tcp xmit queues? From the patch description
> I was thinking you were somehow directly limiting the TCP xmit
> queues...
>
I don't limit tcp xmit queues. I might avoid excessive autotuning.
> If you are just draining the tcp xmit queues on a new/faster
> trigger, then I see no problem with that, and no need for
> a per-socket control.
That's the plan: limiting the number of bytes in the Qdisc, not the number
of bytes in the socket write queue.
* [PATCH v3 net-next] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:50 UTC (permalink / raw)
To: David Miller; +Cc: nanditad, netdev, codel, ncardwell, mattmathis
This introduces TSQ (TCP Small Queues).
The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), to reduce RTT and cwnd bias, part of the
bufferbloat problem.
sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
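The limit check and the TSO size cap described above can be modelled in userspace. This is a minimal sketch, not the kernel code: struct toy_tsq_sock and the toy_*() helpers are hypothetical names, and the model ignores locking, refcounting and the tasklet deferral.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of the TSQ limit; all names are hypothetical. */
struct toy_tsq_sock {
	size_t wmem_alloc;	/* bytes handed to qdisc/device (models sk_wmem_alloc) */
	size_t limit;		/* models tcp_limit_output_bytes */
	bool throttled;		/* models the TSQ_THROTTLED flag */
};

/* Cap a TSO size goal to half the limit, so two packets fit under it. */
static size_t toy_tso_goal(size_t goal, size_t limit)
{
	size_t half = limit / 2;

	return goal < half ? goal : half;
}

/* Try to hand one packet to the lower layers.
 * Returns false and marks the socket throttled once the limit is reached.
 */
static bool toy_xmit(struct toy_tsq_sock *sk, size_t truesize)
{
	if (sk->wmem_alloc >= sk->limit) {
		sk->throttled = true;
		return false;
	}
	sk->wmem_alloc += truesize;
	return true;
}

/* Model of TX completion: release the bytes and report whether the
 * sender was throttled and should be restarted.
 */
static bool toy_tx_complete(struct toy_tsq_sock *sk, size_t truesize)
{
	bool restart = sk->throttled;

	assert(sk->wmem_alloc >= truesize);
	sk->wmem_alloc -= truesize;
	sk->throttled = false;
	return restart;
}
```

With a limit of 131072 and 65536-byte packets, two calls to toy_xmit() succeed, the third is refused and sets the throttled flag, and the next toy_tx_complete() reports that sending should resume. toy_tso_goal(65536, 40000) yields 20000, matching the 40000/2 figure in the text.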
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
As the skb destructor cannot restart xmit itself (as the qdisc lock might
be taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.
If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
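The hand-off between the tasklet and release_sock() can be illustrated with a small userspace model. This is a sketch under simplifying assumptions: struct toy_owned_sock and the toy_*() functions are hypothetical, there is no concurrency, and a counter stands in for the real transmit path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of deferring work when the socket is owned by the user.
 * All names are hypothetical.
 */
struct toy_owned_sock {
	bool owned_by_user;	/* models sock_owned_by_user() */
	bool tsq_owned;		/* models the TSQ_OWNED flag */
	int xmit_calls;		/* how many times more data was pushed */
};

static void toy_push(struct toy_owned_sock *sk)
{
	sk->xmit_calls++;
}

/* What the tasklet does for one queued socket: send now if possible,
 * otherwise leave a note for the current owner.
 */
static void toy_tasklet_handle(struct toy_owned_sock *sk)
{
	if (!sk->owned_by_user)
		toy_push(sk);
	else
		sk->tsq_owned = true;
}

/* What the owner does when releasing the socket: run the deferred
 * send if the tasklet asked for one, then drop ownership.
 */
static void toy_release_sock(struct toy_owned_sock *sk)
{
	assert(sk->owned_by_user);

	if (sk->tsq_owned) {
		sk->tsq_owned = false;
		toy_push(sk);
	}
	sk->owned_by_user = false;
}
```

The point of the flag is that a TX completion arriving while the application holds the socket is not lost: the send simply happens a little later, from the releasing context.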
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
---
Documentation/networking/ip-sysctl.txt | 14 ++
include/linux/tcp.h | 9 +
include/net/sock.h | 2
include/net/tcp.h | 4
net/core/sock.c | 4
net/ipv4/sysctl_net_ipv4.c | 7 +
net/ipv4/tcp.c | 6
net/ipv4/tcp_ipv4.c | 1
net/ipv4/tcp_minisocks.c | 1
net/ipv4/tcp_output.c | 154 ++++++++++++++++++++++-
net/ipv6/tcp_ipv6.c | 1
11 files changed, 202 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79..e20c17a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN
Documentation/networking/tcp-thin.txt
Default: 0
+tcp_limit_output_bytes - INTEGER
+ Controls TCP Small Queue limit per tcp socket.
+ A TCP bulk sender tends to increase packets in flight until it
+ gets loss notifications. With SNDBUF autotuning, this can
+ result in a large amount of packets queued in qdisc/device
+ on the local machine, hurting latency of other flows, for
+ typical pfifo_fast qdiscs.
+ tcp_limit_output_bytes limits the number of bytes on qdisc
+ or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+ Note: For GSO/TSO enabled flows, we try to have at least two
+ packets in flight. Reducing tcp_limit_output_bytes might also
+ reduce the size of individual GSO packet (64KB being the max)
+ Default: 131072
+
UDP variables:
udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 2de9cf4..1888169 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -339,6 +339,9 @@ struct tcp_sock {
u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */
u32 lsndtime; /* timestamp of last sent data packet (for restart window) */
+ struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
+ unsigned long tsq_flags;
+
/* Data for direct copy to user */
struct {
struct sk_buff_head prequeue;
@@ -494,6 +497,12 @@ struct tcp_sock {
struct tcp_cookie_values *cookie_values;
};
+enum tsq_flags {
+ TSQ_THROTTLED,
+ TSQ_QUEUED,
+ TSQ_OWNED, /* tcp_tasklet_func() found socket was locked */
+};
+
static inline struct tcp_sock *tcp_sk(const struct sock *sk)
{
return (struct tcp_sock *)sk;
diff --git a/include/net/sock.h b/include/net/sock.h
index 640432a..eefce84 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -858,6 +858,8 @@ struct proto {
int (*backlog_rcv) (struct sock *sk,
struct sk_buff *skb);
+ void (*release_cb)(struct sock *sk);
+
/* Keeping track of sk's, looking them up, and port selection methods. */
void (*hash)(struct sock *sk);
void (*unhash)(struct sock *sk);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3618fef..439984b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -253,6 +253,7 @@ extern int sysctl_tcp_cookie_size;
extern int sysctl_tcp_thin_linear_timeouts;
extern int sysctl_tcp_thin_dupack;
extern int sysctl_tcp_early_retrans;
+extern int sysctl_tcp_limit_output_bytes;
extern atomic_long_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
@@ -321,6 +322,8 @@ extern struct proto tcp_prot;
extern void tcp_init_mem(struct net *net);
+extern void tcp_tasklet_init(void);
+
extern void tcp_v4_err(struct sk_buff *skb, u32);
extern void tcp_shutdown (struct sock *sk, int how);
@@ -334,6 +337,7 @@ extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t size);
extern int tcp_sendpage(struct sock *sk, struct page *page, int offset,
size_t size, int flags);
+extern void tcp_release_cb(struct sock *sk);
extern int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
extern int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
const struct tcphdr *th, unsigned int len);
diff --git a/net/core/sock.c b/net/core/sock.c
index 929bdcc..24039ac 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2159,6 +2159,10 @@ void release_sock(struct sock *sk)
spin_lock_bh(&sk->sk_lock.slock);
if (sk->sk_backlog.tail)
__release_sock(sk);
+
+ if (sk->sk_prot->release_cb)
+ sk->sk_prot->release_cb(sk);
+
sk->sk_lock.owned = 0;
if (waitqueue_active(&sk->sk_lock.wq))
wake_up(&sk->sk_lock.wq);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 12aa0c5..70730f7 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -598,6 +598,13 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "tcp_limit_output_bytes",
+ .data = &sysctl_tcp_limit_output_bytes,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
#ifdef CONFIG_NET_DMA
{
.procname = "tcp_dma_copybreak",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d902da9..4252cd8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -376,6 +376,7 @@ void tcp_init_sock(struct sock *sk)
skb_queue_head_init(&tp->out_of_order_queue);
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
+ INIT_LIST_HEAD(&tp->tsq_node);
icsk->icsk_rto = TCP_TIMEOUT_INIT;
tp->mdev = TCP_TIMEOUT_INIT;
@@ -796,6 +797,10 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
inet_csk(sk)->icsk_ext_hdr_len -
tp->tcp_header_len);
+ /* TSQ : try to have two TSO segments in flight */
+ xmit_size_goal = min_t(u32, xmit_size_goal,
+ sysctl_tcp_limit_output_bytes >> 1);
+
xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
/* We try hard to avoid divides here */
@@ -3574,4 +3579,5 @@ void __init tcp_init(void)
tcp_secret_primary = &tcp_secret_one;
tcp_secret_retiring = &tcp_secret_two;
tcp_secret_secondary = &tcp_secret_two;
+ tcp_tasklet_init();
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ddefd39..01545a3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2588,6 +2588,7 @@ struct proto tcp_prot = {
.sendmsg = tcp_sendmsg,
.sendpage = tcp_sendpage,
.backlog_rcv = tcp_v4_do_rcv,
+ .release_cb = tcp_release_cb,
.hash = inet_hash,
.unhash = inet_unhash,
.get_port = inet_csk_get_port,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 6560886..c66f2ed 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -424,6 +424,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
treq->snt_isn + 1 + tcp_s_data_size(oldtp);
tcp_prequeue_init(newtp);
+ INIT_LIST_HEAD(&newtp->tsq_node);
tcp_init_wl(newtp, treq->rcv_isn);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c465d3e..03854ab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -50,6 +50,9 @@ int sysctl_tcp_retrans_collapse __read_mostly = 1;
*/
int sysctl_tcp_workaround_signed_windows __read_mostly = 0;
+/* Default TSQ limit of two TSO segments */
+int sysctl_tcp_limit_output_bytes __read_mostly = 131072;
+
/* This limits the percentage of the congestion window which we
* will allow a single TSO frame to consume. Building TSO frames
* which are too large can cause TCP streams to be bursty.
@@ -65,6 +68,8 @@ int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
int sysctl_tcp_cookie_size __read_mostly = 0; /* TCP_COOKIE_MAX */
EXPORT_SYMBOL_GPL(sysctl_tcp_cookie_size);
+static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
+ int push_one, gfp_t gfp);
/* Account for new data that has been sent to the network. */
static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff *skb)
@@ -783,6 +788,140 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
return size;
}
+
+/* TCP SMALL QUEUES (TSQ)
+ *
+ * TSQ goal is to keep small amount of skbs per tcp flow in tx queues (qdisc+dev)
+ * to reduce RTT and bufferbloat.
+ * We do this using a special skb destructor (tcp_wfree).
+ *
+ * It's important that tcp_wfree() can be replaced by sock_wfree() in the
+ * event the skb needs to be reallocated in a driver.
+ * The invariant being skb->truesize subtracted from sk->sk_wmem_alloc
+ *
+ * Since transmit from skb destructor is forbidden, we use a tasklet
+ * to process all sockets that eventually need to send more skbs.
+ * We use one tasklet per cpu, with its own queue of sockets.
+ */
+struct tsq_tasklet {
+ struct tasklet_struct tasklet;
+ struct list_head head; /* queue of tcp sockets */
+};
+static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
+
+/*
+ * One tasklet per cpu tries to send more skbs.
+ * We run in tasklet context but need to disable irqs when
+ * transferring tsq->head because tcp_wfree() might
+ * interrupt us (non NAPI drivers)
+ */
+static void tcp_tasklet_func(unsigned long data)
+{
+ struct tsq_tasklet *tsq = (struct tsq_tasklet *)data;
+ LIST_HEAD(list);
+ unsigned long flags;
+ struct list_head *q, *n;
+ struct tcp_sock *tp;
+ struct sock *sk;
+
+ local_irq_save(flags);
+ list_splice_init(&tsq->head, &list);
+ local_irq_restore(flags);
+
+ list_for_each_safe(q, n, &list) {
+ tp = list_entry(q, struct tcp_sock, tsq_node);
+ list_del(&tp->tsq_node);
+
+ sk = (struct sock *)tp;
+ bh_lock_sock(sk);
+
+ if (!sock_owned_by_user(sk)) {
+ if ((1 << sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
+ TCPF_CLOSING | TCPF_CLOSE_WAIT))
+ tcp_write_xmit(sk,
+ tcp_current_mss(sk),
+ 0, 0,
+ GFP_ATOMIC);
+ } else {
+ /* defer the work to tcp_release_cb() */
+ set_bit(TSQ_OWNED, &tp->tsq_flags);
+ }
+ bh_unlock_sock(sk);
+
+ clear_bit(TSQ_QUEUED, &tp->tsq_flags);
+ sk_free(sk);
+ }
+}
+
+/**
+ * tcp_release_cb - tcp release_sock() callback
+ * @sk: socket
+ *
+ * called from release_sock() to perform protocol dependent
+ * actions before socket release.
+ */
+void tcp_release_cb(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (test_and_clear_bit(TSQ_OWNED, &tp->tsq_flags)) {
+ if ((1 << sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
+ TCPF_CLOSING | TCPF_CLOSE_WAIT))
+ tcp_write_xmit(sk,
+ tcp_current_mss(sk),
+ 0, 0,
+ GFP_ATOMIC);
+ }
+}
+EXPORT_SYMBOL(tcp_release_cb);
+
+void __init tcp_tasklet_init(void)
+{
+ int i;
+
+ for_each_possible_cpu(i) {
+ struct tsq_tasklet *tsq = &per_cpu(tsq_tasklet, i);
+
+ INIT_LIST_HEAD(&tsq->head);
+ tasklet_init(&tsq->tasklet,
+ tcp_tasklet_func,
+ (unsigned long)tsq);
+ }
+}
+
+/*
+ * Write buffer destructor automatically called from kfree_skb.
+ * We can't xmit new skbs from this context, as we might already
+ * hold qdisc lock.
+ */
+void tcp_wfree(struct sk_buff *skb)
+{
+ struct sock *sk = skb->sk;
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) &&
+ !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) {
+ unsigned long flags;
+ struct tsq_tasklet *tsq;
+
+ /* Keep a ref on socket.
+ * This last ref will be released in tcp_tasklet_func()
+ */
+ atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc);
+
+ /* queue this socket to tasklet queue */
+ local_irq_save(flags);
+ tsq = &__get_cpu_var(tsq_tasklet);
+ list_add(&tp->tsq_node, &tsq->head);
+ tasklet_schedule(&tsq->tasklet);
+ local_irq_restore(flags);
+ } else {
+ sock_wfree(skb);
+ }
+}
+
/* This routine actually transmits TCP packets queued in by
* tcp_do_sendmsg(). This is used by both the initial
* transmission and possible later retransmissions.
@@ -844,7 +983,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
skb_push(skb, tcp_header_size);
skb_reset_transport_header(skb);
- skb_set_owner_w(skb, sk);
+
+ skb_orphan(skb);
+ skb->sk = sk;
+ skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ?
+ tcp_wfree : sock_wfree;
+ atomic_add(skb->truesize, &sk->sk_wmem_alloc);
/* Build TCP header and checksum it. */
th = tcp_hdr(skb);
@@ -1780,6 +1924,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
+
tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
BUG_ON(!tso_segs);
@@ -1800,6 +1945,13 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
break;
}
+ /* TSQ : sk_wmem_alloc accounts skb truesize,
+ * including skb overhead. But that's OK.
+ */
+ if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) {
+ set_bit(TSQ_THROTTLED, &tp->tsq_flags);
+ break;
+ }
limit = mss_now;
if (tso_segs > 1 && !tcp_urg_mode(tp))
limit = tcp_mss_split_point(sk, skb, mss_now,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 61175cb..70458a9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1970,6 +1970,7 @@ struct proto tcpv6_prot = {
.sendmsg = tcp_sendmsg,
.sendpage = tcp_sendpage,
.backlog_rcv = tcp_v6_do_rcv,
+ .release_cb = tcp_release_cb,
.hash = tcp_v6_hash,
.unhash = inet_unhash,
.get_port = inet_csk_get_port,
* Re: [Ksummit-2012-discuss] Organising Mini Summits within the Kernel Summit
From: Stephen Hemminger @ 2012-07-11 15:44 UTC (permalink / raw)
To: James Bottomley; +Cc: ksummit-2012-discuss, netdev
In-Reply-To: <1341994155.3522.16.camel@dabdike.int.hansenpartnership.com>
On Wed, 11 Jul 2012 09:09:15 +0100
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> Hi All,
>
> We have set aside the second day of the kernel summit (Tuesday 28
> August) as mini-summit day. So far we have only the PCI mini summit on
> this day, so if you can think of other topics, please send them to the
> kernel summit discuss list:
>
> ksummit-2012-discuss@lists.linux-foundation.org
>
> Looking at the available rooms, we think we can run about four or five
> mini summits.
>
> As an added incentive, mini summit organisers get to pick who they
> invite and all the people they pick will get an automatic invitation to
> the third day of the kernel summit (but not the core first day) and the
> evening events.
>
> James
Is there enough interest to have a networking mini-summit?
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Ben Greear @ 2012-07-11 15:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, ycheng, dave.taht, netdev, codel, therbert,
mattmathis, nanditad, ncardwell, andrewmcgr, Rick Jones
In-Reply-To: <1342020306.3265.8129.camel@edumazet-glaptop>
On 07/11/2012 08:25 AM, Eric Dumazet wrote:
> On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
>
>> I haven't read your patch in detail, but I was wondering if this feature
>> would cause trouble for applications that are servicing many sockets at once
>> and so might take several ms between handling each individual socket.
>>
>
> Well, this patch has no impact for such applications. In fact their
> send()/write() will return to userland faster than before (for very
> large send())
Maybe I'm just confused. Is your patch just mucking with
the queues below the tcp xmit queues? From the patch description
I was thinking you were somehow directly limiting the TCP xmit
queues...
If you are just draining the tcp xmit queues on a new/faster
trigger, then I see no problem with that, and no need for
a per-socket control.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* [patch net-next 3/3] team: make team_port_enabled() and team_port_txable() static inline
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
To: netdev; +Cc: davem
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/team/team.c | 12 ------------
include/linux/if_team.h | 11 +++++++++--
2 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index bc7afa5..3620c63 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -671,18 +671,6 @@ static bool team_port_find(const struct team *team,
return false;
}
-bool team_port_enabled(struct team_port *port)
-{
- return port->index != -1;
-}
-EXPORT_SYMBOL(team_port_enabled);
-
-bool team_port_txable(struct team_port *port)
-{
- return port->linkup && team_port_enabled(port);
-}
-EXPORT_SYMBOL(team_port_txable);
-
/*
* Enable/disable port by adding to enabled port hashlist and setting
* port->index (Might be racy so reader could see incorrect ifindex when
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index dca426c..dfa0c8e 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -63,8 +63,15 @@ struct team_port {
long mode_priv[0];
};
-extern bool team_port_enabled(struct team_port *port);
-extern bool team_port_txable(struct team_port *port);
+static inline bool team_port_enabled(struct team_port *port)
+{
+ return port->index != -1;
+}
+
+static inline bool team_port_txable(struct team_port *port)
+{
+ return port->linkup && team_port_enabled(port);
+}
struct team_mode_ops {
int (*init)(struct team *team);
--
1.7.10.4
* [patch net-next 2/3] team: add broadcast mode
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
To: netdev; +Cc: davem
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/team/Kconfig | 13 ++++-
drivers/net/team/Makefile | 1 +
drivers/net/team/team_mode_broadcast.c | 88 ++++++++++++++++++++++++++++++++
3 files changed, 101 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/team/team_mode_broadcast.c
diff --git a/drivers/net/team/Kconfig b/drivers/net/team/Kconfig
index 89024d5..6a7260b 100644
--- a/drivers/net/team/Kconfig
+++ b/drivers/net/team/Kconfig
@@ -15,6 +15,17 @@ menuconfig NET_TEAM
if NET_TEAM
+config NET_TEAM_MODE_BROADCAST
+ tristate "Broadcast mode support"
+ depends on NET_TEAM
+ ---help---
+ Basic mode where packets are transmitted always by all suitable ports.
+
+ All added ports are setup to have team's mac address.
+
+ To compile this team mode as a module, choose M here: the module
+ will be called team_mode_broadcast.
+
config NET_TEAM_MODE_ROUNDROBIN
tristate "Round-robin mode support"
depends on NET_TEAM
@@ -22,7 +33,7 @@ config NET_TEAM_MODE_ROUNDROBIN
Basic mode where port used for transmitting packets is selected in
round-robin fashion using packet counter.
- All added ports are setup to have bond's mac address.
+ All added ports are setup to have team's mac address.
To compile this team mode as a module, choose M here: the module
will be called team_mode_roundrobin.
diff --git a/drivers/net/team/Makefile b/drivers/net/team/Makefile
index fb9f4c1..9757630 100644
--- a/drivers/net/team/Makefile
+++ b/drivers/net/team/Makefile
@@ -3,6 +3,7 @@
#
obj-$(CONFIG_NET_TEAM) += team.o
+obj-$(CONFIG_NET_TEAM_MODE_BROADCAST) += team_mode_broadcast.o
obj-$(CONFIG_NET_TEAM_MODE_ROUNDROBIN) += team_mode_roundrobin.o
obj-$(CONFIG_NET_TEAM_MODE_ACTIVEBACKUP) += team_mode_activebackup.o
obj-$(CONFIG_NET_TEAM_MODE_LOADBALANCE) += team_mode_loadbalance.o
diff --git a/drivers/net/team/team_mode_broadcast.c b/drivers/net/team/team_mode_broadcast.c
new file mode 100644
index 0000000..5562345
--- /dev/null
+++ b/drivers/net/team/team_mode_broadcast.c
@@ -0,0 +1,88 @@
+/*
+ * drivers/net/team/team_mode_broadcast.c - Broadcast mode for team
+ * Copyright (c) 2012 Jiri Pirko <jpirko@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <linux/if_team.h>
+
+static bool bc_transmit(struct team *team, struct sk_buff *skb)
+{
+ struct team_port *cur;
+ struct team_port *last = NULL;
+ struct sk_buff *skb2;
+ bool ret;
+ bool sum_ret = false;
+
+ list_for_each_entry_rcu(cur, &team->port_list, list) {
+ if (team_port_txable(cur)) {
+ if (last) {
+ skb2 = skb_clone(skb, GFP_ATOMIC);
+ if (skb2) {
+ skb2->dev = last->dev;
+ ret = dev_queue_xmit(skb2);
+ if (!sum_ret)
+ sum_ret = ret;
+ }
+ }
+ last = cur;
+ }
+ }
+ if (last) {
+ skb->dev = last->dev;
+ ret = dev_queue_xmit(skb);
+ if (!sum_ret)
+ sum_ret = ret;
+ }
+ return sum_ret;
+}
+
+static int bc_port_enter(struct team *team, struct team_port *port)
+{
+ return team_port_set_team_mac(port);
+}
+
+static void bc_port_change_mac(struct team *team, struct team_port *port)
+{
+ team_port_set_team_mac(port);
+}
+
+static const struct team_mode_ops bc_mode_ops = {
+ .transmit = bc_transmit,
+ .port_enter = bc_port_enter,
+ .port_change_mac = bc_port_change_mac,
+};
+
+static const struct team_mode bc_mode = {
+ .kind = "broadcast",
+ .owner = THIS_MODULE,
+ .ops = &bc_mode_ops,
+};
+
+static int __init bc_init_module(void)
+{
+ return team_mode_register(&bc_mode);
+}
+
+static void __exit bc_cleanup_module(void)
+{
+ team_mode_unregister(&bc_mode);
+}
+
+module_init(bc_init_module);
+module_exit(bc_cleanup_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
+MODULE_DESCRIPTION("Broadcast mode for team");
+MODULE_ALIAS("team-mode-broadcast");
--
1.7.10.4
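The transmit pattern in bc_transmit() (a clone for every txable port except the last, which gets the original skb) can be modelled in userspace. This is an illustrative sketch only: struct toy_port and toy_broadcast() are hypothetical, and a per-port counter stands in for dev_queue_xmit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a team port; names are hypothetical. */
struct toy_port {
	bool linkup;
	int index;	/* -1 means the port is disabled */
	int sent;	/* packets "transmitted" on this port */
};

static bool toy_port_txable(const struct toy_port *p)
{
	return p->linkup && p->index != -1;
}

/* Send one packet on every txable port.
 * Returns the number of copies that had to be made: each txable port
 * but the last gets a copy, the last one gets the original.
 */
static size_t toy_broadcast(struct toy_port *ports, size_t n)
{
	struct toy_port *last = NULL;
	size_t clones = 0;
	size_t i;

	assert(ports != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!toy_port_txable(&ports[i]))
			continue;
		if (last) {
			last->sent++;	/* a copy goes to the previous port */
			clones++;
		}
		last = &ports[i];
	}
	if (last)
		last->sent++;		/* the original goes to the last port */
	return clones;
}
```

Deferring each send by one iteration is what saves a copy: with k txable ports only k - 1 clones are needed, and with a single port none at all.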
* [patch net-next 1/3] team: use function team_port_txable() for determining enabled and up port
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
To: netdev; +Cc: davem
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
drivers/net/team/team.c | 6 ++++++
drivers/net/team/team_mode_roundrobin.c | 6 +++---
include/linux/if_team.h | 1 +
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 9b94f53..bc7afa5 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -677,6 +677,12 @@ bool team_port_enabled(struct team_port *port)
}
EXPORT_SYMBOL(team_port_enabled);
+bool team_port_txable(struct team_port *port)
+{
+ return port->linkup && team_port_enabled(port);
+}
+EXPORT_SYMBOL(team_port_txable);
+
/*
* Enable/disable port by adding to enabled port hashlist and setting
* port->index (Might be racy so reader could see incorrect ifindex when
diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
index 52dd0ec..0cf38e9 100644
--- a/drivers/net/team/team_mode_roundrobin.c
+++ b/drivers/net/team/team_mode_roundrobin.c
@@ -30,16 +30,16 @@ static struct team_port *__get_first_port_up(struct team *team,
{
struct team_port *cur;
- if (port->linkup)
+ if (team_port_txable(port))
return port;
cur = port;
list_for_each_entry_continue_rcu(cur, &team->port_list, list)
- if (cur->linkup)
+ if (team_port_txable(cur))
return cur;
list_for_each_entry_rcu(cur, &team->port_list, list) {
if (cur == port)
break;
- if (cur->linkup)
+ if (team_port_txable(cur))
return cur;
}
return NULL;
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 99efd60..dca426c 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -64,6 +64,7 @@ struct team_port {
};
extern bool team_port_enabled(struct team_port *port);
+extern bool team_port_txable(struct team_port *port);
struct team_mode_ops {
int (*init)(struct team *team);
--
1.7.10.4
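The search order used by __get_first_port_up() (the preferred port first, then the rest of the list, then wrapping around to the start) can be sketched over a plain array. This is a userspace illustration with hypothetical names (struct toy_port, toy_first_txable()); the real code walks an RCU-protected list rather than indexing an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a team port; names are hypothetical. */
struct toy_port {
	bool linkup;
	int index;	/* -1 means the port is disabled */
};

static bool toy_port_txable(const struct toy_port *p)
{
	return p->linkup && p->index != -1;
}

/* Return the position of the first txable port, starting the search at
 * 'start' and wrapping around once, or -1 if no port can transmit.
 */
static int toy_first_txable(const struct toy_port *ports, size_t n,
			    size_t start)
{
	size_t step;

	assert(ports != NULL || n == 0);

	for (step = 0; step < n; step++) {
		size_t i = (start + step) % n;

		/* Test the candidate being visited, not the starting port. */
		if (toy_port_txable(&ports[i]))
			return (int)i;
	}
	return -1;
}
```

Each candidate has to be tested itself. Once the starting port has been rejected, re-testing it for every later candidate could never succeed.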
* [patch net-next 0/3] team: couple of patches
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
To: netdev; +Cc: davem
Jiri Pirko (3):
team: use function team_port_txable() for determining enabled and up
port
team: add broadcast mode
team: make team_port_enabled() and team_port_txable() static inline
drivers/net/team/Kconfig | 13 ++++-
drivers/net/team/Makefile | 1 +
drivers/net/team/team.c | 6 ---
drivers/net/team/team_mode_broadcast.c | 88 +++++++++++++++++++++++++++++++
drivers/net/team/team_mode_roundrobin.c | 6 +--
include/linux/if_team.h | 10 +++-
6 files changed, 113 insertions(+), 11 deletions(-)
create mode 100644 drivers/net/team/team_mode_broadcast.c
--
1.7.10.4
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:25 UTC (permalink / raw)
To: Ben Greear; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <4FFD98EA.1040301@candelatech.com>
On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
> I haven't read your patch in detail, but I was wondering if this feature
> would cause trouble for applications that are servicing many sockets at once
> and so might take several ms between handling each individual socket.
>
Well, this patch has no impact for such applications. In fact their
send()/write() will return to userland faster than before (for very
large send())
> Or, applications that for other reasons cannot service sockets quite
> as fast. Without this feature, they could poke more data into the
> xmit queues to be handled by the kernel while the app goes about its
> other user-space work?
>
There is no impact for the applications. They queue their data in the
socket write queue, and the tcp stack does the work to actually transmit
data and handle ACKs.
Before this patch, this work was triggered by :
- Timers
- Incoming ACKS
We now add a third trigger : TX completion
> Maybe this feature could be enabled/tuned on a per-socket basis?
Well, why not, but I first want to see why it would be needed.
I mean, if a single application _needs_ to send MBytes of tcp data into the
Qdisc at once, everything else on the machine is stuck (as today).
So just increase the global param.
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Ben Greear @ 2012-07-11 15:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, ycheng, dave.taht, netdev, codel, therbert,
mattmathis, nanditad, ncardwell, andrewmcgr, Rick Jones
In-Reply-To: <1342019518.3265.8116.camel@edumazet-glaptop>
On 07/11/2012 08:11 AM, Eric Dumazet wrote:
> On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
>> This introduces TSQ (TCP Small Queues)
>>
>> TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
>> device queues), to reduce RTT and cwnd bias, part of the bufferbloat
>> problem.
>>
>> sk->sk_wmem_alloc not allowed to grow above a given limit,
>> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
>> given time.
>>
>> TSO packets are sized/capped to half the limit, so that we have two
>> TSO packets in flight, allowing better bandwidth use.
>>
>> As a side effect, setting the limit to 40000 automatically reduces the
>> standard gso max limit (65536) to 40000/2 : It can help to reduce
>> latencies of high prio packets, having smaller TSO packets.
>>
>> This means we divert sock_wfree() to a tcp_wfree() handler, to
>> queue/send following frames when skb_orphan() [2] is called for the
>> already queued skbs.
>>
>> Results on my dev machine (tg3 nic) are really impressive, using
>> standard pfifo_fast, and with or without TSO/GSO. Without reduction of
>> nominal bandwidth.
>>
>> I no longer have 3MBytes backlogged in qdisc by a single netperf
>> session, and both side socket autotuning no longer use 4 Mbytes.
>>
>> As skb destructor cannot restart xmit itself ( as qdisc lock might be
>> taken at this point ), we delegate the work to a tasklet. We use one
>> tasklet per cpu for performance reasons.
>>
>>
>>
>> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
>> [2] skb_orphan() is usually called at TX completion time,
>> but some drivers call it in their start_xmit() handler.
>> These drivers should at least use BQL, or else a single TCP
>> session can still fill the whole NIC TX ring, since TSQ will
>> have no effect.
>
> I am going to send an official patch (I'll put a v3 tag in it)
>
> I believe I did a full implementation, including the xmit() done
> by the user at release_sock() time, if the tasklet found socket owned by
> the user.
>
> Some bench results about the choice of 128KB being the default value:
>
> 64KB seems the 'good' value on 10Gb links to reach max throughput on my
> lab machines (ixgbe adapters).
>
> Using 128KB is a very conservative value to allow link rate on 20Gbps.
>
> Still, it allows less than 1ms of buffering on a Gbit link, and less
> than 8ms on 100Mbit link (instead of 130ms without Small Queues)
I haven't read your patch in detail, but I was wondering if this feature
would cause trouble for applications that are servicing many sockets at once
and so might take several ms between handling each individual socket.
Or, applications that for other reasons cannot service sockets quite
as fast. Without this feature, they could poke more data into the
xmit queues to be handled by the kernel while the app goes about its
other user-space work?
Maybe this feature could be enabled/tuned on a per-socket basis?
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:11 UTC (permalink / raw)
To: David Miller; +Cc: nanditad, netdev, codel, mattmathis, ncardwell
In-Reply-To: <1341933215.3265.5476.camel@edumazet-glaptop>
On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
> This introduces TSQ (TCP Small Queues).
>
> The goal of TSQ is to reduce the number of TCP packets in xmit queues
> (qdisc & device queues), to reduce RTT and cwnd bias, part of the
> bufferbloat problem.
>
> sk->sk_wmem_alloc is not allowed to grow above a given limit,
> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
> given time.
>
> TSO packets are sized/capped to half the limit, so that we have two
> TSO packets in flight, allowing better bandwidth use.
>
> As a side effect, setting the limit to 40000 automatically reduces the
> standard gso max limit (65536) to 40000/2: having smaller TSO packets
> can help reduce the latency of high prio packets.
>
> This means we divert sock_wfree() to a tcp_wfree() handler, to
> queue/send following frames when skb_orphan() [2] is called for the
> already queued skbs.
>
> Results on my dev machine (tg3 nic) are really impressive, using
> standard pfifo_fast, and with or without TSO/GSO. Without reduction of
> nominal bandwidth.
>
> I no longer have 3MBytes backlogged in qdisc by a single netperf
> session, and socket autotuning on both sides no longer uses 4 Mbytes.
>
> As the skb destructor cannot restart xmit itself (the qdisc lock might
> be taken at this point), we delegate the work to a tasklet. We use one
> tasklet per cpu for performance reasons.
>
>
>
> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
> [2] skb_orphan() is usually called at TX completion time,
> but some drivers call it in their start_xmit() handler.
> These drivers should at least use BQL, or else a single TCP
> session can still fill the whole NIC TX ring, since TSQ will
> have no effect.
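
The TSO sizing rule described in the quoted text (TSO packets capped to
half the limit, so that two of them fit under it) can be sketched in
plain userspace C. This is only an illustration under assumptions: the
helper name tso_size_cap() and its parameters are hypothetical and are
not taken from the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the patch's code: cap the TSO size goal at
 * half of the per-socket output limit, so that two TSO packets fit
 * under the limit at the same time. The result never exceeds the
 * device's gso max size.
 */
unsigned int tso_size_cap(unsigned int gso_max_size, unsigned int limit_bytes)
{
	unsigned int half = limit_bytes / 2;

	assert(gso_max_size > 0);
	return half < gso_max_size ? half : gso_max_size;
}
```

With a gso max size of 65536, a limit of 40000 gives a cap of 20000,
which is the 40000/2 figure mentioned above, while the default limit of
131072 leaves the cap at 65536.
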

I am going to send an official patch (I'll put a v3 tag in it)

I believe I did a full implementation, including the xmit() done
by the user at release_sock() time, if the tasklet found socket owned by
the user.

Some bench results about the choice of 128KB being the default value:

64KB seems the 'good' value on 10Gb links to reach max throughput on my
lab machines (ixgbe adapters).

Using 128KB is a very conservative value to allow link rate on 20Gbps.

Still, it allows less than 1ms of buffering on a Gbit link, and less
than 8ms on 100Mbit link (instead of 130ms without Small Queues)

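
For a rough feel of how a byte limit translates into queueing delay,
here is a back-of-the-envelope sketch in userspace C. It assumes every
byte counted against the limit is payload on the wire, so it gives an
upper bound only. As far as I understand, the limit is checked against
sk_wmem_alloc, which accounts skb truesize (payload plus skb overhead),
so the real delay should be lower. The helper name drain_time_us() is
hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: time, in microseconds (rounded down), needed to
 * serialize queued_bytes onto a link running at link_bps bits per
 * second. Treats all queued bytes as wire payload (upper bound).
 */
unsigned long long drain_time_us(unsigned long long queued_bytes,
				 unsigned long long link_bps)
{
	assert(link_bps > 0);
	return queued_bytes * 8ULL * 1000000ULL / link_bps;
}
```

Under these assumptions, 131072 bytes take about 1048 us on a 1 Gbit
link and about 10485 us on a 100 Mbit link, and 65536 bytes take about
52 us on a 10 Gbit link.
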
Tests using a single TCP flow.
Tests on 10Gbit links :
echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 79
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
392360 392360 16384 20.00 1389.53 10^6bits/s 0.52 S 4.30 S 0.737 1.014 usec/KB
echo 24576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 33 tpci_snd_cwnd 86
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
396976 396976 16384 20.00 1483.03 10^6bits/s 0.45 S 4.51 S 0.603 0.997 usec/KB
echo 32768 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 19 tpci_snd_cwnd 100
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
461600 461600 16384 20.00 2039.67 10^6bits/s 0.64 S 5.17 S 0.620 0.830 usec/KB
echo 49152 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 28 tpci_snd_cwnd 207
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
955512 955512 16384 20.00 4448.86 10^6bits/s 1.19 S 11.16 S 0.526 0.822 usec/KB
echo 65536 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 399 tpci_snd_cwnd 488
tcpi_reordering 127 tcpi_total_retrans 75
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
2460328 2460328 16384 20.00 5975.12 10^6bits/s 1.81 S 14.65 S 0.595 0.803 usec/KB
echo 81920 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 24 tpci_snd_cwnd 236
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1144768 1144768 16384 20.00 5190.08 10^6bits/s 1.56 S 12.63 S 0.591 0.798 usec/KB
echo 98304 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 20 tpci_snd_cwnd 644
tcpi_reordering 59 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
2991168 2991168 16384 20.00 5976.00 10^6bits/s 1.60 S 14.61 S 0.526 0.801 usec/KB
echo 114688 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 23 tpci_snd_cwnd 683
tcpi_reordering 59 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
3161960 3161960 16384 20.00 5975.14 10^6bits/s 1.42 S 14.78 S 0.469 0.810 usec/KB
echo 131072 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 23 tpci_snd_cwnd 591
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
2728056 2728056 16384 20.00 5976.16 10^6bits/s 1.71 S 14.62 S 0.562 0.802 usec/KB
echo 147456 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 27 tpci_snd_cwnd 697
tcpi_reordering 64 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
3240432 3240432 16384 20.00 5975.64 10^6bits/s 1.51 S 14.78 S 0.498 0.811 usec/KB
echo 163840 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 710
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
3277360 3277360 16384 20.00 5975.56 10^6bits/s 1.59 S 14.79 S 0.525 0.811 usec/KB
echo 180224 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 32 tpci_snd_cwnd 701
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
3235816 3235816 16384 20.00 5976.80 10^6bits/s 1.56 S 14.61 S 0.514 0.801 usec/KB
echo 196608 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 502 tpci_snd_cwnd 690
tcpi_reordering 127 tcpi_total_retrans 37
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
3185040 3185040 16384 20.00 5975.46 10^6bits/s 1.50 S 14.67 S 0.493 0.804 usec/KB
echo 262144 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 721
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
3448152 3448152 16384 20.00 5975.49 10^6bits/s 1.57 S 14.78 S 0.516 0.811 usec/KB
echo 524288 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 927
tcpi_reordering 53 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
4194304 4194304 16384 20.01 5976.61 10^6bits/s 1.63 S 14.56 S 0.538 0.798 usec/KB
echo 1048576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2500 tcpi_rttvar 750 tcpi_snd_ssthresh 17 tpci_snd_cwnd 1272
tcpi_reordering 90 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
4194304 4194304 16384 20.01 5975.11 10^6bits/s 1.64 S 14.69 S 0.541 0.805 usec/KB
Tests on Gbit link :
echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 30 tpci_snd_cwnd 274
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1264784 1264784 16384 20.01 689.70 10^6bits/s 0.22 S 15.05 S 0.634 7.149 usec/KB
echo 24576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 43 tpci_snd_cwnd 245
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1130920 1130920 16384 20.01 860.21 10^6bits/s 0.25 S 16.05 S 0.576 6.112 usec/KB
echo 32768 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 36 tpci_snd_cwnd 229
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1057064 1057064 16384 20.01 867.76 10^6bits/s 0.28 S 15.46 S 0.634 5.839 usec/KB
echo 49152 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 32 tpci_snd_cwnd 293
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1352488 1352488 16384 20.01 873.61 10^6bits/s 0.21 S 16.25 S 0.483 6.095 usec/KB
echo 65536 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 48 tpci_snd_cwnd 274
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1264784 1264784 16384 20.01 875.90 10^6bits/s 0.19 S 15.56 S 0.421 5.822 usec/KB
echo 81920 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 246
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1135536 1135536 16384 20.01 879.10 10^6bits/s 0.26 S 15.92 S 0.590 5.935 usec/KB
echo 98304 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 20 tpci_snd_cwnd 361
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1666376 1666376 16384 20.02 880.30 10^6bits/s 0.25 S 16.07 S 0.560 5.980 usec/KB
echo 114688 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 41 tpci_snd_cwnd 281
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1297096 1297096 16384 20.01 881.30 10^6bits/s 0.26 S 15.96 S 0.569 5.933 usec/KB
echo 131072 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 30 tpci_snd_cwnd 292
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1347872 1347872 16384 20.01 880.43 10^6bits/s 0.23 S 16.71 S 0.511 6.219 usec/KB
echo 147456 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 31 tpci_snd_cwnd 286
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1320176 1320176 16384 20.01 880.57 10^6bits/s 0.24 S 16.62 S 0.534 6.187 usec/KB
echo 163840 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 19 tpci_snd_cwnd 406
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1874096 1874096 16384 20.02 880.23 10^6bits/s 0.25 S 17.08 S 0.550 6.358 usec/KB
echo 180224 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 27 tpci_snd_cwnd 304
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1403264 1403264 16384 20.01 880.34 10^6bits/s 0.22 S 16.03 S 0.501 5.965 usec/KB
echo 196608 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 42 tpci_snd_cwnd 365
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1684840 1684840 16384 20.02 879.73 10^6bits/s 0.26 S 16.82 S 0.578 6.267 usec/KB
echo 262144 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2875 tcpi_rttvar 750 tcpi_snd_ssthresh 27 tpci_snd_cwnd 471
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
2174136 2174136 16384 20.01 879.89 10^6bits/s 0.25 S 18.52 S 0.556 6.898 usec/KB
echo 524288 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 205000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 5000 tcpi_rttvar 750 tcpi_snd_ssthresh 42 tpci_snd_cwnd 627
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
2894232 2894232 16384 20.03 879.84 10^6bits/s 0.25 S 17.12 S 0.564 6.374 usec/KB
echo 1048576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 209000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 9875 tcpi_rttvar 750 tcpi_snd_ssthresh 33 tpci_snd_cwnd 950
tcpi_reordering 3 tcpi_total_retrans 0
Local Local Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Send Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
4194304 4194304 16384 20.03 880.70 10^6bits/s 0.25 S 18.44 S 0.560 6.861 usec/KB
* Re: [PATCH 4/4] asix: Add a new driver for the AX88172A
From: Christian Riesch @ 2012-07-11 15:10 UTC (permalink / raw)
To: michael
Cc: Ben Hutchings, netdev, Oliver Neukum, Eric Dumazet, Allan Chou,
Mark Lord, Grant Grundler, Ming Lei
In-Reply-To: <CABkLObo5v00QKo-X7hEVbMcXA_QwKFA6HfL-Le5VvU2J5Cs2eg@mail.gmail.com>
Hi again,
On Wed, Jul 11, 2012 at 10:27 AM, Christian Riesch
<christian.riesch@omicron.at> wrote:
> Hi Ben and Michael,
>
> On Mon, Jul 9, 2012 at 12:30 PM, Christian Riesch
> <christian.riesch@omicron.at> wrote:
>> Hi Ben and Michael,
>>
>> On Sun, Jul 8, 2012 at 5:39 PM, Michael Riesch <michael@riesch.at> wrote:
>>> On Fri, 2012-07-06 at 18:37 +0100, Ben Hutchings wrote:
>>>> > + priv->mdio->priv = (void *)dev;
>>>> > + priv->mdio->read = &asix_mdio_bus_read;
>>>> > + priv->mdio->write = &asix_mdio_bus_write;
>>>> > + priv->mdio->name = "Asix MDIO Bus";
>>>> > + snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%s",
>>>> > + dev_name(dev->net->dev.parent));
>>>> [...]
>>>>
>>>> I think you need to ensure that the bus identifier is unique throughout
>>>> its lifetime, but net devices can be renamed and that could lead to a
>>>> collision. Perhaps you could use the ifindex or the USB device path
>>>
>>> Ben,
>>>
>>> the dev_name function in the code above returns the sysfs filename of
>>> the USB device (e.g. 1-0:1.0).
>>>
>>>> (though that might be too long).
>>>
>>> This may be a problem. The bus identifier may be 17 characters long, so
>>> if we leave the endpoint/configuration part (:1.0) and the prefix away
>>> it should be fine in any "normal" system. However, on a system with a
>>> more-than-9-root-hubs 5-tier 127-devices-each USB infrastructure it
>>> results in collisions. So is this approach acceptable?
>>>
>>> Using the ifindex sounds good to me,
>>>
>>> snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%d",
>>> dev->net->ifindex);
>>>
>>> works on any system with less than 10^12 network interfaces.
>>
>> Ok, I'll change that to use ifindex.
>
> No, I won't.
> At the time the mdio bus is registered, ifindex is not yet set, so the
> snprintf would always result in "asix-0".

What do you think about

snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",
         dev->udev->bus->busnum, dev->udev->devnum);

??

This would use the busnum/devnum identifier as reported by lsusb and
would be short enough for an mdio bus name.

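
A quick userspace check of the length argument, as a sketch only: it
assumes a 17-byte id buffer (the figure mentioned earlier in this
thread for the bus identifier), and BUS_ID_SIZE and format_bus_id() are
hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed id buffer size, standing in for MII_BUS_ID_SIZE. */
#define BUS_ID_SIZE 17

/*
 * Hypothetical helper: format the proposed "usb-BBB:DDD" identifier
 * into id. Returns 1 if the whole string, including the terminating
 * NUL, fits in size bytes, and 0 if snprintf() had to truncate it.
 */
int format_bus_id(char *id, size_t size, int busnum, int devnum)
{
	int n;

	assert(id != NULL && size > 0);
	n = snprintf(id, size, "usb-%03d:%03d", busnum, devnum);
	return n >= 0 && (size_t)n < size;
}
```

For bus and device numbers of up to three digits the string is 11
characters plus the NUL, so it fits in the assumed 17 bytes with room
to spare.
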
Thanks, Christian