* [PATCH net-next 00/11] FUJITSU Extended Socket driver version 1.1
From: Taku Izumi @ 2016-04-11 8:08 UTC
To: davem, netdev; +Cc: Taku Izumi
This patchset updates the FUJITSU Extended Socket network driver to version 1.1.
It mainly adds debugging features and some minor bug fixes.
Taku Izumi (11):
fjes: Add trace-gathering facility
fjes: Add setting/getting register value feature via ioctl
fjes: Add debugs facility for fjes module
fjes: Add debugfs entry for statistics
fjes: show EP stats at statistics file in debugfs
fjes: optimize timeout value
fjes: fix incorrect statistics information in fjes_xmit_frame()
fjes: fix bitwise check bug in fjes_raise_intr_rxdata_task
fjes: Enhance changing MTU related work
fjes: Introduce spinlock for rx_status
fjes: Update fjes driver version : 1.1
drivers/net/fjes/Makefile | 2 +-
drivers/net/fjes/fjes.h | 16 ++
drivers/net/fjes/fjes_debugfs.c | 212 +++++++++++++++++++++
drivers/net/fjes/fjes_hw.c | 179 +++++++++++++++++-
drivers/net/fjes/fjes_hw.h | 43 ++++-
drivers/net/fjes/fjes_ioctl.h | 93 ++++++++++
drivers/net/fjes/fjes_main.c | 400 +++++++++++++++++++++++++++++++++++++---
7 files changed, 909 insertions(+), 36 deletions(-)
create mode 100644 drivers/net/fjes/fjes_debugfs.c
create mode 100644 drivers/net/fjes/fjes_ioctl.h
--
2.4.3
* [PATCH net-next 4/4] bnxt_en: Add async event handling for speed config changes.
From: Michael Chan @ 2016-04-11 8:11 UTC
To: davem; +Cc: netdev
In-Reply-To: <1460362274-21771-1-git-send-email-michael.chan@broadcom.com>
On some dual port cards, link speeds on both ports have to be compatible.
Firmware will inform the driver when a certain speed is no longer
supported if the other port has linked up at a certain speed. Add
logic to handle this event by logging a message and getting the
updated list of supported speeds.
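As a rough illustration of the control flow that the diff below adds, here is a user-space C model of the new case. The mask name SPEED_CFG_CHANGE_UNSUPPORTED and the action flags are hypothetical: the patch tests the bare constant 0x20000 in event_data1 and falls through to the LINK_STATUS_CHANGE case instead of returning flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the bit the patch tests as data1 & 0x20000. */
#define SPEED_CFG_CHANGE_UNSUPPORTED 0x20000u

/* Hypothetical action flags summarising what the new case does. */
enum speed_cfg_action {
	SPD_ACT_NONE         = 0,
	SPD_ACT_WARN         = 1 << 0, /* log "Link speed ... no longer supported" */
	SPD_ACT_REFRESH_LINK = 1 << 1, /* fall through to the link-change path */
};

static unsigned int handle_speed_cfg_change(bool is_vf, uint32_t data1)
{
	unsigned int act = SPD_ACT_NONE;

	if (is_vf)		/* VFs exit without touching the link state */
		return act;
	if (data1 & SPEED_CFG_CHANGE_UNSUPPORTED)
		act |= SPD_ACT_WARN;
	/* PFs always request a link refresh, mirroring the fall-through
	 * to HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE. */
	act |= SPD_ACT_REFRESH_LINK;
	return act;
}
```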
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c83a5a1..4645c44 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -122,6 +122,7 @@ static const u16 bnxt_async_events_arr[] = {
HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE,
HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PF_DRVR_UNLOAD,
HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
+ HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
};
static bool bnxt_vf_pciid(enum board_idx idx)
@@ -1257,6 +1258,21 @@ static int bnxt_async_event_process(struct bnxt *bp,
/* TODO CHIMP_FW: Define event id's for link change, error etc */
switch (event_id) {
+ case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE: {
+ u32 data1 = le32_to_cpu(cmpl->event_data1);
+ struct bnxt_link_info *link_info = &bp->link_info;
+
+ if (BNXT_VF(bp))
+ goto async_event_process_exit;
+ if (data1 & 0x20000) {
+ u16 fw_speed = link_info->force_link_speed;
+ u32 speed = bnxt_fw_to_ethtool_speed(fw_speed);
+
+ netdev_warn(bp->dev, "Link speed %d no longer supported\n",
+ speed);
+ }
+ /* fall thru */
+ }
case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE:
set_bit(BNXT_LINK_CHNG_SP_EVENT, &bp->sp_event);
break;
--
1.8.3.1
* [PATCH net-next 3/4] bnxt_en: Call firmware to approve VF MAC address change.
From: Michael Chan @ 2016-04-11 8:11 UTC
To: davem; +Cc: netdev
In-Reply-To: <1460362274-21771-1-git-send-email-michael.chan@broadcom.com>
Some hypervisors (e.g. ESX) require the VF MAC address to be forwarded to
the PF for approval. With a Linux PF, the request is not forwarded; the
firmware simply checks the MAC address and approves it if the PF has not
previously administered a valid MAC address for this VF.
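A minimal user-space sketch of the decision that bnxt_approve_mac() makes in the diff below. The fw_approves parameter is a hypothetical stand-in for the outcome of the HWRM_FUNC_VF_CFG request; the version threshold and the fallback check mirror the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Oldest HWRM spec for which the patch sends HWRM_FUNC_VF_CFG. */
#define HWRM_SPEC_VF_CFG_MAC 0x10202u

/* Returns 0 if the new MAC may be used, -EADDRNOTAVAIL otherwise.
 * fw_approves is hypothetical: it models the firmware's reply. */
static int approve_vf_mac(bool is_vf, uint32_t hwrm_spec_code,
			  bool pf_assigned_mac, bool fw_approves)
{
	if (!is_vf)
		return 0;	/* PFs never need approval */
	if (hwrm_spec_code < HWRM_SPEC_VF_CFG_MAC)
		/* Old firmware: refuse if the PF already set a valid MAC. */
		return pf_assigned_mac ? -EADDRNOTAVAIL : 0;
	/* Newer firmware: any failed request maps to -EADDRNOTAVAIL. */
	return fw_approves ? 0 : -EADDRNOTAVAIL;
}
```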
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 +++---
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 30 +++++++++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 1 +
3 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index e874a56..c83a5a1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5696,10 +5696,9 @@ static int bnxt_change_mac_addr(struct net_device *dev, void *p)
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
-#ifdef CONFIG_BNXT_SRIOV
- if (BNXT_VF(bp) && is_valid_ether_addr(bp->vf.mac_addr))
- return -EADDRNOTAVAIL;
-#endif
+ rc = bnxt_approve_mac(bp, addr->sa_data);
+ if (rc)
+ return rc;
if (ether_addr_equal(addr->sa_data, dev->dev_addr))
return 0;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 8457850..363884d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -865,6 +865,31 @@ update_vf_mac_exit:
mutex_unlock(&bp->hwrm_cmd_lock);
}
+int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
+{
+ struct hwrm_func_vf_cfg_input req = {0};
+ int rc = 0;
+
+ if (!BNXT_VF(bp))
+ return 0;
+
+ if (bp->hwrm_spec_code < 0x10202) {
+ if (is_valid_ether_addr(bp->vf.mac_addr))
+ rc = -EADDRNOTAVAIL;
+ goto mac_done;
+ }
+ bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_CFG, -1, -1);
+ req.enables = cpu_to_le32(FUNC_VF_CFG_REQ_ENABLES_DFLT_MAC_ADDR);
+ memcpy(req.dflt_mac_addr, mac, ETH_ALEN);
+ rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+mac_done:
+ if (rc) {
+ rc = -EADDRNOTAVAIL;
+ netdev_warn(bp->dev, "VF MAC address %pM not approved by the PF\n",
+ mac);
+ }
+ return rc;
+}
#else
void bnxt_sriov_disable(struct bnxt *bp)
@@ -879,4 +904,9 @@ void bnxt_hwrm_exec_fwd_req(struct bnxt *bp)
void bnxt_update_vf_mac(struct bnxt *bp)
{
}
+
+int bnxt_approve_mac(struct bnxt *bp, u8 *mac)
+{
+ return 0;
+}
#endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index 3f08354..0392670 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -20,4 +20,5 @@ int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
void bnxt_sriov_disable(struct bnxt *);
void bnxt_hwrm_exec_fwd_req(struct bnxt *);
void bnxt_update_vf_mac(struct bnxt *);
+int bnxt_approve_mac(struct bnxt *, u8 *);
#endif
--
1.8.3.1
* [PATCH net-next 0/4] bnxt_en: Update for net-next
From: Michael Chan @ 2016-04-11 8:11 UTC
To: davem; +Cc: netdev
Misc. changes for link speed and VF MAC address change.
Michael Chan (4):
bnxt_en: Disallow forced speed for 10GBaseT devices.
bnxt_en: Shutdown link when device is closed.
bnxt_en: Call firmware to approve VF MAC address change.
bnxt_en: Add async event handling for speed config changes.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 40 ++++++++++++++++++++---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 8 +++++
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 30 +++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h | 1 +
5 files changed, 76 insertions(+), 4 deletions(-)
--
1.8.3.1
* [PATCH net-next 2/4] bnxt_en: Shutdown link when device is closed.
From: Michael Chan @ 2016-04-11 8:11 UTC
To: davem; +Cc: netdev
In-Reply-To: <1460362274-21771-1-git-send-email-michael.chan@broadcom.com>
Let firmware know that the driver is giving up control of the link so that
it can be shut down if no management firmware is running.
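The conditions under which the new bnxt_hwrm_shutdown_link() actually sends the request can be sketched as a small predicate. This is a user-space model, not driver code; whether the link really goes down is still up to the firmware, which, per the description above, only does so when no management firmware is running.

```c
#include <stdbool.h>

/* Models the early returns in bnxt_hwrm_shutdown_link(): only a PF with
 * no VFs enabled sends PORT_PHY_CFG with FORCE_LINK_DOWN. */
static bool should_force_link_down(bool is_vf, int num_vfs)
{
	if (is_vf)
		return false;	/* a VF never requests link shutdown */
	if (num_vfs > 0)
		return false;	/* PF with active VFs leaves the link up */
	return true;
}
```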
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a06dcaa..e874a56 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4790,6 +4790,21 @@ int bnxt_hwrm_set_link_setting(struct bnxt *bp, bool set_pause, bool set_eee)
return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
}
+static int bnxt_hwrm_shutdown_link(struct bnxt *bp)
+{
+ struct hwrm_port_phy_cfg_input req = {0};
+
+ if (BNXT_VF(bp))
+ return 0;
+
+ if (pci_num_vf(bp->pdev))
+ return 0;
+
+ bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_PHY_CFG, -1, -1);
+ req.flags = cpu_to_le32(PORT_PHY_CFG_REQ_FLAGS_FORCE_LINK_DOWN);
+ return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
static bool bnxt_eee_config_ok(struct bnxt *bp)
{
struct ethtool_eee *eee = &bp->eee;
@@ -5044,6 +5059,7 @@ static int bnxt_close(struct net_device *dev)
struct bnxt *bp = netdev_priv(dev);
bnxt_close_nic(bp, true, true);
+ bnxt_hwrm_shutdown_link(bp);
return 0;
}
--
1.8.3.1
* [PATCH net-next 1/4] bnxt_en: Disallow forced speed for 10GBaseT devices.
From: Michael Chan @ 2016-04-11 8:11 UTC
To: davem; +Cc: netdev
In-Reply-To: <1460362274-21771-1-git-send-email-michael.chan@broadcom.com>
10GBaseT devices must autonegotiate to determine master/slave clocking.
Disallow forced speed in ethtool .set_settings() for these devices.
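The check added to bnxt_set_settings() in the diff below can be modelled as follows. The enum values are hypothetical stand-ins for the firmware's PORT_PHY_QCFG_RESP_* constants. As in the patch, a twisted-pair media type is rejected even when the PHY type is neither BASET nor BASETE.

```c
#include <errno.h>

/* Hypothetical stand-ins for PORT_PHY_QCFG_RESP_PHY_TYPE_BASET/_BASETE
 * and PORT_PHY_QCFG_RESP_MEDIA_TYPE_TP; real values come from firmware
 * interface headers. */
enum phy_type { PHY_TYPE_OTHER, PHY_TYPE_BASET, PHY_TYPE_BASETE };
enum media_type { MEDIA_TYPE_OTHER, MEDIA_TYPE_TP };

/* Returns 0 if a forced (non-autoneg) speed may be configured, or
 * -EINVAL for Base-T / twisted-pair ports, which must autonegotiate. */
static int check_forced_speed_allowed(enum phy_type phy, enum media_type media)
{
	if (phy == PHY_TYPE_BASET || phy == PHY_TYPE_BASETE ||
	    media == MEDIA_TYPE_TP)
		return -EINVAL;
	return 0;
}
```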
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 8 ++++++++
3 files changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 597e472..a06dcaa 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4611,6 +4611,7 @@ static int bnxt_update_link(struct bnxt *bp, bool chng_link_state)
link_info->phy_ver[1] = resp->phy_min;
link_info->phy_ver[2] = resp->phy_bld;
link_info->media_type = resp->media_type;
+ link_info->phy_type = resp->phy_type;
link_info->transceiver = resp->xcvr_pkg_type;
link_info->phy_addr = resp->eee_config_phy_addr &
PORT_PHY_QCFG_RESP_PHY_ADDR_MASK;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index cc8e38a..26dac2f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -759,6 +759,7 @@ struct bnxt_ntuple_filter {
};
struct bnxt_link_info {
+ u8 phy_type;
u8 media_type;
u8 transceiver;
u8 phy_addr;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index a2e9324..d6e41f2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -850,7 +850,15 @@ static int bnxt_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
set_pause = true;
} else {
u16 fw_speed;
+ u8 phy_type = link_info->phy_type;
+ if (phy_type == PORT_PHY_QCFG_RESP_PHY_TYPE_BASET ||
+ phy_type == PORT_PHY_QCFG_RESP_PHY_TYPE_BASETE ||
+ link_info->media_type == PORT_PHY_QCFG_RESP_MEDIA_TYPE_TP) {
+ netdev_err(dev, "10GBase-T devices must autoneg\n");
+ rc = -EINVAL;
+ goto set_setting_exit;
+ }
/* TODO: currently don't support half duplex */
if (cmd->duplex == DUPLEX_HALF) {
netdev_err(dev, "HALF DUPLEX is not supported!\n");
--
1.8.3.1
* [PATCH net v2] net: sched: do not requeue a NULL skb
From: Lars Persson @ 2016-04-11 6:24 UTC
To: netdev; +Cc: jhs, linux-kernel, xiyou.wangcong, Lars Persson
A failure in validate_xmit_skb_list() triggered an unconditional call
to dev_requeue_skb() with skb=NULL. This slowly grows the queue
discipline's qlen count until all traffic through the queue stops.
Adding a NULL check to dev_requeue_skb() also made it necessary to
make the __netif_schedule() call conditional, to avoid scheduling an
empty queue.
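To see why the NULL check stops the qlen leak, here is a minimal user-space model of the patched dev_requeue_skb(). struct qdisc_model and its schedules counter are illustrative stand-ins for struct Qdisc and __netif_schedule(), not kernel APIs.

```c
#include <stddef.h>

struct pkt { int id; };

/* Simplified stand-in for struct Qdisc. */
struct qdisc_model {
	struct pkt *gso_skb;	/* packet parked for the next dequeue */
	unsigned int qlen;	/* packets still owned by the queue */
	unsigned int requeues;
	unsigned int schedules;	/* counts modelled __netif_schedule() calls */
};

static int requeue(struct pkt *skb, struct qdisc_model *q)
{
	if (skb) {		/* a NULL skb no longer inflates qlen */
		q->gso_skb = skb;
		q->requeues++;
		q->qlen++;	/* it's still part of the queue */
	}
	if (q->qlen)		/* only reschedule a non-empty queue */
		q->schedules++;
	return 0;
}
```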
Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
Signed-off-by: Lars Persson <larper@axis.com>
---
net/sched/sch_generic.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f18c350..4e6a79c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -47,10 +47,13 @@ EXPORT_SYMBOL(default_qdisc_ops);
static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
{
- q->gso_skb = skb;
- q->qstats.requeues++;
- q->q.qlen++; /* it's still part of the queue */
- __netif_schedule(q);
+ if (skb) {
+ q->gso_skb = skb;
+ q->qstats.requeues++;
+ q->q.qlen++; /* it's still part of the queue */
+ }
+ if (qdisc_qlen(q))
+ __netif_schedule(q);
return 0;
}
--
2.1.4
* [PATCH net] cxgb4: Stop Rx Queues before freeing it up
From: Hariprasad Shenai @ 2016-04-11 5:37 UTC
To: davem; +Cc: netdev, leedom, nirranjan, Hariprasad Shenai
Stop all Ethernet RX Queues before freeing up various Ingress/Egress
Queues, etc. We were seeing cases of Ingress Queues not getting serviced
during the shutdown process, leading to Ingress Paths jamming up through
the chip and blocking the shutdown effort itself.
One such case involved the Firmware sending a "Flush Token" through the
ULP-TX -> ULP-RX path for an Ethernet TX Queue being freed in order to
make sure there weren't any remaining TX Work Requests in the pipeline.
But the return path was stalled by Ingress Data that could not be delivered
to the Host because those Ingress Queues were no longer being serviced.
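The ordering change can be sketched as a two-phase teardown: stop every allocated Rx queue first, then free the queues in a second pass. This user-space model (struct rxq_model is hypothetical) only illustrates the ordering that t4_free_sge_resources() adopts in the diff below; it does not model the firmware mailbox calls.

```c
#include <stdbool.h>

struct rxq_model {
	bool allocated;	/* stands in for eq->rspq.desc != NULL */
	bool stopped;	/* t4_iq_stop() issued: further ingress is tossed */
	bool freed;
};

/* Returns true if every queue freed in phase 2 had already been stopped
 * in phase 1, which is the invariant the patch establishes. */
static bool teardown(struct rxq_model *q, int n)
{
	bool ok = true;
	int i;

	for (i = 0; i < n; i++)		/* phase 1: stop all Rx queues */
		if (q[i].allocated)
			q[i].stopped = true;

	for (i = 0; i < n; i++) {	/* phase 2: free them */
		if (!q[i].allocated)
			continue;
		if (!q[i].stopped)
			ok = false;
		q[i].freed = true;
	}
	return ok;
}
```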
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 3 +++
drivers/net/ethernet/chelsio/cxgb4/sge.c | 20 +++++++++++++++---
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 33 ++++++++++++++++++++++++++++++
3 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 984a3cc26f86..326d4009525e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1451,6 +1451,9 @@ int t4_mdio_rd(struct adapter *adap, unsigned int mbox, unsigned int phy_addr,
unsigned int mmd, unsigned int reg, u16 *valp);
int t4_mdio_wr(struct adapter *adap, unsigned int mbox, unsigned int phy_addr,
unsigned int mmd, unsigned int reg, u16 val);
+int t4_iq_stop(struct adapter *adap, unsigned int mbox, unsigned int pf,
+ unsigned int vf, unsigned int iqtype, unsigned int iqid,
+ unsigned int fl0id, unsigned int fl1id);
int t4_iq_free(struct adapter *adap, unsigned int mbox, unsigned int pf,
unsigned int vf, unsigned int iqtype, unsigned int iqid,
unsigned int fl0id, unsigned int fl1id);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 13b144bcf725..6278e5a74b74 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2981,14 +2981,28 @@ void t4_free_ofld_rxqs(struct adapter *adap, int n, struct sge_ofld_rxq *q)
void t4_free_sge_resources(struct adapter *adap)
{
int i;
- struct sge_eth_rxq *eq = adap->sge.ethrxq;
- struct sge_eth_txq *etq = adap->sge.ethtxq;
+ struct sge_eth_rxq *eq;
+ struct sge_eth_txq *etq;
+
+ /* stop all Rx queues in order to start them draining */
+ for (i = 0; i < adap->sge.ethqsets; i++) {
+ eq = &adap->sge.ethrxq[i];
+ if (eq->rspq.desc)
+ t4_iq_stop(adap, adap->mbox, adap->pf, 0,
+ FW_IQ_TYPE_FL_INT_CAP,
+ eq->rspq.cntxt_id,
+ eq->fl.size ? eq->fl.cntxt_id : 0xffff,
+ 0xffff);
+ }
/* clean up Ethernet Tx/Rx queues */
- for (i = 0; i < adap->sge.ethqsets; i++, eq++, etq++) {
+ for (i = 0; i < adap->sge.ethqsets; i++) {
+ eq = &adap->sge.ethrxq[i];
if (eq->rspq.desc)
free_rspq_fl(adap, &eq->rspq,
eq->fl.size ? &eq->fl : NULL);
+
+ etq = &adap->sge.ethtxq[i];
if (etq->q.desc) {
t4_eth_eq_free(adap, adap->mbox, adap->pf, 0,
etq->q.cntxt_id);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index cc1736bece0f..520ffcaef6d8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -6940,6 +6940,39 @@ int t4_identify_port(struct adapter *adap, unsigned int mbox, unsigned int viid,
}
/**
+ * t4_iq_stop - stop an ingress queue and its FLs
+ * @adap: the adapter
+ * @mbox: mailbox to use for the FW command
+ * @pf: the PF owning the queues
+ * @vf: the VF owning the queues
+ * @iqtype: the ingress queue type (FW_IQ_TYPE_FL_INT_CAP, etc.)
+ * @iqid: ingress queue id
+ * @fl0id: FL0 queue id or 0xffff if no attached FL0
+ * @fl1id: FL1 queue id or 0xffff if no attached FL1
+ *
+ * Stops an ingress queue and its associated FLs, if any. This causes
+ * any current or future data/messages destined for these queues to be
+ * tossed.
+ */
+int t4_iq_stop(struct adapter *adap, unsigned int mbox, unsigned int pf,
+ unsigned int vf, unsigned int iqtype, unsigned int iqid,
+ unsigned int fl0id, unsigned int fl1id)
+{
+ struct fw_iq_cmd c;
+
+ memset(&c, 0, sizeof(c));
+ c.op_to_vfn = cpu_to_be32(FW_CMD_OP_V(FW_IQ_CMD) | FW_CMD_REQUEST_F |
+ FW_CMD_EXEC_F | FW_IQ_CMD_PFN_V(pf) |
+ FW_IQ_CMD_VFN_V(vf));
+ c.alloc_to_len16 = cpu_to_be32(FW_IQ_CMD_IQSTOP_F | FW_LEN16(c));
+ c.type_to_iqandstindex = cpu_to_be32(FW_IQ_CMD_TYPE_V(iqtype));
+ c.iqid = cpu_to_be16(iqid);
+ c.fl0id = cpu_to_be16(fl0id);
+ c.fl1id = cpu_to_be16(fl1id);
+ return t4_wr_mbox(adap, mbox, &c, sizeof(c), NULL);
+}
+
+/**
* t4_iq_free - free an ingress queue and its FLs
* @adap: the adapter
* @mbox: mailbox to use for the FW command
--
2.3.4
* Re: Backport Security Fix for CVE-2015-8787 to v4.1
From: Yuki Machida @ 2016-04-11 4:58 UTC
To: Pablo Neira Ayuso; +Cc: davem, netdev, kamatam
In-Reply-To: <570B2B4E.2080303@jp.fujitsu.com>
Hi Pablo,
On 2016-04-11 13:42, Yuki Machida wrote:
> Hi Pablo,
>
> On 2016-04-07 23:46, Pablo Neira Ayuso wrote:
>> On Thu, Apr 07, 2016 at 03:40:30PM +0900, Yuki Machida wrote:
>>> Hi David,
>>>
>>> I confirmed that the patch for CVE-2015-8787 is not applied in v4.1.21.
>>> Could you please apply the patch to 4.1-stable?
>>>
>>> CVE-2015-8787
>>> Upstream commit 94f9cd81436c85d8c3a318ba92e236ede73752fc
>>
>> I'll request again, this time to Sasha Levin to include this in
>> -stable 4.1.
> Thank you for your help.
David said, "Please send to the netfilter team."
Therefore, I will send the above patch to the netfilter team.
Thank you.
>> Thanks.
>>
>
* Re: Backport Security Fix for CVE-2015-8787 to v4.1
From: Yuki Machida @ 2016-04-11 4:42 UTC
To: Pablo Neira Ayuso; +Cc: davem, netdev, kamatam
In-Reply-To: <20160407144628.GA1137@salvia>
Hi Pablo,
On 2016-04-07 23:46, Pablo Neira Ayuso wrote:
> On Thu, Apr 07, 2016 at 03:40:30PM +0900, Yuki Machida wrote:
>> Hi David,
>>
>> I confirmed that the patch for CVE-2015-8787 is not applied in v4.1.21.
>> Could you please apply the patch to 4.1-stable?
>>
>> CVE-2015-8787
>> Upstream commit 94f9cd81436c85d8c3a318ba92e236ede73752fc
>
> I'll request again, this time to Sasha Levin to include this in
> -stable 4.1.
Thank you for your help.
> Thanks.
>
* Re: [PATCH 1/1] net: stmmac: socfpga: Ensure emac bit set in System Manager for PTP
From: David Miller @ 2016-04-11 3:44 UTC
To: preid; +Cc: peppe.cavallaro, netdev
In-Reply-To: <1460015735-8946-2-git-send-email-preid@electromag.com.au>
From: Phil Reid <preid@electromag.com.au>
Date: Thu, 7 Apr 2016 15:55:35 +0800
> When using the PTP FPGA-to-HPS clock source for the stmmac module,
> the appropriate bit in the System Manager FPGA Interface Group register
> needs to be set. This bit is not set by the bootloader setup when the
> HPS EMAC pins are being used for this EMAC module.
>
> This allows the PTP clock to be sourced from the FPGA and also connects
> the PTP pps and ext trig signals to the stmmac PTP hardware.
>
> Patch proposed by Phil Collins.
>
> Signed-off-by: Phil Reid <preid@electromag.com.au>
Applied, thanks.
* Re: [PATCH] netlink: don't send NETLINK_URELEASE for unbound sockets
From: David Miller @ 2016-04-11 3:33 UTC
To: johannes-cdvu00un1VgdHxzADdlk8Q
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, dmitrijs.ivanovs-NO1NBkfNQUg,
linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1460014298-30293-1-git-send-email-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Thu, 7 Apr 2016 09:31:38 +0200
> From: Dmitry Ivanov <dmitrijs.ivanovs-NO1NBkfNQUg@public.gmane.org>
>
> All existing users of NETLINK_URELEASE use it to clean up resources that
> were previously allocated to a socket via some command. As a result, no
> users require getting this notification for unbound sockets.
>
> Sending it for unbound sockets, however, is a problem because any user
> (including unprivileged users) can create a socket that uses the same ID
> as an existing socket. Binding this new socket will fail, but if the
> NETLINK_URELEASE notification is generated for such sockets, the users
> thereof will be tricked into thinking the socket that they allocated the
> resources for is closed.
>
> In the nl80211 case, this will cause destruction of virtual interfaces
> that still belong to an existing hostapd process; this is the case that
> Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
> of netlink log/queue it will cause them to stop reporting events, as if
> NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.
>
> Fix this problem by checking that the socket is bound before generating
> the NETLINK_URELEASE notification.
>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: Dmitry Ivanov <dima-NO1NBkfNQUg@public.gmane.org>
> Signed-off-by: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Applied and queued up for -stable, thanks everyone.
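The fix described in the quoted patch amounts to gating the notification on whether the bind actually succeeded. A minimal user-space model follows; struct nl_sock_model and should_send_urelease() are hypothetical names, not the kernel's netlink internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a netlink socket being released. */
struct nl_sock_model {
	uint32_t portid;	/* ID the user asked for */
	bool bound;		/* true only if the bind succeeded */
};

/* Only bound sockets generate NETLINK_URELEASE, so a second socket that
 * merely requested an ID already in use cannot make listeners release
 * the owner's resources when it is closed. */
static bool should_send_urelease(const struct nl_sock_model *sk)
{
	return sk->bound;
}
```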
* [PATCH net-next WIP] ethtool: generic netlink policy
From: Roopa Prabhu @ 2016-04-11 3:15 UTC
To: netdev; +Cc: davem, jiri, eladr, idosch
From: Roopa Prabhu <roopa@cumulusnetworks.com>
Netlink for ethtool came up at netconf/netdev, and we had promised to
send some of the ethtool netlink code we have.
We use a generic netlink channel for ethtool between our kernel and
user space driver. This ethtool channel nicely wraps most ethtool
commands into genl messages and can handle delayed remote ops to
userspace in some cases (dropping rtnl, etc.). We also use this
channel to cache some of this ethtool data in the kernel.
In this patch I have included just the genl policy for ethtool, which
will apply to the generic use case. We can certainly share the rest of
it if we see a use case. In particular, the remote handling of ethtool
ops for delayed hw operations may be useful in other cases (today they
are tied to our remote driver in userspace). The ethtool handlers for
genl use the existing ethtool structs and call into the
respective driver handlers.
This came up again at the switchdev discussion recently and I had
promised to get this out this weekend :). This patch does not include
changes to compile the code.
We should move ethtool to netlink at some point, and I think we
should also explore the possibility of including it in the new
devlink generic netlink infrastructure. ethtool stats should also
move to the new stats infrastructure.
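For readers unfamiliar with genl policies, this simplified user-space model shows the kind of per-attribute length checks that a table such as ethtool_policy in the patch below asks the netlink core to apply. It is a sketch under stated assumptions, not the kernel's nla_validate(): the enum and struct are hypothetical, and real netlink validation covers many more types and edge cases.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for NLA_U8, NLA_U32, NLA_BINARY, NLA_STRING. */
enum attr_type { T_U8, T_U32, T_BINARY, T_STRING };

struct attr_policy {
	enum attr_type type;
	size_t len;	/* max payload for BINARY/STRING; 0 = unlimited */
};

/* Returns 0 if an attribute payload of len bytes satisfies policy p. */
static int check_attr(const struct attr_policy *p, size_t len)
{
	switch (p->type) {
	case T_U8:
		return len >= 1 ? 0 : -ERANGE;	/* fixed-size: minimum length */
	case T_U32:
		return len >= 4 ? 0 : -ERANGE;
	case T_BINARY:
	case T_STRING:
		/* .len acts as an upper bound, as for ETHTOOL_ATTR_SETTINGS */
		return (p->len && len > p->len) ? -ERANGE : 0;
	}
	return -EINVAL;
}
```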
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com>
---
net/core/ethtool_netlink.c | 200 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 200 insertions(+)
create mode 100644 net/core/ethtool_netlink.c
diff --git a/net/core/ethtool_netlink.c b/net/core/ethtool_netlink.c
new file mode 100644
index 0000000..f5445f3
--- /dev/null
+++ b/net/core/ethtool_netlink.c
@@ -0,0 +1,200 @@
+/*
+ * net/core/ethtool_netlink.c - generic ethtool netlink handler
+ * Copyright (C) 2015 Cumulus Networks
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/capability.h>
+#include <linux/errno.h>
+#include <linux/ethtool.h>
+#include <linux/port.h>
+#include <linux/netdevice.h>
+#include <linux/list.h>
+#include <linux/rtnetlink.h>
+#include <linux/hashtable.h>
+#include <linux/rcupdate.h>
+#include <linux/nsproxy.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <net/genetlink.h>
+
+static const struct nla_policy ethtool_policy[ETHTOOL_ATTR_MAX + 1] = {
+ [ETHTOOL_ATTR_IFINDEX] = { .type = NLA_U32 },
+ [ETHTOOL_ATTR_FLAGS] = { .type = NLA_U32 },
+ [ETHTOOL_ATTR_PHYS_ID_STATE] = { .type = NLA_U8 },
+ [ETHTOOL_ATTR_SETTINGS] = { .type = NLA_BINARY,
+ .len = sizeof(struct ethtool_cmd) },
+ [ETHTOOL_ATTR_PAUSE] = { .type = NLA_BINARY,
+ .len = sizeof(struct ethtool_pauseparam) },
+ [ETHTOOL_ATTR_MODINFO] = { .type = NLA_BINARY,
+ .len = sizeof(struct ethtool_modinfo) },
+ [ETHTOOL_ATTR_EEPROM] = { .type = NLA_BINARY,
+ .len = sizeof(struct ethtool_eeprom) },
+ [ETHTOOL_ATTR_EEPROM_DATA] = { .type = NLA_BINARY },
+ [ETHTOOL_ATTR_STATS] = { .type = NLA_NESTED },
+ [ETHTOOL_ATTR_STAT] = { .type = NLA_U32 },
+ [ETHTOOL_ATTR_STRINGS] = { .type = NLA_NESTED },
+ [ETHTOOL_ATTR_STRING] = { .type = NLA_STRING,
+ .len = ETH_GSTRING_LEN },
+ [ETHTOOL_ATTR_SSET] = { .type = NLA_U32 },
+ [ETHTOOL_ATTR_SSET_COUNT] = { .type = NLA_U32 },
+};
+
+static struct genl_family ethtool_family = {
+ .id = GENL_ID_GENERATE,
+ .name = "ethtool_family",
+ .version = 1,
+ .maxattr = ETHTOOL_ATTR_MAX,
+};
+
+static struct genl_multicast_group ethtool_mcgrp[] = {
+ { .name = "port_mc", },
+};
+
+static LIST_HEAD(wq_list);
+
+static struct genl_ops ethtool_ops[] = {
+ {
+ .cmd = ETHTOOL_CMD_GET_SETTINGS,
+ .policy = ethtool_policy,
+ .doit = ethtool_get_settings,
+ },
+ {
+ .cmd = ETHTOOL_CMD_SET_SETTINGS,
+ .policy = ethtool_policy,
+ .doit = ethtool_set_settings,
+ },
+ {
+ .cmd = ETHTOOL_CMD_GET_PAUSE,
+ .policy = ethtool_policy,
+ .doit = ethtool_get_pause,
+ },
+ {
+ .cmd = ETHTOOL_CMD_SET_PAUSE,
+ .policy = ethtool_policy,
+ .doit = ethtool_set_pause,
+ },
+ {
+ .cmd = ETHTOOL_CMD_GET_MODULE_INFO,
+ .policy = ethtool_policy,
+ .doit = ethtool_get_module_info,
+ },
+ {
+ .cmd = ETHTOOL_CMD_SET_MODULE_INFO,
+ .policy = ethtool_policy,
+ .doit = ethtool_set_module_info,
+ },
+};
+
+int ethtool_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_get_settings);
+
+int ethtool_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_set_settings);
+
+void ethtool_get_pauseparam(struct net_device *dev,
+ struct ethtool_pauseparam *pause)
+{
+	return;
+}
+EXPORT_SYMBOL_GPL(ethtool_get_pauseparam);
+
+int ethtool_set_pauseparam(struct net_device *dev,
+ struct ethtool_pauseparam *pause)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_set_pauseparam);
+
+void ethtool_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats,
+ u64 *data)
+{
+
+ /* example the driver handler would do the below
+ *
+ err = nla_put_u32(msg, PORT_ATTR_IFINDEX, ifindex);
+ if (err < 0)
+ goto err_out;
+
+ err = nla_put_u32(msg, PORT_ATTR_FLAGS, flags);
+ if (err < 0)
+ goto err_out;
+
+ err = nla_put_u32(msg, PORT_ATTR_SSET_COUNT,
+ count);
+ if (err < 0)
+ goto err_out;
+
+ nest = nla_nest_start(msg, PORT_ATTR_STATS);
+ for (i = 0; i < count; i++)
+ nla_put_u64(msg, PORT_ATTR_STAT, data[i]);
+ nla_nest_end(msg, nest);
+
+ */
+}
+EXPORT_SYMBOL_GPL(ethtool_get_ethtool_stats);
+
+void ethtool_get_strings(struct net_device *dev, u32 stringset, u8 *data)
+{
+ return;
+}
+EXPORT_SYMBOL_GPL(ethtool_get_strings);
+
+int ethtool_get_sset_count(struct net_device *dev, int sset)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_get_sset_count);
+
+int ethtool_set_phys_id(struct net_device *dev, enum ethtool_phys_id_state state)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_set_phys_id);
+
+int ethtool_get_module_info(struct net_device *dev, struct ethtool_modinfo *info)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_get_module_info);
+
+int ethtool_get_module_eeprom(struct net_device *dev,
+ struct ethtool_eeprom *eeprom, u8 *data)
+{
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ethtool_get_module_eeprom);
+
+static int __init ethtool_init(void)
+{
+ int err;
+
+	err = genl_register_family_with_ops_groups(&ethtool_family, ethtool_ops,
+ ethtool_mcgrp);
+	if (err) {
+		pr_err("ethtool netlink family register failed: %d\n", err);
+		return err;
+	}
+ pr_debug("ethtool netlink family register OK\n");
+
+ return 0;
+}
+late_initcall(ethtool_init);
--
1.9.1
* Re: [PATCH] net: mark DECnet as broken
From: David Miller @ 2016-04-11 3:02 UTC
To: vegard.nossum; +Cc: netdev, linux-kernel, eric.dumazet, sasha.levin
In-Reply-To: <1460013763-22985-1-git-send-email-vegard.nossum@oracle.com>
From: Vegard Nossum <vegard.nossum@oracle.com>
Date: Thu, 7 Apr 2016 09:22:43 +0200
> There are NULL pointer dereference bugs in DECnet which can be triggered
> by unprivileged users and have been reported multiple times to LKML;
> however, nobody seems confident enough in the proposed fixes to merge them,
> and the consensus seems to be that nobody cares enough about DECnet to
> see it fixed anyway.
>
> To shield unsuspecting users from the possible DoS, we should mark this
> BROKEN until somebody who actually uses this code can fix it.
>
> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
> Link: https://lkml.org/lkml/2015/12/17/666
As stated, I'm not applying this; instead, I am fixing it as
follows:
====================
[PATCH] decnet: Do not build routes to devices without decnet private data.
In particular, make sure we check for the presence of DECnet private
data for loopback devices.
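The guards added to dn_route.c below all follow one pattern: check that a device carries DECnet private data before routing through it. A minimal user-space model, with a hypothetical struct dev_model standing in for struct net_device:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct dev_model {
	void *dn_ptr;	/* DECnet per-device state, NULL if not configured */
};

/* Returns 0 if dev may be used as the output device, -EINVAL otherwise.
 * Without this check, callers would go on to dereference a NULL dn_ptr. */
static int pick_output_dev(const struct dev_model *dev)
{
	if (dev == NULL || dev->dn_ptr == NULL)
		return -EINVAL;
	return 0;
}
```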
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/decnet/dn_route.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 607a14f..b1dc096 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1034,10 +1034,13 @@ source_ok:
if (!fld.daddr) {
fld.daddr = fld.saddr;
- err = -EADDRNOTAVAIL;
if (dev_out)
dev_put(dev_out);
+ err = -EINVAL;
dev_out = init_net.loopback_dev;
+ if (!dev_out->dn_ptr)
+ goto out;
+ err = -EADDRNOTAVAIL;
dev_hold(dev_out);
if (!fld.daddr) {
fld.daddr =
@@ -1110,6 +1113,8 @@ source_ok:
if (dev_out == NULL)
goto out;
dn_db = rcu_dereference_raw(dev_out->dn_ptr);
+ if (!dn_db)
+ goto e_inval;
/* Possible improvement - check all devices for local addr */
if (dn_dev_islocal(dev_out, fld.daddr)) {
dev_put(dev_out);
@@ -1151,6 +1156,8 @@ select_source:
dev_put(dev_out);
dev_out = init_net.loopback_dev;
dev_hold(dev_out);
+ if (!dev_out->dn_ptr)
+ goto e_inval;
fld.flowidn_oif = dev_out->ifindex;
if (res.fi)
dn_fib_info_put(res.fi);
--
2.1.0
* RE: [v7, 0/5] Fix eSDHC host version register bug
From: Yangbo Lu @ 2016-04-11 2:54 UTC
To: Scott Wood, Yang-Leo Li
Cc: devicetree@vger.kernel.org, Ulf Hansson,
Zhao Qiang, Russell King, Bhupesh Sharma,
netdev@vger.kernel.org, Santosh Shilimkar,
linux-mmc, linux-kernel@vger.kernel.org,
Jochen Friedrich, Xiaobo Xie,
iommu@lists.linux-foundation.org,
Rob Herring, linux-i2c@vger.kernel.org,
Claudiu Manoil, Kumar Gala,
linuxppc-dev@lists.ozlabs.org,
linux-clk@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAPDyKFp8GDzSkpbTKoRL=rwiBrsWmsgN95zB-QJwfZTnq4M8BA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hi Leo and Scott,
> -----Original Message-----
> From: Ulf Hansson [mailto:ulf.hansson@linaro.org]
> Sent: Wednesday, April 06, 2016 4:15 PM
> To: Yangbo Lu; Scott Wood
> Cc: devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
> clk@vger.kernel.org; linux-i2c@vger.kernel.org; iommu@lists.linux-
> foundation.org; netdev@vger.kernel.org; linux-mmc; Rob Herring; Russell
> King; Jochen Friedrich; Joerg Roedel; Claudiu Manoil; Bhupesh Sharma;
> Zhao Qiang; Kumar Gala; Santosh Shilimkar; Yang-Leo Li; Xiaobo Xie
> Subject: Re: [v7, 0/5] Fix eSDHC host version register bug
>
> >>
> >> I was about to queue this for next, when I noticed that checkpatch is
> >> complaining/warning lots about these patches. Can you please run a round
> >> of checkpatch and fix what makes sense?
> >>
> >> Kind regards
> >> Uffe
> >
> > [Lu Yangbo-B47093] Sorry about this, Uffe...
>
> No worries!
>
> > Should I ignore the warnings that update MAINTAINERS?
>
> drivers/soc/fsl/guts.c isn't part of the MAINTAINERS file; it should be.
>
> I also realize that the FREESCALE QUICC ENGINE LIBRARY section
> drivers/soc/fsl/qe/* also needs an active maintainer, as it's currently
> orphaned.
>
> Perhaps we should create a new section for drivers/soc/fsl/* instead
> that covers all of the above? Maybe you or Scott are interested in picking
> it up?
>
> I also noted that "include/linux/fsl/" isn't present in MAINTAINERS;
> please add that as well.
[Lu Yangbo-B47093] Could you give some advice on the MAINTAINERS entries for these 'fsl' files?
I really don't know who the right person would be.
I would appreciate it!
Thanks a lot.
>
> > Regarding the 'undocumented' warning, I will add a patch that updates the docs
> > before all the patches, OK?
>
> Yes, good!
>
> >
> > Thanks a lot :)
> >
>
> Kind regards
> Uffe
* Re: [PATCH net-next] bpf: simplify verifier register state assignments
From: David Miller @ 2016-04-11 2:43 UTC
To: ast; +Cc: daniel, netdev
In-Reply-To: <1459996761-2926623-1-git-send-email-ast@fb.com>
From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 6 Apr 2016 19:39:21 -0700
> verifier is using the following structure to track the state of registers:
> struct reg_state {
> 	enum bpf_reg_type type;
> 	union {
> 		int imm;
> 		struct bpf_map *map_ptr;
> 	};
> };
> and later on in states_equal() does memcmp(&old->regs[i], &cur->regs[i],..)
> to find equivalent states.
> Throughout the verifier code there are assignments to the 'imm' and 'map_ptr'
> fields, and it's not obvious that most of the assignments into 'imm' don't
> need to clear the extra 4 bytes (like mark_reg_unknown_value() does) to make sure
> that memcmp doesn't go over junk left from a 'map_ptr' assignment.
>
> Simplify the code by converting 'int' into 'long'.
>
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Applied, thanks Daniel.
* Re: [PATCH RFT 2/2] macb: kill PHY reset code
From: Andrew Lunn @ 2016-04-11 2:28 UTC
To: Sergei Shtylyov; +Cc: nicolas.ferre, netdev, linux-kernel
In-Reply-To: <2811962.eGX2i5RJbZ@wasted.cogentembedded.com>
On Sat, Apr 09, 2016 at 01:25:03AM +0300, Sergei Shtylyov wrote:
> With the 'phylib' now being aware of the "reset-gpios" PHY node property,
> there should be no need to frob the PHY reset in this driver anymore...
>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>
> ---
> drivers/net/ethernet/cadence/macb.c | 17 -----------------
> drivers/net/ethernet/cadence/macb.h | 1 -
> 2 files changed, 18 deletions(-)
>
> Index: net-next/drivers/net/ethernet/cadence/macb.c
> ===================================================================
> --- net-next.orig/drivers/net/ethernet/cadence/macb.c
> +++ net-next/drivers/net/ethernet/cadence/macb.c
> @@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
> = macb_clk_init;
> int (*init)(struct platform_device *) = macb_init;
> struct device_node *np = pdev->dev.of_node;
> - struct device_node *phy_node;
> const struct macb_config *macb_config = NULL;
> struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
> unsigned int queue_mask, num_queues;
> @@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
> else
> macb_get_hwaddr(bp);
>
> - /* Power up the PHY if there is a GPIO reset */
> - phy_node = of_get_next_available_child(np, NULL);
> - if (phy_node) {
> - int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
> -
> - if (gpio_is_valid(gpio)) {
> - bp->reset_gpio = gpio_to_desc(gpio);
> - gpiod_direction_output(bp->reset_gpio, 1);
Hi Sergei
The code you are deleting would have ignored the flags in the gpio
property, i.e. active low. The new code in the previous patch does,
however, take the flags into account. Did you check whether there are any
device trees with flags which were never used before, but will now be
used and thus break things?
Andrew
* Re: [PATCH 1/3] bonding: do not allow rlb updates to invalid mac
From: David Miller @ 2016-04-11 2:26 UTC
To: dbanerje; +Cc: j.vosburgh, vfalico, gospo, netdev, linux-kernel
In-Reply-To: <1459974275-24944-1-git-send-email-dbanerje@akamai.com>
Please resubmit this patch series with a proper cover letter.
It should have "[PATCH 0/3] ..." as the subject line and explain
at a high level what your patch series is doing, how it is doing
it, and why it is doing it that way.
You must also be explicit about which of my trees your changes
are targeting.
Thanks.
* Re: [PATCH v2] sctp: avoid refreshing heartbeat timer too often
From: David Miller @ 2016-04-11 2:23 UTC
To: marcelo.leitner; +Cc: netdev, nhorman, vyasevich, David.Laight, linux-sctp
In-Reply-To: <7aa05d5d8b9100fa48798b43887de6d14d577eac.1459966346.git.marcelo.leitner@gmail.com>
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 6 Apr 2016 15:15:19 -0300
> Currently on high rate SCTP streams the heartbeat timer refresh can
> consume quite a lot of resources as timer updates are costly and it
> contains a random factor, which a) is also costly and b) invalidates
> mod_timer() optimization for not editing a timer to the same value.
> It may even cause the timer to be slightly advanced, for no good reason.
>
> As suggested by David Laight this patch now removes this timer update
> from hot path by leaving the timer on and re-evaluating upon its
> expiration if the heartbeat is still needed or not, similarly to what is
> done for TCP. If it's not needed anymore the timer is re-scheduled to
> the new timeout, considering the time already elapsed.
>
> For this, we now record the last tx timestamp per transport, updated in
> the same spots as hb timer was restarted on tx. Also split up
> sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
> sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
> heartbeat one.
>
> On loopback with an MTU of 65535 and 1636-byte data chunks, so that we
> have a considerable number of chunks without stressing system calls,
> netperf -t SCTP_STREAM -l 30, perf looked like this before:
...
> And after this patch, now with netperf -l 60:
...
> Throughput-wise, from 6800 Mbps without the patch to 7050 Mbps with it,
> ~3.7%.
>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Applied, thanks Marcelo.
* [net-next PATCH v2 5/5] Documentation: Add documentation for TSO and GSO features
From: Alexander Duyck @ 2016-04-11 1:45 UTC
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160411013855.11189.56567.stgit@ahduyck-xeon-server>
This document is a starting point for defining the TSO and GSO features.
The whole thing is starting to get a bit messy, so I wanted to make sure we
have notes somewhere to start describing what does and doesn't work.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
Documentation/networking/segmentation-offloads.txt | 130 ++++++++++++++++++++
1 file changed, 130 insertions(+)
create mode 100644 Documentation/networking/segmentation-offloads.txt
diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt
new file mode 100644
index 000000000000..f200467ade38
--- /dev/null
+++ b/Documentation/networking/segmentation-offloads.txt
@@ -0,0 +1,130 @@
+Segmentation Offloads in the Linux Networking Stack
+
+Introduction
+============
+
+This document describes a set of techniques in the Linux networking stack
+to take advantage of segmentation offload capabilities of various NICs.
+
+The following technologies are described:
+ * TCP Segmentation Offload - TSO
+ * UDP Fragmentation Offload - UFO
+ * IPIP, SIT, GRE, and UDP Tunnel Offloads
+ * Generic Segmentation Offload - GSO
+ * Generic Receive Offload - GRO
+ * Partial Generic Segmentation Offload - GSO_PARTIAL
+
+TCP Segmentation Offload
+========================
+
+TCP segmentation allows a device to segment a single frame into multiple
+frames with a data payload size specified in skb_shinfo()->gso_size.
+When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
+SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
+skb_shinfo()->gso_size should be set to a non-zero value.
+
+TCP segmentation is dependent on support for the use of partial checksum
+offload. For this reason TSO is normally disabled if the Tx checksum
+offload for a given device is disabled.
+
+In order to support TCP segmentation offload it is necessary to populate
+the network and transport header offsets of the skbuff so that the device
+drivers will be able to determine the offsets of the IP or IPv6 header and
+the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
+should also point to the TCP header of the packet.
+
+For IPv4 segmentation we support one of two types in terms of the IP ID.
+The default behavior is to increment the IP ID with every segment. If the
+GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
+ID and all segments will use the same IP ID. If a device has
+NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
+and we will either increment the IP ID for all frames, or leave it at a
+static value based on driver preference.
+
+UDP Fragmentation Offload
+=========================
+
+UDP fragmentation offload allows a device to fragment an oversized UDP
+datagram into multiple IPv4 fragments. Many of the requirements for UDP
+fragmentation offload are the same as TSO. However the IPv4 ID for
+fragments should not increment as a single IPv4 datagram is fragmented.
+
+IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
+========================================================
+
+In addition to the offloads described above it is possible for a frame to
+contain additional headers such as an outer tunnel. In order to account
+for such instances an additional set of segmentation offload types was
+introduced including SKB_GSO_IPIP, SKB_GSO_SIT, SKB_GSO_GRE, and
+SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
+cases where there is more than one set of headers. For example, in the
+case of IPIP and SIT we should have the network and transport headers moved
+from the standard list of headers to "inner" header offsets.
+
+Currently only two levels of headers are supported. The convention is to
+refer to the tunnel headers as the outer headers, while the encapsulated
+data is normally referred to as the inner headers. Below is the list of
+calls to access the given headers:
+
+IPIP/SIT Tunnel:
+             Outer                  Inner
+MAC          skb_mac_header
+Network      skb_network_header     skb_inner_network_header
+Transport    skb_transport_header
+
+UDP/GRE Tunnel:
+             Outer                  Inner
+MAC          skb_mac_header         skb_inner_mac_header
+Network      skb_network_header     skb_inner_network_header
+Transport    skb_transport_header   skb_inner_transport_header
+
+In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
+SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
+fact that the outer header also requests to have a non-zero checksum
+included in the outer header.
+
+Finally there is SKB_GSO_REMCSUM which indicates that a given tunnel header
+has requested a remote checksum offload. In this case the inner headers
+will be left with a partial checksum and only the outer header checksum
+will be computed.
+
+Generic Segmentation Offload
+============================
+
+Generic segmentation offload is a pure software offload that is meant to
+deal with cases where device drivers cannot perform the offloads described
+above. What occurs in GSO is that a given skbuff will have its data broken
+out over multiple skbuffs that have been resized to match the MSS provided
+via skb_shinfo()->gso_size.
+
+Before enabling any hardware segmentation offload, a corresponding software
+offload is required in GSO. Otherwise it becomes possible for a frame to
+be re-routed between devices and end up being unable to be transmitted.
+
+Generic Receive Offload
+=======================
+
+Generic receive offload is the complement to GSO. Ideally any frame
+assembled by GRO should be segmented to create an identical sequence of
+frames using GSO, and any sequence of frames segmented by GSO should be
+able to be reassembled back to the original by GRO. The only exception to
+this is IPv4 ID in the case that the DF bit is set for a given IP header.
+If the value of the IPv4 ID is not sequentially incrementing it will be
+altered so that it is when a frame assembled via GRO is segmented via GSO.
+
+Partial Generic Segmentation Offload
+====================================
+
+Partial generic segmentation offload is a hybrid between TSO and GSO. What
+it effectively does is take advantage of certain traits of TCP and tunnels
+so that instead of having to rewrite the packet headers for each segment
+only the inner-most transport header and possibly the outer-most network
+header need to be updated. This allows devices that do not support tunnel
+offloads or tunnel offloads with checksum to still make use of segmentation.
+
+With the partial offload, all headers excluding the inner transport header
+are updated such that they will contain the correct values if the header
+is simply duplicated. The one exception to this
+is the outer IPv4 ID field. It is up to the device drivers to guarantee
+that the IPv4 ID field is incremented in the case that a given header does
+not have the DF bit set.
* [net-next PATCH v2 4/5] GSO: Support partial segmentation offload
From: Alexander Duyck @ 2016-04-11 1:45 UTC
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160411013855.11189.56567.stgit@ahduyck-xeon-server>
This patch adds support for something I am referring to as GSO partial.
The basic idea is that we can support a broader range of devices for
segmentation if we use fixed outer headers and have the hardware only
really deal with segmenting the inner header. The idea behind the naming
is due to the fact that everything before csum_start will be fixed headers,
and everything after will be the region that is handled by hardware.
With the current implementation it allows us to add support for the
following GSO types with an inner TSO_MANGLEID or TSO6 offload:
NETIF_F_GSO_GRE
NETIF_F_GSO_GRE_CSUM
NETIF_F_GSO_IPIP
NETIF_F_GSO_SIT
NETIF_F_GSO_UDP_TUNNEL
NETIF_F_GSO_UDP_TUNNEL_CSUM
In the case of hardware that already supports tunneling we may be able to
extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
hardware can support updating inner IPv4 headers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
include/linux/netdev_features.h | 5 +++++
include/linux/netdevice.h | 2 ++
include/linux/skbuff.h | 9 +++++++--
net/core/dev.c | 36 +++++++++++++++++++++++++++++++++---
net/core/ethtool.c | 1 +
net/core/skbuff.c | 29 ++++++++++++++++++++++++++++-
net/ipv4/af_inet.c | 20 ++++++++++++++++----
net/ipv4/gre_offload.c | 26 +++++++++++++++++++++-----
net/ipv4/tcp_offload.c | 10 ++++++++--
net/ipv4/udp_offload.c | 27 +++++++++++++++++++++------
net/ipv6/ip6_offload.c | 10 +++++++++-
11 files changed, 151 insertions(+), 24 deletions(-)
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 7cf272a4b5c8..9fc79df0e561 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -48,6 +48,10 @@ enum {
NETIF_F_GSO_SIT_BIT, /* ... SIT tunnel with TSO */
NETIF_F_GSO_UDP_TUNNEL_BIT, /* ... UDP TUNNEL with TSO */
NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
+ NETIF_F_GSO_PARTIAL_BIT, /* ... Only segment inner-most L4
+ * in hardware and all other
+ * headers in software.
+ */
NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */
/**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */
NETIF_F_GSO_TUNNEL_REMCSUM_BIT,
@@ -122,6 +126,7 @@ enum {
#define NETIF_F_GSO_UDP_TUNNEL __NETIF_F(GSO_UDP_TUNNEL)
#define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
#define NETIF_F_TSO_MANGLEID __NETIF_F(TSO_MANGLEID)
+#define NETIF_F_GSO_PARTIAL __NETIF_F(GSO_PARTIAL)
#define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
#define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
#define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6a248a3a44bf..e15fbcd79be6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1654,6 +1654,7 @@ struct net_device {
netdev_features_t vlan_features;
netdev_features_t hw_enc_features;
netdev_features_t mpls_features;
+ netdev_features_t gso_partial_features;
int ifindex;
int group;
@@ -4004,6 +4005,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
BUILD_BUG_ON(SKB_GSO_SIT != (NETIF_F_GSO_SIT >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
+ BUILD_BUG_ON(SKB_GSO_PARTIAL != (NETIF_F_GSO_PARTIAL >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));
return (features & feature) == feature;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5fba16658f9d..da0ace389fec 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -483,7 +483,9 @@ enum {
SKB_GSO_UDP_TUNNEL_CSUM = 1 << 12,
- SKB_GSO_TUNNEL_REMCSUM = 1 << 13,
+ SKB_GSO_PARTIAL = 1 << 13,
+
+ SKB_GSO_TUNNEL_REMCSUM = 1 << 14,
};
#if BITS_PER_LONG > 32
@@ -3591,7 +3593,10 @@ static inline struct sec_path *skb_sec_path(struct sk_buff *skb)
* Keeps track of level of encapsulation of network headers.
*/
struct skb_gso_cb {
- int mac_offset;
+ union {
+ int mac_offset;
+ int data_offset;
+ };
int encap_level;
__wsum csum;
__u16 csum_start;
diff --git a/net/core/dev.c b/net/core/dev.c
index b78b586b1856..556dd09af3b8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2711,6 +2711,19 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
return ERR_PTR(err);
}
+ /* Only report GSO partial support if it will enable us to
+ * support segmentation on this frame without needing additional
+ * work.
+ */
+ if (features & NETIF_F_GSO_PARTIAL) {
+ netdev_features_t partial_features = NETIF_F_GSO_ROBUST;
+ struct net_device *dev = skb->dev;
+
+ partial_features |= dev->features & dev->gso_partial_features;
+ if (!skb_gso_ok(skb, features | partial_features))
+ features &= ~NETIF_F_GSO_PARTIAL;
+ }
+
BUILD_BUG_ON(SKB_SGO_CB_OFFSET +
sizeof(*SKB_GSO_CB(skb)) > sizeof(skb->cb));
@@ -2834,8 +2847,17 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
if (gso_segs > dev->gso_max_segs)
return features & ~NETIF_F_GSO_MASK;
- /* Make sure to clear the IPv4 ID mangling feature if
- * the IPv4 header has the potential to be fragmented.
+ /* Support for GSO partial features requires software
+ * intervention before we can actually process the packets
+ * so we need to strip support for any partial features now
+ * and we can pull them back in after we have partially
+ * segmented the frame.
+ */
+ if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL))
+ features &= ~dev->gso_partial_features;
+
+ /* Make sure to clear the IPv4 ID mangling feature if the
+ * IPv4 header has the potential to be fragmented.
*/
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
struct iphdr *iph = skb->encapsulation ?
@@ -6729,6 +6751,14 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
}
}
+ /* GSO partial features require GSO partial be set */
+ if ((features & dev->gso_partial_features) &&
+ !(features & NETIF_F_GSO_PARTIAL)) {
+ netdev_dbg(dev,
+ "Dropping partially supported GSO features since no GSO partial.\n");
+ features &= ~dev->gso_partial_features;
+ }
+
#ifdef CONFIG_NET_RX_BUSY_POLL
if (dev->netdev_ops->ndo_busy_poll)
features |= NETIF_F_BUSY_POLL;
@@ -7011,7 +7041,7 @@ int register_netdevice(struct net_device *dev)
/* Make NETIF_F_SG inheritable to tunnel devices.
*/
- dev->hw_enc_features |= NETIF_F_SG;
+ dev->hw_enc_features |= NETIF_F_SG | NETIF_F_GSO_PARTIAL;
/* Make NETIF_F_SG inheritable to MPLS.
*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9494c41cc77c..e0cf20a3b3dd 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -88,6 +88,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_GSO_SIT_BIT] = "tx-sit-segmentation",
[NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
[NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT] = "tx-udp_tnl-csum-segmentation",
+ [NETIF_F_GSO_PARTIAL_BIT] = "tx-gso-partial",
[NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
[NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d04c2d1c8c87..4cc594cdaada 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3076,8 +3076,9 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
struct sk_buff *frag_skb = head_skb;
unsigned int offset = doffset;
unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
+ unsigned int partial_segs = 0;
unsigned int headroom;
- unsigned int len;
+ unsigned int len = head_skb->len;
__be16 proto;
bool csum;
int sg = !!(features & NETIF_F_SG);
@@ -3094,6 +3095,15 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
csum = !!can_checksum_protocol(features, proto);
+ /* GSO partial only requires that we trim off any excess that
+ * doesn't fit into an MSS sized block, so take care of that
+ * now.
+ */
+ if (features & NETIF_F_GSO_PARTIAL) {
+ partial_segs = len / mss;
+ mss *= partial_segs;
+ }
+
headroom = skb_headroom(head_skb);
pos = skb_headlen(head_skb);
@@ -3281,6 +3291,23 @@ perform_csum_check:
*/
segs->prev = tail;
+ /* Update GSO info on first skb in partial sequence. */
+ if (partial_segs) {
+ int type = skb_shinfo(head_skb)->gso_type;
+
+ /* Update type to add partial and then remove dodgy if set */
+ type |= SKB_GSO_PARTIAL;
+ type &= ~SKB_GSO_DODGY;
+
+ /* Update GSO info and prepare to start updating headers on
+ * our way back down the stack of protocols.
+ */
+ skb_shinfo(segs)->gso_size = skb_shinfo(head_skb)->gso_size;
+ skb_shinfo(segs)->gso_segs = partial_segs;
+ skb_shinfo(segs)->gso_type = type;
+ SKB_GSO_CB(segs)->data_offset = skb_headroom(segs) + doffset;
+ }
+
/* Following permits correct backpressure, for protocols
* using skb_set_owner_w().
* Idea is to tranfert ownership from head_skb to last segment.
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8564cab96189..2e6e65fc4d20 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1200,7 +1200,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
const struct net_offload *ops;
unsigned int offset = 0;
struct iphdr *iph;
- int proto;
+ int proto, tot_len;
int nhoff;
int ihl;
int id;
@@ -1219,6 +1219,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
SKB_GSO_UDP_TUNNEL_CSUM |
SKB_GSO_TCP_FIXEDID |
SKB_GSO_TUNNEL_REMCSUM |
+ SKB_GSO_PARTIAL |
0)))
goto out;
@@ -1273,10 +1274,21 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
if (skb->next)
iph->frag_off |= htons(IP_MF);
offset += skb->len - nhoff - ihl;
- } else if (!fixedid) {
- iph->id = htons(id++);
+ tot_len = skb->len - nhoff;
+ } else if (skb_is_gso(skb)) {
+ if (!fixedid) {
+ iph->id = htons(id);
+ id += skb_shinfo(skb)->gso_segs;
+ }
+ tot_len = skb_shinfo(skb)->gso_size +
+ SKB_GSO_CB(skb)->data_offset +
+ skb->head - (unsigned char *)iph;
+ } else {
+ if (!fixedid)
+ iph->id = htons(id++);
+ tot_len = skb->len - nhoff;
}
- iph->tot_len = htons(skb->len - nhoff);
+ iph->tot_len = htons(tot_len);
ip_send_check(iph);
if (encap)
skb_reset_inner_headers(skb);
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 6376b0cdf693..20557f211408 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -36,7 +36,8 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
SKB_GSO_GRE |
SKB_GSO_GRE_CSUM |
SKB_GSO_IPIP |
- SKB_GSO_SIT)))
+ SKB_GSO_SIT |
+ SKB_GSO_PARTIAL)))
goto out;
if (!skb->encapsulation)
@@ -87,7 +88,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
skb = segs;
do {
struct gre_base_hdr *greh;
- __be32 *pcsum;
+ __sum16 *pcsum;
/* Set up inner headers if we are offloading inner checksum */
if (skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -107,10 +108,25 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
continue;
greh = (struct gre_base_hdr *)skb_transport_header(skb);
- pcsum = (__be32 *)(greh + 1);
+ pcsum = (__sum16 *)(greh + 1);
+
+ if (skb_is_gso(skb)) {
+ unsigned int partial_adj;
+
+ /* Adjust checksum to account for the fact that
+ * the partial checksum is based on actual size
+ * whereas headers should be based on MSS size.
+ */
+ partial_adj = skb->len + skb_headroom(skb) -
+ SKB_GSO_CB(skb)->data_offset -
+ skb_shinfo(skb)->gso_size;
+ *pcsum = ~csum_fold((__force __wsum)htonl(partial_adj));
+ } else {
+ *pcsum = 0;
+ }
- *pcsum = 0;
- *(__sum16 *)pcsum = gso_make_checksum(skb, 0);
+ *(pcsum + 1) = 0;
+ *pcsum = gso_make_checksum(skb, 0);
} while ((skb = skb->next));
out:
return segs;
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index d1ffd55289bd..02737b607aa7 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -109,6 +109,12 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
goto out;
}
+ /* GSO partial only requires splitting the frame into an MSS
+ * multiple and possibly a remainder. So update the mss now.
+ */
+ if (features & NETIF_F_GSO_PARTIAL)
+ mss = skb->len - (skb->len % mss);
+
copy_destructor = gso_skb->destructor == tcp_wfree;
ooo_okay = gso_skb->ooo_okay;
/* All segments but the first should have ooo_okay cleared */
@@ -133,7 +139,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
newcheck = ~csum_fold((__force __wsum)((__force u32)th->check +
(__force u32)delta));
- do {
+ while (skb->next) {
th->fin = th->psh = 0;
th->check = newcheck;
@@ -153,7 +159,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
th->seq = htonl(seq);
th->cwr = 0;
- } while (skb->next);
+ }
/* Following permits TCP Small Queues to work well with GSO :
* The callback to TCP stack will be called at the time last frag
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6230cf4b0d2d..097060def7f0 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -39,8 +39,11 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
* 16 bit length field due to the header being added outside of an
* IP or IPv6 frame that was already limited to 64K - 1.
*/
- partial = csum_sub(csum_unfold(uh->check),
- (__force __wsum)htonl(skb->len));
+ if (skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL)
+ partial = (__force __wsum)uh->len;
+ else
+ partial = (__force __wsum)htonl(skb->len);
+ partial = csum_sub(csum_unfold(uh->check), partial);
/* setup inner skb. */
skb->encapsulation = 0;
@@ -89,7 +92,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
udp_offset = outer_hlen - tnl_hlen;
skb = segs;
do {
- __be16 len;
+ unsigned int len;
if (remcsum)
skb->ip_summed = CHECKSUM_NONE;
@@ -107,14 +110,26 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
skb_reset_mac_header(skb);
skb_set_network_header(skb, mac_len);
skb_set_transport_header(skb, udp_offset);
- len = htons(skb->len - udp_offset);
+ len = skb->len - udp_offset;
uh = udp_hdr(skb);
- uh->len = len;
+
+ /* If we are only performing partial GSO the inner header
+ * will be using a length value equal to only one MSS sized
+ * segment instead of the entire frame.
+ */
+ if (skb_is_gso(skb)) {
+ uh->len = htons(skb_shinfo(skb)->gso_size +
+ SKB_GSO_CB(skb)->data_offset +
+ skb->head - (unsigned char *)uh);
+ } else {
+ uh->len = htons(len);
+ }
if (!need_csum)
continue;
- uh->check = ~csum_fold(csum_add(partial, (__force __wsum)len));
+ uh->check = ~csum_fold(csum_add(partial,
+ (__force __wsum)htonl(len)));
if (skb->encapsulation || !offload_csum) {
uh->check = gso_make_checksum(skb, ~uh->check);
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 061adcda65f3..f5eb184e1093 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -63,6 +63,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
int proto;
struct frag_hdr *fptr;
unsigned int unfrag_ip6hlen;
+ unsigned int payload_len;
u8 *prevhdr;
int offset = 0;
bool encap, udpfrag;
@@ -82,6 +83,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
SKB_GSO_UDP_TUNNEL |
SKB_GSO_UDP_TUNNEL_CSUM |
SKB_GSO_TUNNEL_REMCSUM |
+ SKB_GSO_PARTIAL |
0)))
goto out;
@@ -118,7 +120,13 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
for (skb = segs; skb; skb = skb->next) {
ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
- ipv6h->payload_len = htons(skb->len - nhoff - sizeof(*ipv6h));
+ if (skb_is_gso(skb))
+ payload_len = skb_shinfo(skb)->gso_size +
+ SKB_GSO_CB(skb)->data_offset +
+ skb->head - (unsigned char *)(ipv6h + 1);
+ else
+ payload_len = skb->len - nhoff - sizeof(*ipv6h);
+ ipv6h->payload_len = htons(payload_len);
skb->network_header = (u8 *)ipv6h - skb->head;
if (udpfrag) {
* [net-next PATCH v2 3/5] GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values
From: Alexander Duyck @ 2016-04-11 1:44 UTC
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160411013855.11189.56567.stgit@ahduyck-xeon-server>
This patch does two things.
First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field. As
a result we should now be able to aggregate flows that were converted from
IPv6 to IPv4. In addition this allows us more flexibility for future
implementations of segmentation as we may be able to use a fixed IP ID when
segmenting the flow.
The second thing this does is place limitations on the outer IPv4 ID field
in the case of tunneled frames. Specifically, it forces the IP ID
to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
This way we can avoid creating overlapping series of IP IDs that could
possibly be fragmented if the frame goes through GRO and is then
resegmented via GSO.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
include/linux/netdevice.h | 5 ++++-
net/core/dev.c | 1 +
net/ipv4/af_inet.c | 35 ++++++++++++++++++++++++++++-------
net/ipv4/tcp_offload.c | 16 +++++++++++++++-
net/ipv6/ip6_offload.c | 8 ++++++--
5 files changed, 54 insertions(+), 11 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eb7f037a4068..6a248a3a44bf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2121,7 +2121,10 @@ struct napi_gro_cb {
/* Used in GRE, set in fou/gue_gro_receive */
u8 is_fou:1;
- /* 6 bit hole */
+ /* Used to determine if flush_id can be ignored */
+ u8 is_atomic:1;
+
+ /* 5 bit hole */
/* used to support CHECKSUM_COMPLETE for tunneling protocols */
__wsum csum;
diff --git a/net/core/dev.c b/net/core/dev.c
index e896b1953ab6..b78b586b1856 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4462,6 +4462,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
NAPI_GRO_CB(skb)->free = 0;
NAPI_GRO_CB(skb)->encap_mark = 0;
NAPI_GRO_CB(skb)->is_fou = 0;
+ NAPI_GRO_CB(skb)->is_atomic = 1;
NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
/* Setup for GRO checksum validation */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5bbea9a0ce96..8564cab96189 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1328,6 +1328,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
for (p = *head; p; p = p->next) {
struct iphdr *iph2;
+ u16 flush_id;
if (!NAPI_GRO_CB(p)->same_flow)
continue;
@@ -1351,16 +1352,36 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
(iph->tos ^ iph2->tos) |
((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
- /* Save the IP ID check to be included later when we get to
- * the transport layer so only the inner most IP ID is checked.
- * This is because some GSO/TSO implementations do not
- * correctly increment the IP ID for the outer hdrs.
- */
- NAPI_GRO_CB(p)->flush_id =
- ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
NAPI_GRO_CB(p)->flush |= flush;
+
+ /* We need to store the IP ID check to be included later
+ * when we can verify that this packet does in fact belong
+ * to a given flow.
+ */
+ flush_id = (u16)(id - ntohs(iph2->id));
+
+ /* This bit of code makes it much easier for us to identify
+ * the cases where we are doing atomic vs non-atomic IP ID
+ * checks. Specifically an atomic check can return IP ID
+ * values 0 - 0xFFFF, while a non-atomic check can only
+ * return 0 or 0xFFFF.
+ */
+ if (!NAPI_GRO_CB(p)->is_atomic ||
+ !(iph->frag_off & htons(IP_DF))) {
+ flush_id ^= NAPI_GRO_CB(p)->count;
+ flush_id = flush_id ? 0xFFFF : 0;
+ }
+
+ /* If the previous IP ID value was based on an atomic
+ * datagram we can overwrite the value and ignore it.
+ */
+ if (NAPI_GRO_CB(skb)->is_atomic)
+ NAPI_GRO_CB(p)->flush_id = flush_id;
+ else
+ NAPI_GRO_CB(p)->flush_id |= flush_id;
}
+ NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
NAPI_GRO_CB(skb)->flush |= flush;
skb_set_network_header(skb, off);
/* The above will be needed by the transport layer if there is one
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 08dd25d835af..d1ffd55289bd 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -239,7 +239,7 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
found:
/* Include the IP ID check below from the inner most IP hdr */
- flush = NAPI_GRO_CB(p)->flush | NAPI_GRO_CB(p)->flush_id;
+ flush = NAPI_GRO_CB(p)->flush;
flush |= (__force int)(flags & TCP_FLAG_CWR);
flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
@@ -248,6 +248,17 @@ found:
flush |= *(u32 *)((u8 *)th + i) ^
*(u32 *)((u8 *)th2 + i);
+ /* When we receive our second frame we can make a decision on whether we
+ * continue this flow as an atomic flow with a fixed ID or if we use
+ * an incrementing ID.
+ */
+ if (NAPI_GRO_CB(p)->flush_id != 1 ||
+ NAPI_GRO_CB(p)->count != 1 ||
+ !NAPI_GRO_CB(p)->is_atomic)
+ flush |= NAPI_GRO_CB(p)->flush_id;
+ else
+ NAPI_GRO_CB(p)->is_atomic = false;
+
mss = skb_shinfo(p)->gso_size;
flush |= (len - 1) >= mss;
@@ -316,6 +327,9 @@ static int tcp4_gro_complete(struct sk_buff *skb, int thoff)
iph->daddr, 0);
skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4;
+ if (NAPI_GRO_CB(skb)->is_atomic)
+ skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
+
return tcp_gro_complete(skb);
}
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b3a779393d71..061adcda65f3 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -240,10 +240,14 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
NAPI_GRO_CB(p)->flush |= flush;
- /* Clear flush_id, there's really no concept of ID in IPv6. */
- NAPI_GRO_CB(p)->flush_id = 0;
+ /* If the previous IP ID value was based on an atomic
+ * datagram we can overwrite the value and ignore it.
+ */
+ if (NAPI_GRO_CB(skb)->is_atomic)
+ NAPI_GRO_CB(p)->flush_id = 0;
}
+ NAPI_GRO_CB(skb)->is_atomic = true;
NAPI_GRO_CB(skb)->flush |= flush;
skb_gro_postpull_rcsum(skb, iph, nlen);
* [net-next PATCH v2 2/5] GSO: Add GSO type for fixed IPv4 ID
From: Alexander Duyck @ 2016-04-11 1:44 UTC
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160411013855.11189.56567.stgit@ahduyck-xeon-server>
This patch adds support for TSO using IPv4 headers with a fixed IP ID
field. This is meant to allow lossless GRO for TCP flows that use a fixed
IP ID, such as those whose IPv6 headers have been translated to IPv4
headers.
In addition I am adding a feature that, for now, I am referring to as TSO
with IP ID mangling. When this flag is enabled the device has the option to
output the flow with either incrementing IP IDs or a fixed IP ID,
regardless of what the original IP ID ordering was. This is useful in cases
where the DF bit is set and we do not care whether the original IP ID value
is maintained.
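A minimal sketch of how the IP ID of each TCP segment differs with and without the fixed-ID GSO type, modelled on the ID loop in inet_gso_segment() in the diff below. It leaves out the UDP fragmentation branch, and `gso_assign_ids()` is a hypothetical helper, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelled on the IP ID loop in inet_gso_segment()
 * (TCP only). It writes the IPv4 ID that each of the nsegs output
 * segments would carry. Returns 0 on success, or -1 where the patched
 * code gives up: a fixed ID was requested but the DF bit is clear. */
static int gso_assign_ids(uint16_t first_id, bool fixedid, bool df,
			  uint16_t *ids, size_t nsegs)
{
	assert(ids || nsegs == 0);

	if (fixedid && !df)
		return -1;

	uint16_t id = first_id;
	for (size_t i = 0; i < nsegs; i++) {
		if (fixedid)
			ids[i] = first_id; /* ID copied from the original header */
		else
			ids[i] = id++;     /* classic TSO: +1 per segment, wraps */
	}
	return 0;
}
```

With the fixed-ID type every segment repeats the original ID; without it the IDs increment and wrap from 0xFFFF to 0, which is the ordering most TSO hardware produces.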
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
include/linux/netdev_features.h | 3 +++
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 20 +++++++++++---------
net/core/dev.c | 34 +++++++++++++++++++++++++++++-----
net/core/ethtool.c | 1 +
net/ipv4/af_inet.c | 19 +++++++++++--------
net/ipv4/gre_offload.c | 1 +
net/ipv4/tcp_offload.c | 4 +++-
net/ipv6/ip6_offload.c | 3 ++-
net/mpls/mpls_gso.c | 1 +
10 files changed, 63 insertions(+), 24 deletions(-)
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index a734bf43d190..7cf272a4b5c8 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -39,6 +39,7 @@ enum {
NETIF_F_UFO_BIT, /* ... UDPv4 fragmentation */
NETIF_F_GSO_ROBUST_BIT, /* ... ->SKB_GSO_DODGY */
NETIF_F_TSO_ECN_BIT, /* ... TCP ECN support */
+ NETIF_F_TSO_MANGLEID_BIT, /* ... IPV4 ID mangling allowed */
NETIF_F_TSO6_BIT, /* ... TCPv6 segmentation */
NETIF_F_FSO_BIT, /* ... FCoE segmentation */
NETIF_F_GSO_GRE_BIT, /* ... GRE with TSO */
@@ -120,6 +121,7 @@ enum {
#define NETIF_F_GSO_SIT __NETIF_F(GSO_SIT)
#define NETIF_F_GSO_UDP_TUNNEL __NETIF_F(GSO_UDP_TUNNEL)
#define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
+#define NETIF_F_TSO_MANGLEID __NETIF_F(TSO_MANGLEID)
#define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
#define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
#define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX)
@@ -147,6 +149,7 @@ enum {
/* List of features with software fallbacks. */
#define NETIF_F_GSO_SOFTWARE (NETIF_F_TSO | NETIF_F_TSO_ECN | \
+ NETIF_F_TSO_MANGLEID | \
NETIF_F_TSO6 | NETIF_F_UFO)
/* List of IP checksum features. Note that NETIF_F_ HW_CSUM should not be
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 347ad5de0d93..eb7f037a4068 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3992,6 +3992,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_DODGY != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
+ BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TCPV6 != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_FCOE != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_GRE != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 007381270ff8..5fba16658f9d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -465,23 +465,25 @@ enum {
/* This indicates the tcp segment has CWR set. */
SKB_GSO_TCP_ECN = 1 << 3,
- SKB_GSO_TCPV6 = 1 << 4,
+ SKB_GSO_TCP_FIXEDID = 1 << 4,
- SKB_GSO_FCOE = 1 << 5,
+ SKB_GSO_TCPV6 = 1 << 5,
- SKB_GSO_GRE = 1 << 6,
+ SKB_GSO_FCOE = 1 << 6,
- SKB_GSO_GRE_CSUM = 1 << 7,
+ SKB_GSO_GRE = 1 << 7,
- SKB_GSO_IPIP = 1 << 8,
+ SKB_GSO_GRE_CSUM = 1 << 8,
- SKB_GSO_SIT = 1 << 9,
+ SKB_GSO_IPIP = 1 << 9,
- SKB_GSO_UDP_TUNNEL = 1 << 10,
+ SKB_GSO_SIT = 1 << 10,
- SKB_GSO_UDP_TUNNEL_CSUM = 1 << 11,
+ SKB_GSO_UDP_TUNNEL = 1 << 11,
- SKB_GSO_TUNNEL_REMCSUM = 1 << 12,
+ SKB_GSO_UDP_TUNNEL_CSUM = 1 << 12,
+
+ SKB_GSO_TUNNEL_REMCSUM = 1 << 13,
};
#if BITS_PER_LONG > 32
diff --git a/net/core/dev.c b/net/core/dev.c
index 09fb1ace9dc8..e896b1953ab6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2825,14 +2825,36 @@ static netdev_features_t dflt_features_check(const struct sk_buff *skb,
return vlan_features_check(skb, features);
}
+static netdev_features_t gso_features_check(const struct sk_buff *skb,
+ struct net_device *dev,
+ netdev_features_t features)
+{
+ u16 gso_segs = skb_shinfo(skb)->gso_segs;
+
+ if (gso_segs > dev->gso_max_segs)
+ return features & ~NETIF_F_GSO_MASK;
+
+ /* Make sure to clear the IPv4 ID mangling feature if
+ * the IPv4 header has the potential to be fragmented.
+ */
+ if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+ struct iphdr *iph = skb->encapsulation ?
+ inner_ip_hdr(skb) : ip_hdr(skb);
+
+ if (!(iph->frag_off & htons(IP_DF)))
+ features &= ~NETIF_F_TSO_MANGLEID;
+ }
+
+ return features;
+}
+
netdev_features_t netif_skb_features(struct sk_buff *skb)
{
struct net_device *dev = skb->dev;
netdev_features_t features = dev->features;
- u16 gso_segs = skb_shinfo(skb)->gso_segs;
- if (gso_segs > dev->gso_max_segs)
- features &= ~NETIF_F_GSO_MASK;
+ if (skb_is_gso(skb))
+ features = gso_features_check(skb, dev, features);
/* If encapsulation offload request, verify we are testing
* hardware encapsulation features instead of standard
@@ -6976,9 +6998,11 @@ int register_netdevice(struct net_device *dev)
dev->features |= NETIF_F_SOFT_FEATURES;
dev->wanted_features = dev->features & dev->hw_features;
- if (!(dev->flags & IFF_LOOPBACK)) {
+ if (!(dev->flags & IFF_LOOPBACK))
dev->hw_features |= NETIF_F_NOCACHE_COPY;
- }
+
+ if (dev->hw_features & NETIF_F_TSO)
+ dev->hw_features |= NETIF_F_TSO_MANGLEID;
/* Make NETIF_F_HIGHDMA inheritable to VLAN devices.
*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6a7f99661c2f..9494c41cc77c 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -79,6 +79,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_UFO_BIT] = "tx-udp-fragmentation",
[NETIF_F_GSO_ROBUST_BIT] = "tx-gso-robust",
[NETIF_F_TSO_ECN_BIT] = "tx-tcp-ecn-segmentation",
+ [NETIF_F_TSO_MANGLEID_BIT] = "tx-tcp-mangleid-segmentation",
[NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
[NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
[NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8217cd22f921..5bbea9a0ce96 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1195,10 +1195,10 @@ EXPORT_SYMBOL(inet_sk_rebuild_header);
static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
netdev_features_t features)
{
+ bool udpfrag = false, fixedid = false, encap;
struct sk_buff *segs = ERR_PTR(-EINVAL);
const struct net_offload *ops;
unsigned int offset = 0;
- bool udpfrag, encap;
struct iphdr *iph;
int proto;
int nhoff;
@@ -1217,6 +1217,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
SKB_GSO_TCPV6 |
SKB_GSO_UDP_TUNNEL |
SKB_GSO_UDP_TUNNEL_CSUM |
+ SKB_GSO_TCP_FIXEDID |
SKB_GSO_TUNNEL_REMCSUM |
0)))
goto out;
@@ -1248,11 +1249,14 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
segs = ERR_PTR(-EPROTONOSUPPORT);
- if (skb->encapsulation &&
- skb_shinfo(skb)->gso_type & (SKB_GSO_SIT|SKB_GSO_IPIP))
- udpfrag = proto == IPPROTO_UDP && encap;
- else
- udpfrag = proto == IPPROTO_UDP && !skb->encapsulation;
+ if (!skb->encapsulation || encap) {
+ udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
+ fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
+
+ /* fixed ID is invalid if DF bit is not set */
+ if (fixedid && !(iph->frag_off & htons(IP_DF)))
+ goto out;
+ }
ops = rcu_dereference(inet_offloads[proto]);
if (likely(ops && ops->callbacks.gso_segment))
@@ -1265,12 +1269,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
do {
iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
if (udpfrag) {
- iph->id = htons(id);
iph->frag_off = htons(offset >> 3);
if (skb->next)
iph->frag_off |= htons(IP_MF);
offset += skb->len - nhoff - ihl;
- } else {
+ } else if (!fixedid) {
iph->id = htons(id++);
}
iph->tot_len = htons(skb->len - nhoff);
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 6a5bd4317866..6376b0cdf693 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -32,6 +32,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
SKB_GSO_UDP |
SKB_GSO_DODGY |
SKB_GSO_TCP_ECN |
+ SKB_GSO_TCP_FIXEDID |
SKB_GSO_GRE |
SKB_GSO_GRE_CSUM |
SKB_GSO_IPIP |
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 773083b7f1e9..08dd25d835af 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -89,6 +89,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
~(SKB_GSO_TCPV4 |
SKB_GSO_DODGY |
SKB_GSO_TCP_ECN |
+ SKB_GSO_TCP_FIXEDID |
SKB_GSO_TCPV6 |
SKB_GSO_GRE |
SKB_GSO_GRE_CSUM |
@@ -98,7 +99,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
SKB_GSO_UDP_TUNNEL_CSUM |
SKB_GSO_TUNNEL_REMCSUM |
0) ||
- !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
+ !(type & (SKB_GSO_TCPV4 |
+ SKB_GSO_TCPV6))))
goto out;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 204af2219471..b3a779393d71 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -73,6 +73,8 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
SKB_GSO_UDP |
SKB_GSO_DODGY |
SKB_GSO_TCP_ECN |
+ SKB_GSO_TCP_FIXEDID |
+ SKB_GSO_TCPV6 |
SKB_GSO_GRE |
SKB_GSO_GRE_CSUM |
SKB_GSO_IPIP |
@@ -80,7 +82,6 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
SKB_GSO_UDP_TUNNEL |
SKB_GSO_UDP_TUNNEL_CSUM |
SKB_GSO_TUNNEL_REMCSUM |
- SKB_GSO_TCPV6 |
0)))
goto out;
diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 0183b32da942..bbcf60465e5c 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -31,6 +31,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
SKB_GSO_TCPV6 |
SKB_GSO_UDP |
SKB_GSO_DODGY |
+ SKB_GSO_TCP_FIXEDID |
SKB_GSO_TCP_ECN)))
goto out;
* [net-next PATCH v2 1/5] ethtool: Add support for toggling any of the GSO offloads
From: Alexander Duyck @ 2016-04-11 1:44 UTC
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
In-Reply-To: <20160411013855.11189.56567.stgit@ahduyck-xeon-server>
The strings were missing for several of the GSO offloads that are
available. This patch provides the missing strings so that we can toggle
or query any of them via the ethtool command.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
net/core/ethtool.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f426c5ad6149..6a7f99661c2f 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -82,9 +82,11 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
[NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
[NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
+ [NETIF_F_GSO_GRE_CSUM_BIT] = "tx-gre-csum-segmentation",
[NETIF_F_GSO_IPIP_BIT] = "tx-ipip-segmentation",
[NETIF_F_GSO_SIT_BIT] = "tx-sit-segmentation",
[NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
+ [NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT] = "tx-udp_tnl-csum-segmentation",
[NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
[NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
* [net-next PATCH v2 0/5] GRO Fixed IPv4 ID support and GSO partial support
From: Alexander Duyck @ 2016-04-11 1:44 UTC
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
This patch series sets up a few different things.
First it adds support for GRO of frames with a fixed IP ID value. This
will allow us to perform GRO for frames that go through things like an IPv6
to IPv4 header translation.
The second item we add is support for segmenting frames that are generated
this way. Most devices only support an incrementing IP ID value, but for TCP
the IP ID can be ignored in many cases since the DF bit should be set. So we
can technically segment these frames using existing TSO if we are willing
to allow the IP ID to be mangled. To that end I have added a matching
feature for the new form of GRO/GSO called TCP IPv4 ID mangling. With it
enabled we can assemble and disassemble a frame that uses a fixed IP ID, and
the only ill effect is that the IPv4 ID will be altered, which may or may
not have any noticeable effect. Because of that I
have defaulted the feature to disabled.
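The interaction between the new GSO type and the mangling feature can be sketched as a feature-bit check. This is a simplified model under stated assumptions: the bit values are illustrative, and `gso_ok()` and `mangleid_check()` are hypothetical helpers that imitate net_gso_ok() and the DF rule in gso_features_check() from patch 2/5, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only. The patch keeps SKB_GSO_* bit n aligned with
 * feature bit n + NETIF_F_GSO_SHIFT (checked by BUILD_BUG_ON). */
#define GSO_SHIFT       16
#define GSO_TCPV4       (1u << 0)
#define GSO_TCP_FIXEDID (1u << 4)
#define F_TSO           ((uint64_t)GSO_TCPV4 << GSO_SHIFT)
#define F_TSO_MANGLEID  ((uint64_t)GSO_TCP_FIXEDID << GSO_SHIFT)

/* Like net_gso_ok(): every GSO type bit needs its matching feature bit. */
static bool gso_ok(uint64_t features, uint32_t gso_type)
{
	/* Runtime stand-in for the BUILD_BUG_ON added by the patch. */
	assert((F_TSO_MANGLEID >> GSO_SHIFT) == GSO_TCP_FIXEDID);

	uint64_t needed = (uint64_t)gso_type << GSO_SHIFT;
	return (features & needed) == needed;
}

/* Like the DF rule in gso_features_check(): ID mangling is withdrawn
 * when the IPv4 header could still be fragmented. */
static uint64_t mangleid_check(uint64_t features, uint32_t gso_type, bool df)
{
	if ((gso_type & GSO_TCPV4) && !df)
		features &= ~F_TSO_MANGLEID;
	return features;
}
```

In this model a device advertising both TSO and TSO_MANGLEID accepts a DF-marked fixed-ID frame as-is; if DF is clear, or the device lacks the mangling feature, the check fails and the frame would not be handed to the hardware unchanged.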
The third item this patch series adds is support for partial GSO
segmentation. Partial GSO segmentation allows us to split a large frame
into two pieces. The first piece carries an even multiple of MSS worth of
data, and the headers before the one pointed to by csum_start are updated
so that they are correct as if the data payload had already been
segmented. This lets us do things such as precompute the outer header
checksums for a frame to be segmented, allowing us to perform TSO on
devices that don't support tunneling, or that don't support tunneling with
outer header checksums.
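The split described above is simple arithmetic; the sketch below shows it, assuming a plain byte count for the payload. `struct partial_split`, `partial_gso_split()` and `outer_ip_payload_len()` are hypothetical names. The second helper only echoes, roughly, the idea behind the ipv6_gso_segment() hunk at the top of this page, where the outer payload_len is written from gso_size plus the headers that follow the IPv6 header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of partial GSO: the first piece carries a whole
 * number of MSS-sized segments and the remainder is a separate piece. */
struct partial_split {
	size_t first; /* payload bytes in the first piece (segs * mss) */
	size_t rest;  /* leftover payload bytes, always < mss */
	size_t segs;  /* segments the device cuts the first piece into */
};

static struct partial_split partial_gso_split(size_t payload, size_t mss)
{
	assert(mss > 0);
	struct partial_split s;

	s.segs = payload / mss;
	s.first = s.segs * mss;
	s.rest = payload - s.first;
	return s;
}

/* Outer length fields are written as if already segmented: each segment
 * carries mss payload bytes plus the headers that follow the outer IP
 * header (inner_hdrs, e.g. UDP + tunnel + inner IP + TCP). */
static size_t outer_ip_payload_len(size_t mss, size_t inner_hdrs)
{
	return mss + inner_hdrs;
}
```

Because every segment cut from the first piece is exactly one MSS long, the outer length (and therefore any outer checksum) is the same for all of them, which is what lets it be computed once in software.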
This patch set is based on the net-next tree, but I included "net: remove
netdevice gso_min_segs" in my tree, as I expect it to be applied before
this patch set and I wanted to avoid a merge conflict.
v2: Fixed items reported by Jesse Gross
Fixed missing GSO flag in MPLS check
Added DF check for MANGLEID
Moved extra GSO feature checks into gso_features_check
Rebased patches to account for "net: remove netdevice gso_min_segs"
Driver patches from the first patch set should still be compatible. However,
I do have a few changes in them, so I will submit a v2 of those to Jeff
Kirsher once these patches are accepted into net-next.
Example driver patches for i40e, ixgbe, and igb:
https://patchwork.ozlabs.org/patch/608221/
https://patchwork.ozlabs.org/patch/608224/
https://patchwork.ozlabs.org/patch/608225/
---
Alexander Duyck (5):
ethtool: Add support for toggling any of the GSO offloads
GSO: Add GSO type for fixed IPv4 ID
GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values
GSO: Support partial segmentation offload
Documentation: Add documentation for TSO and GSO features
Documentation/networking/segmentation-offloads.txt | 130 ++++++++++++++++++++
include/linux/netdev_features.h | 8 +
include/linux/netdevice.h | 8 +
include/linux/skbuff.h | 27 +++-
net/core/dev.c | 67 +++++++++-
net/core/ethtool.c | 4 +
net/core/skbuff.c | 29 ++++
net/ipv4/af_inet.c | 70 ++++++++---
net/ipv4/gre_offload.c | 27 +++-
net/ipv4/tcp_offload.c | 30 ++++-
net/ipv4/udp_offload.c | 27 +++-
net/ipv6/ip6_offload.c | 21 +++
net/mpls/mpls_gso.c | 1
13 files changed, 395 insertions(+), 54 deletions(-)
create mode 100644 Documentation/networking/segmentation-offloads.txt