* Re: [PATCH net] netvsc: reduce maximum GSO size
From: David Miller @ 2016-12-07 18:14 UTC
To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20161206214354.15473-1-sthemmin@microsoft.com>
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Tue, 6 Dec 2016 13:43:54 -0800
> Hyper-V (and Azure) support using NVGRE which requires some extra space
> for encapsulation headers. Because of this the largest allowed TSO
> packet is reduced.
>
> For older releases, hard code a fixed reduced value. For next release,
> there is a better solution which uses result of host offload
> negotiation.
>
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
> Please queue this for stable as well.
Applied and queued up for -stable, thanks.
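Here is a rough sketch of the arithmetic behind a reduced limit. It is a hypothetical illustration only. It assumes the overhead is the outer Ethernet (14 bytes), outer IPv4 (20 bytes) and GRE-with-Key (8 bytes) headers that NVGRE adds. It also assumes the limit is simply the 64 KiB GSO maximum minus that overhead. The fixed value the patch actually chose is not shown in this thread, and the demo_* names are invented for the example.

```c
#include <assert.h>

/* Kernel default upper bound for a GSO/TSO super-packet (GSO_MAX_SIZE). */
#define DEMO_GSO_MAX_SIZE 65536u

/* Assumed NVGRE encapsulation overhead added on the host side. */
#define DEMO_OUTER_ETH_LEN  14u /* outer Ethernet header */
#define DEMO_OUTER_IPV4_LEN 20u /* outer IPv4 header, no options */
#define DEMO_GRE_KEY_LEN     8u /* GRE header including the Key field */

/* Largest TSO packet the guest may hand down so that the packet,
 * once encapsulated, still fits within max_size (hypothetical policy). */
static unsigned int demo_reduced_gso_size(unsigned int max_size,
					  unsigned int overhead)
{
	assert(overhead < max_size);
	return max_size - overhead;
}
```

With these assumptions the overhead is 42 bytes, giving a limit of 65494 bytes.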
* Re: [PATCH v2] drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
From: David Miller @ 2016-12-07 18:12 UTC
To: alex.g
Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel,
gokhan
In-Reply-To: <1481050611-29698-1-git-send-email-alex.g@adaptrum.com>
From: Alexandru Gagniuc <alex.g@adaptrum.com>
Date: Tue, 6 Dec 2016 10:56:51 -0800
> Support for setting the RGMII_IDMODE bit was added in the commit
> referenced below. However, that commit did not add the symmetrical
> clearing of the bit by way of setting it in "mask". Add it here.
>
> Note that the documentation marks clearing this bit as "reserved";
> however, according to TI, support for delaying the clock does exist in
> the MAC, although it is not officially supported.
> We tested this on a board with an RGMII to RGMII link that will not
> work unless this bit is cleared.
>
> Fixes: 0fb26c3063ea ("drivers: net: cpsw-phy-sel: add support to configure rgmii internal delay")
> Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>
Applied.
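The set-versus-clear problem described above follows the usual read-modify-write pattern. Only bits present in the mask can change. If a bit is set through the value but left out of the mask on the "clear" path, it stays set forever. The minimal user-space sketch below illustrates this. The register layout and the DEMO_* bit positions are hypothetical and do not match the real cpsw-phy-sel register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define DEMO_MODE_MASK    0x3u       /* 2-bit interface mode field */
#define DEMO_RGMII_IDMODE (1u << 4)  /* hypothetical internal-delay bit */

/* Read-modify-write: only bits present in mask may change. */
static uint32_t demo_update_reg(uint32_t reg, uint32_t mask, uint32_t val)
{
	assert((val & ~mask) == 0); /* every bit we write must be in mask */
	return (reg & ~mask) | val;
}
```

If the delay bit was left set by an earlier configuration, leaving it out of the mask keeps it set. Including it in the mask lets a value without the bit clear it.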
* Re: [PATCH] net: stmmac: do not call phy_ethtool_ksettings_set from atomic context
From: David Miller @ 2016-12-07 18:09 UTC
To: niklas.cassel
Cc: peppe.cavallaro, alexandre.torgue, niklass, netdev, linux-kernel
In-Reply-To: <1481032035-19018-1-git-send-email-niklass@axis.com>
From: Niklas Cassel <niklas.cassel@axis.com>
Date: Tue, 6 Dec 2016 14:47:15 +0100
> From: Niklas Cassel <niklas.cassel@axis.com>
>
> From what I can tell, spin_lock(&priv->lock) is not needed, since the
> phy_ethtool_ksettings_set call is not given the priv struct.
>
> phy_start_aneg takes the phydev->lock. Calls to phy_adjust_link
> from phy_state_machine also take the phydev->lock.
...
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Applied, but please always be explicit about which tree you are targeting
this patch at by properly annotating it in your Subject line.
In this case that would be "Subject: [PATCH net-next] ..."
* stmmac driver...
From: David Miller @ 2016-12-07 18:06 UTC
To: peppe.cavallaro; +Cc: alexandre.torgue, netdev
Giuseppe and Alexandre,
There are a lot of patches and discussions happening around the stmmac
driver lately, and both of you are listed as the maintainers.
I really need prompt and conclusive reviews of these patch submissions
from you, and participation in all discussions about the driver.
Otherwise I have only three things I can do: 1) let the patches rot in
patchwork for days, 2) trust that the patches are sane and fit your
desires and goals and just apply them, or 3) reject them since they
aren't being reviewed properly.
Thanks in advance.
* [PATCH net-next] net: do not read sk_drops if application does not care
From: Eric Dumazet @ 2016-12-07 18:05 UTC
To: David Miller; +Cc: netdev, Paolo Abeni
From: Eric Dumazet <edumazet@google.com>
sk_drops can be a frequently written field, so do not read it unless
the application has shown interest (by setting SO_RXQ_OVFL).
Note that sk_drops can also be read via inet_diag, so applications
can avoid getting this info from every received packet.
In the future, 'reading' sk_drops might require folding per-node or
per-cpu fields, and thus become even more expensive than today.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
---
include/net/sock.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 1749e38d03014558ac882b5d1fb37b11ac5e6705..be167c1483f4a5a74b466f135bbfdf4281e5bef4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2160,7 +2160,8 @@ struct sock_skb_cb {
static inline void
sock_skb_set_dropcount(const struct sock *sk, struct sk_buff *skb)
{
- SOCK_SKB_CB(skb)->dropcount = atomic_read(&sk->sk_drops);
+ SOCK_SKB_CB(skb)->dropcount = sock_flag(sk, SOCK_RXQ_OVFL) ?
+ atomic_read(&sk->sk_drops) : 0;
}
static inline void sk_drops_add(struct sock *sk, const struct sk_buff *skb)
* [PATCH] net: pch_gbe: Fix TX RX descriptor accesses for big endian systems
From: Hassan Naveed @ 2016-12-07 17:58 UTC
To: netdev
Cc: Hassan Naveed, Paul Burton, Matt Redfearn, David S. Miller,
Florian Westphal, françois romieu
Fix the pch_gbe driver's Ethernet operation on big-endian CPUs.
Values written to and read from the transmit and receive descriptors
in the pch_gbe driver are byte-swapped from the perspective of a
big-endian CPU, since the Ethernet controller always operates in
little-endian mode. Rectify this by byte-swapping these descriptor
field values appropriately in the driver software.
Signed-off-by: Hassan Naveed <hassan.naveed@imgtec.com>
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Reviewed-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Westphal <fw@strlen.de>
Cc: françois romieu <romieu@fr.zoreil.com>
---
.../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 66 ++++++++++++----------
1 file changed, 35 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index d1048dd..6937169 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1250,11 +1250,11 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter,
/*-- Set Tx descriptor --*/
tx_desc = PCH_GBE_TX_DESC(*tx_ring, ring_num);
- tx_desc->buffer_addr = (buffer_info->dma);
- tx_desc->length = skb->len;
- tx_desc->tx_words_eob = skb->len + 3;
- tx_desc->tx_frame_ctrl = (frame_ctrl);
- tx_desc->gbec_status = (DSC_INIT16);
+ tx_desc->buffer_addr = cpu_to_le32(buffer_info->dma);
+ tx_desc->length = cpu_to_le16(skb->len);
+ tx_desc->tx_words_eob = cpu_to_le16(skb->len + 3);
+ tx_desc->tx_frame_ctrl = cpu_to_le16(frame_ctrl);
+ tx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
if (unlikely(++ring_num == tx_ring->count))
ring_num = 0;
@@ -1460,8 +1460,8 @@ static irqreturn_t pch_gbe_intr(int irq, void *data)
}
buffer_info->mapped = true;
rx_desc = PCH_GBE_RX_DESC(*rx_ring, i);
- rx_desc->buffer_addr = (buffer_info->dma);
- rx_desc->gbec_status = DSC_INIT16;
+ rx_desc->buffer_addr = cpu_to_le32(buffer_info->dma);
+ rx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
netdev_dbg(netdev,
"i = %d buffer_info->dma = 0x08%llx buffer_info->length = 0x%x\n",
@@ -1533,7 +1533,7 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
skb_reserve(skb, PCH_GBE_DMA_ALIGN);
buffer_info->skb = skb;
tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
- tx_desc->gbec_status = (DSC_INIT16);
+ tx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
}
return;
}
@@ -1564,11 +1564,12 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
i = tx_ring->next_to_clean;
tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
netdev_dbg(adapter->netdev, "gbec_status:0x%04x dma_status:0x%04x\n",
- tx_desc->gbec_status, tx_desc->dma_status);
+ le16_to_cpu(tx_desc->gbec_status), tx_desc->dma_status);
unused = PCH_GBE_DESC_UNUSED(tx_ring);
thresh = tx_ring->count - PCH_GBE_TX_WEIGHT;
- if ((tx_desc->gbec_status == DSC_INIT16) && (unused < thresh))
+ if ((le16_to_cpu(tx_desc->gbec_status) == DSC_INIT16) &&
+ (unused < thresh))
{ /* current marked clean, tx queue filling up, do extra clean */
int j, k;
if (unused < 8) { /* tx queue nearly full */
@@ -1583,47 +1584,49 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
for (j = 0; j < PCH_GBE_TX_WEIGHT; j++)
{
tx_desc = PCH_GBE_TX_DESC(*tx_ring, k);
- if (tx_desc->gbec_status != DSC_INIT16) break; /*found*/
+ if (le16_to_cpu(tx_desc->gbec_status) != DSC_INIT16)
+ break; /*found*/
if (++k >= tx_ring->count) k = 0; /*increment, wrap*/
}
if (j < PCH_GBE_TX_WEIGHT) {
netdev_dbg(adapter->netdev,
"clean_tx: unused=%d loops=%d found tx_desc[%x,%x:%x].gbec_status=%04x\n",
unused, j, i, k, tx_ring->next_to_use,
- tx_desc->gbec_status);
+ le16_to_cpu(tx_desc->gbec_status));
i = k; /*found one to clean, usu gbec_status==2000.*/
}
}
- while ((tx_desc->gbec_status & DSC_INIT16) == 0x0000) {
+ while ((le16_to_cpu(tx_desc->gbec_status) & DSC_INIT16) == 0x0000) {
netdev_dbg(adapter->netdev, "gbec_status:0x%04x\n",
- tx_desc->gbec_status);
+ le16_to_cpu(tx_desc->gbec_status));
buffer_info = &tx_ring->buffer_info[i];
skb = buffer_info->skb;
cleaned = true;
- if ((tx_desc->gbec_status & PCH_GBE_TXD_GMAC_STAT_ABT)) {
+ if ((le16_to_cpu(tx_desc->gbec_status) &
+ PCH_GBE_TXD_GMAC_STAT_ABT)) {
adapter->stats.tx_aborted_errors++;
netdev_err(adapter->netdev, "Transfer Abort Error\n");
- } else if ((tx_desc->gbec_status & PCH_GBE_TXD_GMAC_STAT_CRSER)
- ) {
+ } else if ((le16_to_cpu(tx_desc->gbec_status) &
+ PCH_GBE_TXD_GMAC_STAT_CRSER)) {
adapter->stats.tx_carrier_errors++;
netdev_err(adapter->netdev,
"Transfer Carrier Sense Error\n");
- } else if ((tx_desc->gbec_status & PCH_GBE_TXD_GMAC_STAT_EXCOL)
- ) {
+ } else if ((le16_to_cpu(tx_desc->gbec_status) &
+ PCH_GBE_TXD_GMAC_STAT_EXCOL)) {
adapter->stats.tx_aborted_errors++;
netdev_err(adapter->netdev,
"Transfer Collision Abort Error\n");
- } else if ((tx_desc->gbec_status &
+ } else if ((le16_to_cpu(tx_desc->gbec_status) &
(PCH_GBE_TXD_GMAC_STAT_SNGCOL |
PCH_GBE_TXD_GMAC_STAT_MLTCOL))) {
adapter->stats.collisions++;
adapter->stats.tx_packets++;
adapter->stats.tx_bytes += skb->len;
netdev_dbg(adapter->netdev, "Transfer Collision\n");
- } else if ((tx_desc->gbec_status & PCH_GBE_TXD_GMAC_STAT_CMPLT)
- ) {
+ } else if ((le16_to_cpu(tx_desc->gbec_status) &
+ PCH_GBE_TXD_GMAC_STAT_CMPLT)) {
adapter->stats.tx_packets++;
adapter->stats.tx_bytes += skb->len;
}
@@ -1639,7 +1642,7 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
"trim buffer_info->skb : %d\n", i);
skb_trim(buffer_info->skb, 0);
}
- tx_desc->gbec_status = DSC_INIT16;
+ tx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
if (unlikely(++i == tx_ring->count))
i = 0;
tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
@@ -1705,15 +1708,15 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
while (*work_done < work_to_do) {
/* Check Rx descriptor status */
rx_desc = PCH_GBE_RX_DESC(*rx_ring, i);
- if (rx_desc->gbec_status == DSC_INIT16)
+ if (le16_to_cpu(rx_desc->gbec_status) == DSC_INIT16)
break;
cleaned = true;
cleaned_count++;
dma_status = rx_desc->dma_status;
- gbec_status = rx_desc->gbec_status;
- tcp_ip_status = rx_desc->tcp_ip_status;
- rx_desc->gbec_status = DSC_INIT16;
+ gbec_status = le16_to_cpu(rx_desc->gbec_status);
+ tcp_ip_status = le32_to_cpu(rx_desc->tcp_ip_status);
+ rx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
buffer_info = &rx_ring->buffer_info[i];
skb = buffer_info->skb;
buffer_info->skb = NULL;
@@ -1742,8 +1745,9 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
} else {
/* get receive length */
/* length convert[-3], length includes FCS length */
- length = (rx_desc->rx_words_eob) - 3 - ETH_FCS_LEN;
- if (rx_desc->rx_words_eob & 0x02)
+ length = le16_to_cpu(rx_desc->rx_words_eob) - 3 -
+ ETH_FCS_LEN;
+ if (le16_to_cpu(rx_desc->rx_words_eob) & 0x02)
length = length - 4;
/*
* buffer_info->rx_buffer: [Header:14][payload]
@@ -1823,7 +1827,7 @@ int pch_gbe_setup_tx_resources(struct pch_gbe_adapter *adapter,
for (desNo = 0; desNo < tx_ring->count; desNo++) {
tx_desc = PCH_GBE_TX_DESC(*tx_ring, desNo);
- tx_desc->gbec_status = DSC_INIT16;
+ tx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
}
netdev_dbg(adapter->netdev,
"tx_ring->desc = 0x%p tx_ring->dma = 0x%08llx next_to_clean = 0x%08x next_to_use = 0x%08x\n",
@@ -1864,7 +1868,7 @@ int pch_gbe_setup_rx_resources(struct pch_gbe_adapter *adapter,
rx_ring->next_to_use = 0;
for (desNo = 0; desNo < rx_ring->count; desNo++) {
rx_desc = PCH_GBE_RX_DESC(*rx_ring, desNo);
- rx_desc->gbec_status = DSC_INIT16;
+ rx_desc->gbec_status = cpu_to_le16(DSC_INIT16);
}
netdev_dbg(adapter->netdev,
"rx_ring->desc = 0x%p rx_ring->dma = 0x%08llx next_to_clean = 0x%08x next_to_use = 0x%08x\n",
--
1.9.1
* Re: [PATCH] net: return value of skb_linearize should be handled in Linux kernel
From: Cong Wang @ 2016-12-07 17:57 UTC
To: Zhouyi Zhou
Cc: faisal.latif, dledford, sean.hefty, Hal Rosenstock, Jeff Kirsher,
QLogic-Storage-Upstream, jejb, Martin K. Petersen,
Johannes Thumshirn, jon.maloy, ying.xue, David Miller, linux-rdma,
LKML, intel-wired-lan, Linux Kernel Network Developers,
linux-scsi, fcoe-devel, tipc-discussion
In-Reply-To: <CAABZP2w_YsXgzZ0tH=r3mnLUbVfpY8Xn3vk9XoS_Hq1r8aeNUQ@mail.gmail.com>
On Tue, Dec 6, 2016 at 10:27 PM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 1:02 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Dec 5, 2016 at 11:10 PM, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
>>> index 2a653ec..ab787cb 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
>>> @@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter,
>>> */
>>> if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) &&
>>> (fctl & FC_FC_END_SEQ)) {
>>> - skb_linearize(skb);
>>> + int err = 0;
>>> +
>>> + err = skb_linearize(skb);
>>> + if (err)
>>> + return err;
>>
>>
>> You can reuse 'rc' instead of adding 'err'.
> rc here is meaningful for the length of data being ddped. If using rc
> here, a successful
> skb_linearize will assign rc to 0.
Right, I thought it returns 0 on success.
>>
>>
>>
>>> crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc));
>>> crc->fcoe_eof = FC_EOF_T;
>>> }
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> index fee1f29..4926d48 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> @@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>> total_rx_bytes += ddp_bytes;
>>> total_rx_packets += DIV_ROUND_UP(ddp_bytes,
>>> mss);
>>> - }
>>> - if (!ddp_bytes) {
>>> + } else {
>>> dev_kfree_skb_any(skb);
>>> continue;
>>> }
>>
>>
>> This piece doesn't seem to be related.
> if ddp_bytes is negative there will be some error, I think the skb
> should not pass to upper layer.
You misunderstand my point, this return value is for ixgbe_fcoe_ddp()
not skb_linearize(), you need to make it a separate patch because this
patch, as in $subject, only fixes skb_linearize().
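Below is a minimal user-space sketch of the error-handling shape discussed above. It assumes a hypothetical demo_linearize() that, like skb_linearize(), returns 0 on success and a negative errno on failure. The byte count stays in rc and the linearize result goes into a separate err, which is the reason given in the thread for not reusing rc. The buffer struct and the simulate_oom test hook are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a non-linear buffer with two fragments. */
struct demo_buf {
	const char *frag[2];
	size_t frag_len[2];
	char *linear;     /* contiguous copy once linearized */
	int simulate_oom; /* test hook: make the allocation "fail" */
};

/* Stand-in for skb_linearize(): 0 on success, -ENOMEM on failure. */
static int demo_linearize(struct demo_buf *b)
{
	size_t total = b->frag_len[0] + b->frag_len[1];
	char *p;

	assert(b->linear == NULL); /* must not be linearized twice */
	p = b->simulate_oom ? NULL : malloc(total);
	if (p == NULL)
		return -ENOMEM;
	memcpy(p, b->frag[0], b->frag_len[0]);
	memcpy(p + b->frag_len[0], b->frag[1], b->frag_len[1]);
	b->linear = p;
	return 0;
}

/* rc carries the byte count, so the linearize result goes into a
 * separate err and is propagated instead of being ignored. */
static int demo_finish_ddp(struct demo_buf *b, int rc)
{
	int err = demo_linearize(b);

	if (err)
		return err;
	return rc;
}
```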
* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Eric Dumazet @ 2016-12-07 17:55 UTC
To: Hannes Frederic Sowa
Cc: David Laight, Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481132267.1541189.811630457.167E9C56@webmail.messagingengine.com>
On Wed, 2016-12-07 at 18:37 +0100, Hannes Frederic Sowa wrote:
> I had the same idea while discussing that with Paolo, merely using an
> *atomic_t = kmalloc(sizeof(atomic_t)) out of band of the socket.
>
> My fear was that those could be aggregated by the slab cache into one
> cache line, causing even more heating on cachelines.
My exact idea was to allow up to 4095 (or PAGE_SIZE - 1) increments
on the counter before switching to dynamically allocated memory.
(Some packets might be dropped by TCP sockets; this is not necessarily
a sign of an attack, just some spurious retransmits.)
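Here is one possible reading of the 4095 (PAGE_SIZE - 1) idea, sketched in user space. While the drop count is small, the pointer-sized field stores it directly, since no valid heap pointer should fall in the first page. Memory is only allocated once a socket actually sees more drops than that. This interpretation, the demo_* names and the single-threaded code are assumptions. Kernel code would need atomic updates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts up to this value live directly in the pointer-sized field. */
#define DEMO_INLINE_MAX 4095u

struct demo_sock {
	uintptr_t drops; /* small count, or pointer to a heap counter */
};

static int demo_drops_inline(const struct demo_sock *sk)
{
	return sk->drops <= DEMO_INLINE_MAX;
}

/* Single-threaded sketch; returns 0 on success, -1 if allocation failed
 * (in which case the count saturates at DEMO_INLINE_MAX). */
static int demo_drops_inc(struct demo_sock *sk)
{
	if (demo_drops_inline(sk)) {
		unsigned long *ctr;

		if (sk->drops < DEMO_INLINE_MAX) {
			sk->drops++;
			return 0;
		}
		ctr = malloc(sizeof(*ctr));
		if (ctr == NULL)
			return -1;
		assert((uintptr_t)ctr > DEMO_INLINE_MAX); /* page 0 is unmapped */
		*ctr = DEMO_INLINE_MAX + 1;
		sk->drops = (uintptr_t)ctr;
		return 0;
	}
	(*(unsigned long *)sk->drops)++;
	return 0;
}

static unsigned long demo_drops_read(const struct demo_sock *sk)
{
	return demo_drops_inline(sk) ? sk->drops : *(unsigned long *)sk->drops;
}

static void demo_drops_free(struct demo_sock *sk)
{
	if (!demo_drops_inline(sk))
		free((void *)sk->drops);
	sk->drops = 0;
}
```

Sockets with only a few spurious drops, such as the TCP retransmit case mentioned above, never allocate anything.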
* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Eric Dumazet @ 2016-12-07 17:52 UTC
To: Hannes Frederic Sowa
Cc: David Laight, Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481132267.1541189.811630457.167E9C56@webmail.messagingengine.com>
On Wed, 2016-12-07 at 18:37 +0100, Hannes Frederic Sowa wrote:
> I had the same idea while discussing that with Paolo, merely using an
> *atomic_t = kmalloc(sizeof(atomic_t)) out of band of the socket.
>
> My fear was that those could be aggregated by the slab cache into one
> cache line, causing even more heating on cachelines.
For hot stuff, better to use
kmalloc(max_t(size_t, L1_CACHE_BYTES, sizeof(...)))
to avoid false sharing, unless this is per-cpu data of course.
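The sketch below is a user-space analogue of that advice, assuming a 64-byte cache line. It rounds the allocation up to whole cache lines and aligns its start, so no other hot object can share the line. The kernel would instead use L1_CACHE_BYTES and rely on kmalloc() size classes rather than aligned_alloc(). demo_alloc_hot() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size; must be a power of two for aligned_alloc(). */
#define DEMO_CACHE_LINE 64u
static_assert((DEMO_CACHE_LINE & (DEMO_CACHE_LINE - 1)) == 0,
	      "cache line size must be a power of two");

/* Allocate obj_size bytes that occupy cache line(s) of their own. */
static void *demo_alloc_hot(size_t obj_size)
{
	size_t size = obj_size > DEMO_CACHE_LINE ? obj_size : DEMO_CACHE_LINE;
	void *p;

	/* C11 aligned_alloc() wants size to be a multiple of the alignment. */
	size = (size + DEMO_CACHE_LINE - 1) / DEMO_CACHE_LINE * DEMO_CACHE_LINE;
	p = aligned_alloc(DEMO_CACHE_LINE, size);
	assert(p == NULL || (uintptr_t)p % DEMO_CACHE_LINE == 0);
	return p;
}
```

A 4-byte counter allocated this way costs a full line, which is the trade-off that makes this worthwhile only for hot data.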
* Re: [PATCH net-next] of: add optional naming of interfaces
From: Florian Fainelli @ 2016-12-07 17:49 UTC
To: Volodymyr Bendiuga, robh+dt, mark.rutland, frowand.list,
netdev, devicetree, volodymyr.bendiuga
Cc: Jonas Johansson, Mattias Walström
In-Reply-To: <1481116349-20678-1-git-send-email-volodymyr.bendiuga-qeDNsGSBLoYwFerOooGFRg@public.gmane.org>
On 12/07/2016 05:12 AM, Volodymyr Bendiuga wrote:
> From: Jonas Johansson <jonas.johansson-qeDNsGSBLoYwFerOooGFRg@public.gmane.org>
>
> Signed-off-by: Mattias Walström <lazzer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Jonas Johansson <jonas.johansson-qeDNsGSBLoYwFerOooGFRg@public.gmane.org>
This does not belong in the Device Tree; there should be plenty of
information in user-space to make an educated device rename. I
definitely understand that some drivers (e.g. DSA) do actually get
their interface name from the Device Tree directly (label property), but
this is probably the one and only case where this may be tolerated.
Besides, if you submit such a change, you would want to also provide a
consumer of that API to illustrate how this is used.
--
Florian
* Re: commit : ppp: add rtnetlink device creation support - breaks netcf on my machine.
From: Thomas Haller @ 2016-12-07 17:43 UTC
To: Dan Williams, Guillaume Nault, Brad Campbell
Cc: netdev, Thomas Graf, David Miller
In-Reply-To: <1481065966.11028.3.camel@redhat.com>
On Tue, 2016-12-06 at 17:12 -0600, Dan Williams wrote:
>
> > libnl1 rejects the IFLA_INFO_DATA attribute because it expects it to
> > contain a sub-attribute. Since the payload size is zero it doesn't
> > match the policy and parsing fails.
> >
> > There's no problem with libnl3 because its policy accepts empty
> > payloads for NLA_NESTED attributes (see libnl3 commit 4be02ace4826
Hi,
libnl1 is unmaintained these days. I don't think it makes sense to
backport that patch. The last upstream release was 3+ years ago, with
no upstream development since then.
IMHO netcf should drop libnl-1 support.
best,
Thomas
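The parsing difference described in the quoted text can be illustrated with a tiny model of the length check. The model assumes a strict check (as described for libnl1) that demands room for at least one sub-attribute header in a nested payload. It assumes a lenient check (as described for libnl3) that accepts an empty payload. The demo_* names are invented, and this is not libnl's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Netlink attribute header; nla_len includes the 4-byte header. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};
static_assert(sizeof(struct demo_nlattr) == 4, "netlink attr header is 4 bytes");

#define DEMO_NLA_HDRLEN ((int)sizeof(struct demo_nlattr))
#define DEMO_IFLA_INFO_DATA 2 /* value of IFLA_INFO_DATA in <linux/if_link.h> */

/* Strict: a nested payload must hold at least one attribute header. */
static int demo_nested_ok_strict(const struct demo_nlattr *a)
{
	return a->nla_len >= 2 * DEMO_NLA_HDRLEN;
}

/* Lenient: a nested attribute with an empty payload is accepted. */
static int demo_nested_ok_lenient(const struct demo_nlattr *a)
{
	return a->nla_len >= DEMO_NLA_HDRLEN;
}
```

An empty IFLA_INFO_DATA, as emitted for ppp devices, passes only the lenient check.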
* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Hannes Frederic Sowa @ 2016-12-07 17:37 UTC
To: Eric Dumazet, David Laight
Cc: Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481131924.4930.40.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, Dec 7, 2016, at 18:32, Eric Dumazet wrote:
> On Wed, 2016-12-07 at 17:09 +0000, David Laight wrote:
> > From: Paolo Abeni
> > > Sent: 06 December 2016 17:08
> > ...
> > > @@ -79,6 +82,9 @@ struct udp_sock {
> > > int (*gro_complete)(struct sock *sk,
> > > struct sk_buff *skb,
> > > int nhoff);
> > > +
> > > + /* since we are prone to drops, avoid dirtying any sk cacheline */
> > > + atomic_t drops ____cacheline_aligned_in_smp;
> > > };
> >
> > Isn't that likely to create a large hole on systems with large cache lines.
> > (Same as any other use of ____cacheline_aligned_in_smp.)
>
> Yes, I would like to avoid that, unless we come to the conclusion it is
> absolutely needed.
>
> I feel that we could simply use a pointer, and allocate memory on
> demand, since many sockets do not ever experience a drop.
>
> The pointer could stay in a read mostly section.
>
> We even could use per cpu or node counter for some heavy drop cases.
I had the same idea while discussing that with Paolo, merely using an
*atomic_t = kmalloc(sizeof(atomic_t)) out of band of the socket.
My fear was that those could be aggregated by the slab cache into one
cache line, causing even more heating on cachelines.
Bye,
Hannes
* Re: 4.9.0-rc8: tg3 dead after resume
From: Michael Chan @ 2016-12-07 17:37 UTC
To: Billy Shuman; +Cc: Netdev, Siva Reddy Kallam
In-Reply-To: <CAHQNsodiku6Ln3y-=GzmmNLM0Emc2rEheFmc80OCuN91roojqA@mail.gmail.com>
On Wed, Dec 7, 2016 at 7:20 AM, Billy Shuman <wshuman3@gmail.com> wrote:
> After resume on 4.9.0-rc8 tg3 is dead.
>
> In logs I see:
> kernel: tg3 0000:44:00.0: phy probe failed, err -19
> kernel: tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
-19 is -ENODEV which means tg3 cannot read the PHY ID.
If it's a true suspend/resume operation, the driver does not have to
go through probe during resume. Please explain how you do
suspend/resume.
Did this work before? There have been very few changes to tg3 recently.
>
> rmmod and modprobe does not fix the problem only a reboot resolves the issue.
>
> Billy
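Kernel log messages print negative errno values, so "err -19" is decoded by negating the number and looking it up. The small sketch below shows this. demo_kernel_err_str() is a hypothetical helper; on Linux, ENODEV is 19.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Turn a negative kernel error code (as printed in dmesg) into text. */
static const char *demo_kernel_err_str(int kernel_err)
{
	assert(kernel_err < 0);
	return strerror(-kernel_err);
}
```

For the log line above, demo_kernel_err_str(-19) yields the message for ENODEV ("No such device" with glibc).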
* Re: [net-next][PATCH v2 18/18] RDS: IB: add missing connection cache usage info
From: David Miller @ 2016-12-07 17:36 UTC
To: santosh.shilimkar; +Cc: netdev, linux-kernel
In-Reply-To: <6159757e-c799-ee7f-0ee0-c9b3534a4237@oracle.com>
From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Date: Wed, 7 Dec 2016 09:20:17 -0800
> Newer/updated tools that can parse this extra info need a newer
> or updated kernel that supports and populates these fields.
>
> As mentioned, this particular option is used only in verbose mode, so
> I am OK to drop this change if it's still a concern.
What does the newer tool do on an older kernel if it doesn't see
the fields? Does it check the size of the structure given back
to it, and conditionally handle the older vs. the newer layout?
It must do this.
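The sketch below shows the size-based handling David asks for. It uses hypothetical v1/v2 record layouts in which the newer kernel only appends fields. The tool parses the length the kernel actually returned and leaves any fields it did not receive zeroed. The struct and field names are assumptions, not the real RDS info structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: v2 only appends a field to v1. */
struct demo_conn_info_v1 {
	uint32_t src_addr;
	uint32_t dst_addr;
};

struct demo_conn_info_v2 {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint32_t cache_allocs; /* only filled in by newer kernels */
};

static_assert(offsetof(struct demo_conn_info_v2, dst_addr) ==
	      offsetof(struct demo_conn_info_v1, dst_addr),
	      "v2 must extend v1 without moving existing fields");

/* Parse one record of len bytes. Returns the layout version seen,
 * or -1 if the record is too short for any known layout. */
static int demo_parse_conn_info(const void *buf, size_t len,
				struct demo_conn_info_v2 *out)
{
	if (len < sizeof(struct demo_conn_info_v1))
		return -1;
	memset(out, 0, sizeof(*out));
	memcpy(out, buf, len < sizeof(*out) ? len : sizeof(*out));
	return len >= sizeof(struct demo_conn_info_v2) ? 2 : 1;
}
```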
* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Eric Dumazet @ 2016-12-07 17:32 UTC
To: David Laight
Cc: 'Paolo Abeni', David Miller, netdev, Willem de Bruijn
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0237532@AcuExch.aculab.com>
On Wed, 2016-12-07 at 17:09 +0000, David Laight wrote:
> From: Paolo Abeni
> > Sent: 06 December 2016 17:08
> ...
> > @@ -79,6 +82,9 @@ struct udp_sock {
> > int (*gro_complete)(struct sock *sk,
> > struct sk_buff *skb,
> > int nhoff);
> > +
> > + /* since we are prone to drops, avoid dirtying any sk cacheline */
> > + atomic_t drops ____cacheline_aligned_in_smp;
> > };
>
> Isn't that likely to create a large hole on systems with large cache lines.
> (Same as any other use of ____cacheline_aligned_in_smp.)
Yes, I would like to avoid that, unless we come to the conclusion it is
absolutely needed.
I feel that we could simply use a pointer, and allocate memory on
demand, since many sockets do not ever experience a drop.
The pointer could stay in a read mostly section.
We even could use per cpu or node counter for some heavy drop cases.
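To see the hole David Laight refers to, compare a struct with and without a cache-line-aligned member. This sketch assumes a 64-byte line and uses C11 alignas as a stand-in for ____cacheline_aligned_in_smp. The real macro uses the architecture's line size, which is larger on some systems and makes the hole bigger. The field layout only loosely mimics struct udp_sock.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHE_LINE 64 /* assumed line size */

struct demo_plain {
	void *callbacks[3]; /* stand-ins for the gro_* function pointers */
	int drops;
};

struct demo_aligned {
	void *callbacks[3];
	alignas(DEMO_CACHE_LINE) int drops; /* forced onto its own line */
};

/* The alignment pushes drops to the next line boundary (leaving a hole
 * after callbacks) and pads the struct to a whole number of lines. */
static_assert(offsetof(struct demo_aligned, drops) % DEMO_CACHE_LINE == 0,
	      "drops starts a cache line");
static_assert(sizeof(struct demo_aligned) % DEMO_CACHE_LINE == 0,
	      "struct size is a multiple of the line size");
```

Here a struct that fits comfortably in one line grows to two, most of it padding.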
* Re: [PATCH 1/1] ixgbe: fcoe: return value of skb_linearize should be handled
From: Jeff Kirsher @ 2016-12-07 17:30 UTC
To: Zhouyi Zhou, intel-wired-lan, netdev, linux-kernel; +Cc: Zhouyi Zhou
In-Reply-To: <1481096614-25295-1-git-send-email-zhouzhouyi@gmail.com>
On Wed, 2016-12-07 at 15:43 +0800, Zhouyi Zhou wrote:
> Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 6 +++++-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 +--
> 2 files changed, 6 insertions(+), 3 deletions(-)
Did Cong, Yuval and Eric give their Reviewed-by offline? I see they made
comments and suggestions, but I never saw them actually give you their
Reviewed-by. You cannot automatically add their Reviewed-by, Signed-off-by,
etc. just because someone provides feedback on your patch.
* nfct_query hangs after multiple requests
From: Kirila Adamova @ 2016-12-07 17:27 UTC
To: netdev@vger.kernel.org
Hi
I am using nfct_query (libnetfilter_conntrack library) to get a connection from the conntrack table and then to update its connmark. This was working ok in a development environment, but when testing it in production with a lot of traffic, after around a minute, the daemon hangs on nfct_query and does not process any more data.
Some background:
- I am sending packets via NFLOG to the daemon (and setting a connmark 0x2/0x2)
- the daemon polls the NFLOG group and handles the packets via nflog_handle_packet
- the callback registered with the nflog handle extracts the conntrack information from the packet header (L4 proto, src/dst ip, src/dst port)
- an nf_conntrack pointer is created with this information
- (calling another library which calls another callback)
- if certain conditions are met
-- register nfct callback -- nfct_callback_register(h, NFCT_T_ALL, my_nfct_callback, h)
-- nfct_query with NFCT_Q_GET to get the conntrack connection based on the ct data
-- (in the nfct callback) check the connmark of the connection and run nfct query with NFCT_Q_UPDATE to update the connmark of that same connection
The nfct_handle is opened at the start of the daemon and closed via signal handling at termination.
After placing some debug prints in the code, I discovered that at some point nfct_query for NFCT_Q_GET is called, but it never enters the callback function.
Debugging with strace showed the following:
...
recvfrom(4,"$\0\0\0\2\0\0\0h\4IX\22(\0\0\0\0\0\0\304\0\0\0\0\1\5\0h\4IX"..., 8192, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
sendto(4,"", 0, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
recvfrom(4,
--- and is hanging here.
I am writing to this mailing list hoping that somebody has an idea of how to proceed with debugging and what the issue might be. Obviously, it is related to the number of connections, but there must be a way to handle them for longer than a minute. Once it hangs, it never resumes.
Please let me know if you need any further information or part of the code.
Versions used:
libnetfilter_conntrack - 1.0.4
libnetfilter_log - 1.0.1
Best regards
Kirila
* Re: [PATCH v3 net-next 1/4] bpf: xdp: Allow head adjustment in XDP prog
From: Martin KaFai Lau @ 2016-12-07 17:26 UTC
To: Alexei Starovoitov
Cc: Jakub Kicinski, Daniel Borkmann, netdev, Alexei Starovoitov,
Brenden Blanco, David Miller, Jesper Dangaard Brouer,
John Fastabend, Saeed Mahameed, Tariq Toukan, Kernel Team
In-Reply-To: <20161207163756.GA33446@ast-mbp.thefacebook.com>
On Wed, Dec 07, 2016 at 08:37:58AM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2016 at 11:41:12AM +0000, Jakub Kicinski wrote:
> > > I see nothing wrong if this is exposed/made visible in the usual way through
> > > ethtool -k as well. I guess at least that would be the expected way to query
> > > for such driver capabilities.
> >
> > +1 on exposing this to user space. Whether via ethtool -k or a
> > separate XDP-specific netlink message is mostly a question of whether
> > we expect the need to expose more complex capabilities than bits.
>
> I'm very much against using NETIF_F_ flags and exposing this to user space.
> I see this xdp feature flag as temporary workaround until all drivers
> support adjust_head() helper. It is very much a fundamental feature for xdp.
> Without being able to add/remove headers the usability of xdp becomes very limited.
>
> If you guys don't like an extra ndo_xdp command, I'd suggest to do
> "if (prog->xdp_adjust_head)" check in the driver and if driver doesn't
> support it yet, just fail XDP_SETUP_PROG command.
> imo that will be more flexible interface, since in the future drivers
> can fail on different combination of features and simple boolean flag
> unlikely to serve us for long time.
It makes sense that adjust_head() will eventually be supported by
all xdp-capable drivers. If that is the case, let's check
prog->xdp_adjust_head inside the driver instead of adding
another ndo_xdp command which will soon become useless.
* Re: [PATCH v2 iproute2/net-next 0/3] tc: flower: Support matching on ICMP
From: Stephen Hemminger @ 2016-12-07 17:21 UTC
To: Simon Horman; +Cc: netdev, Jiri Pirko
In-Reply-To: <1481118843-10428-1-git-send-email-simon.horman@netronome.com>
On Wed, 7 Dec 2016 14:54:00 +0100
Simon Horman <simon.horman@netronome.com> wrote:
> Add support for matching on ICMP type and code to flower. This is modeled
> on existing support for matching on L4 ports.
>
> The second patch provides a minor cleanup which is in keeping with
> the style used in the last patch.
>
> This is marked as an RFC to match the same designation given to the
> corresponding kernel patches.
>
>
> Changes since v1:
> * Rebase
> * Do not run ntohs() on u8 entity
>
> Simon Horman (3):
> tc: flower: update headers for TCA_FLOWER_KEY_ICMP*
> tc: flower: introduce enum flower_endpoint
> tc: flower: support matching on ICMP type and code
>
> include/linux/pkt_cls.h | 10 ++++
> man/man8/tc-flower.8 | 20 ++++++--
> tc/f_flower.c | 123 +++++++++++++++++++++++++++++++++++++++++++-----
> 3 files changed, 135 insertions(+), 18 deletions(-)
>
I am holding off applying these to net-next until David applies the kernel
portion.
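The "do not run ntohs() on u8 entity" change reflects a general rule. 16-bit fields such as L4 ports are stored in network byte order and need conversion. ICMP type and code are single bytes and are compared as-is. On a little-endian host, ntohs() on a one-byte value would move it into the high byte. The minimal masked-match sketch below uses a hypothetical flow key, not the flower or iproute2 data structures.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key. */
struct demo_flow_key {
	uint16_t dst_port; /* network byte order */
	uint8_t icmp_type; /* single byte: no byte order */
	uint8_t icmp_code; /* single byte: no byte order */
};
static_assert(sizeof(struct demo_flow_key) == 4, "compact key layout");

static int demo_match_u8(uint8_t val, uint8_t key, uint8_t mask)
{
	return (val & mask) == (key & mask);
}

/* Ports: convert the user's host-order value once with htons(). */
static int demo_match_port(const struct demo_flow_key *k, uint16_t port)
{
	return k->dst_port == htons(port);
}

/* ICMP: compare type and code directly under their masks. */
static int demo_match_icmp(const struct demo_flow_key *k,
			   uint8_t type, uint8_t type_mask,
			   uint8_t code, uint8_t code_mask)
{
	return demo_match_u8(k->icmp_type, type, type_mask) &&
	       demo_match_u8(k->icmp_code, code, code_mask);
}
```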
* [PATCH net-next] udp: under rx pressure, try to condense skbs
From: Eric Dumazet @ 2016-12-07 17:19 UTC
To: David Miller; +Cc: netdev, Paolo Abeni
From: Eric Dumazet <edumazet@google.com>
Under UDP flood, many softirq producers try to add packets to
UDP receive queue, and one user thread is burning one cpu trying
to dequeue packets as fast as possible.
Two parts of the per-packet cost are:
- copying the payload from kernel space to user space,
- freeing the memory pieces associated with the skb.
If the socket is under pressure, the softirq handler(s) can try to pull
the packet payload into skb->head if it fits.
This means the softirq handler(s) can free/reuse the page fragment
immediately, instead of letting udp_recvmsg() do this hundreds of usec
later, possibly from another node.
Additional gains:
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF.
- We avoid cache line misses at copyout() time and consume_skb() time,
and avoid one put_page() with potential alien freeing on NUMA hosts.
This comes at the cost of a copy, bounded by the available tail room,
which is usually small. (We might have to fix GRO_MAX_HEAD, which looks
bigger than necessary.)
This patch gave me about a 5% increase in throughput in my tests.
The skb_condense() helper could probably be used in other contexts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
---
include/linux/skbuff.h | 2 ++
net/core/skbuff.c | 28 ++++++++++++++++++++++++++++
net/ipv4/udp.c | 12 +++++++++++-
3 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c535fbccf2c7dbfae04cee393460e86d588c26b..0cd92b0f2af5fe5a7c153435b8dc758338180ae3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1966,6 +1966,8 @@ static inline int pskb_may_pull(struct sk_buff *skb, unsigned int len)
return __pskb_pull_tail(skb, len - skb_headlen(skb)) != NULL;
}
+void skb_condense(struct sk_buff *skb);
+
/**
* skb_headroom - bytes at buffer head
* @skb: buffer to check
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b45cd1494243fc99686016949f4546dbba11f424..84151cf40aebb973bad5bee3ee4be0758084d83c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4931,3 +4931,31 @@ struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
return clone;
}
EXPORT_SYMBOL(pskb_extract);
+
+/**
+ * skb_condense - try to get rid of fragments/frag_list if possible
+ * @skb: buffer
+ *
+ * Can be used to save memory before skb is added to a busy queue.
+ * If packet has bytes in frags and enough tail room in skb->head,
+ * pull all of them, so that we can free the frags right now and adjust
+ * truesize.
+ * Notes:
+ * We do not reallocate skb->head thus can not fail.
+ * Caller must re-evaluate skb->truesize if needed.
+ */
+void skb_condense(struct sk_buff *skb)
+{
+ if (!skb->data_len ||
+ skb->data_len > skb->end - skb->tail ||
+ skb_cloned(skb))
+ return;
+
+ /* Nice, we can free page frag(s) right now */
+ __pskb_pull_tail(skb, skb->data_len);
+
+ /* Now adjust skb->truesize, since __pskb_pull_tail() does
+ * not do this.
+ */
+ skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
+}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 16d88ba9ff1c402f77063cfb5eea2708d86da2fc..f5628ada47b53f0d92d08210e5d7e4132a107f73 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1199,7 +1199,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
{
struct sk_buff_head *list = &sk->sk_receive_queue;
int rmem, delta, amt, err = -ENOMEM;
- int size = skb->truesize;
+ int size;
/* try to avoid the costly atomic add/sub pair when the receive
* queue is full; always allow at least a packet
@@ -1208,6 +1208,16 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
if (rmem > sk->sk_rcvbuf)
goto drop;
+ /* Under mem pressure, it might be helpful to help udp_recvmsg()
+ * having linear skbs :
+ * - Reduce memory overhead and thus increase receive queue capacity
+ * - Less cache line misses at copyout() time
+ * - Less work at consume_skb() (less alien page frag freeing)
+ */
+ if (rmem > (sk->sk_rcvbuf >> 1))
+ skb_condense(skb);
+ size = skb->truesize;
+
/* we drop only if the receive buf is full and the receive
* queue contains some other skb
*/
* Re: [net-next][PATCH v2 18/18] RDS: IB: add missing connection cache usage info
From: Santosh Shilimkar @ 2016-12-07 17:20 UTC
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20161207.120536.1153607792891600896.davem@davemloft.net>
On 12/7/2016 9:05 AM, David Miller wrote:
> From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Date: Wed, 7 Dec 2016 08:44:04 -0800
>
>> On 12/7/2016 7:55 AM, David Miller wrote:
>>> From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>> Date: Tue, 6 Dec 2016 20:01:56 -0800
>>>
>>> What level of compatibility exists here? If we run an old tool on a
>>> new kernel, or a new tool on an old kernel, does it work properly?
>>>
>> The tools repo carries a copy of the header, and that's how the old and
>> new tools have been running with older/newer kernels. There are a few
>> more bits left before I can start using the kernel header directly for
>> newer tools.
>>
>> Moreover, this particular parameter is only used in verbose mode, which
>> isn't enabled by the default options.
>
> That doesn't really answer my question, I think.
>
Sorry for not being clear.
> Are you saying that one is required to run old tools on old kernels,
> and new tools on new kernels, and that's how you have this setup in
> your repo?
>
No.
> If so, that really isn't acceptable. Both old and new tools must work
> with all kernel versions.
>
Older versions of the tools work with either kernel version. Older tools
don't parse this additional info, since their copy of the header does not
carry some of these extra verbose fields. Newer/updated tools, which
can parse this extra info, need a newer or updated kernel that
supports and populates these fields.
As mentioned, this particular option is used only in verbose mode, so
I am OK with dropping this change if it is still a concern.
Regards,
Santosh
* Re: [PATCH] [v3] net: phy: phy drivers should not set SUPPORTED_[Asym_]Pause
From: Timur Tabi @ 2016-12-07 17:19 UTC
To: Niklas Cassel; +Cc: Florian Fainelli, David Miller, jon.mason, netdev
In-Reply-To: <CAD5ja61A3diBgYwscUSoxJqE1ydmzr+cQ9rR+8uZa0mw-0m_3Q@mail.gmail.com>
On 12/07/2016 03:13 AM, Niklas Cassel wrote:
> You might want to include drivers/net/phy/dp83848.c in your patch.
> Support for pause frames in that phy was recently added to netdev-next.
Thanks. I feel bad that I'm reverting your patch just a few days after
it was applied.
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
* Re: [PATCH v3 net-next 1/4] bpf: xdp: Allow head adjustment in XDP prog
From: Daniel Borkmann @ 2016-12-07 17:14 UTC
To: David Miller, alexei.starovoitov
Cc: kubakici, kafai, netdev, ast, bblanco, brouer, john.fastabend,
saeedm, tariqt, kernel-team
In-Reply-To: <20161207.120413.939362482173997833.davem@davemloft.net>
On 12/07/2016 06:04 PM, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 7 Dec 2016 08:37:58 -0800
>
>> On Wed, Dec 07, 2016 at 11:41:12AM +0000, Jakub Kicinski wrote:
>>>> I see nothing wrong if this is exposed/made visible in the usual way through
>>>> ethtool -k as well. I guess at least that would be the expected way to query
>>>> for such driver capabilities.
>>>
>>> +1 on exposing this to user space. Whether via ethtool -k or a
>>> separate XDP-specific netlink message is mostly a question of whether
>>> we expect the need to expose more complex capabilities than bits.
>>
>> I'm very much against using NETIF_F_ flags and exposing this to user space.
>> I see this xdp feature flag as a temporary workaround until all drivers
>> support the adjust_head() helper. It is very much a fundamental feature for xdp.
>> Without being able to add/remove headers, the usability of xdp becomes very limited.
>>
>> If you guys don't like an extra ndo_xdp command, I'd suggest doing the
>> "if (prog->xdp_adjust_head)" check in the driver and, if the driver doesn't
>> support it yet, just failing the XDP_SETUP_PROG command.
>> IMO that will be a more flexible interface, since in the future drivers
>> can fail on different combinations of features, and a simple boolean flag is
>> unlikely to serve us for long.
>
> Indeed, if the eventual plan is to have all drivers be required to
> support a fundamental set of XDP features then exporting this in any
> way to userspace is not a good idea.
Agreed. If that is required anyway, then it is much better and simpler to just
return -ENOTSUPP from the XDP_SETUP_PROG handling of each driver.
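The approach agreed on here has each driver reject the program at XDP_SETUP_PROG time instead of advertising a feature bit to user space. A minimal userspace sketch of that control flow, assuming hypothetical `struct model_prog`, `struct model_driver` and `model_xdp_setup_prog()` (these are not the kernel's `struct bpf_prog` or `ndo_xdp` interface). It returns `-EOPNOTSUPP` because `ENOTSUPP` is a kernel-internal error code that userspace `<errno.h>` does not define.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded XDP program. */
struct model_prog {
	bool xdp_adjust_head;   /* program uses the adjust_head helper */
};

/* Hypothetical stand-in for a driver's XDP state. */
struct model_driver {
	bool supports_adjust_head;
	const struct model_prog *active;
};

/* Per-driver XDP_SETUP_PROG handling as suggested in the thread:
 * refuse a program that needs a feature this driver lacks, and leave
 * the currently attached program untouched on failure. */
static int model_xdp_setup_prog(struct model_driver *drv,
				const struct model_prog *prog)
{
	assert(drv != NULL);

	if (prog && prog->xdp_adjust_head && !drv->supports_adjust_head)
		return -EOPNOTSUPP;

	drv->active = prog;   /* NULL detaches */
	return 0;
}
```

A driver could test several such requirement flags in the same place, which is the flexibility over a single boolean feature flag that the thread points to.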
* RE: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: David Laight @ 2016-12-07 17:09 UTC
To: 'Paolo Abeni', Eric Dumazet
Cc: David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481044098.7129.7.camel@redhat.com>
From: Paolo Abeni
> Sent: 06 December 2016 17:08
...
> @@ -79,6 +82,9 @@ struct udp_sock {
> int (*gro_complete)(struct sock *sk,
> struct sk_buff *skb,
> int nhoff);
> +
> + /* since we are prone to drops, avoid dirtying any sk cacheline */
> + atomic_t drops ____cacheline_aligned_in_smp;
> };
Isn't that likely to create a large hole on systems with large cache lines?
(The same applies to any other use of ____cacheline_aligned_in_smp.)
David
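A small C11 illustration of the concern, assuming a hypothetical 64-byte line (`MODEL_CACHE_BYTES`) and hypothetical structs that stand in for a simplified tail of `struct udp_sock`; `alignas` plays the role of the kernel annotation here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cache line size. In the kernel, ____cacheline_aligned_in_smp
 * aligns to SMP_CACHE_BYTES, which depends on the architecture. */
#define MODEL_CACHE_BYTES 64

/* Simplified tail of a socket structure without the annotation.
 * The two void pointers stand in for the gro_* function pointers. */
struct tail_plain {
	void *gro_receive;
	void *gro_complete;
	atomic_int drops;
};

/* Same fields, with drops forced onto its own cache line. */
struct tail_aligned {
	void *gro_receive;
	void *gro_complete;
	alignas(MODEL_CACHE_BYTES) atomic_int drops;
};

/* drops starts a new line, and the struct size is rounded up to a whole
 * number of lines, so much of both lines ends up as padding. */
static_assert(offsetof(struct tail_aligned, drops) % MODEL_CACHE_BYTES == 0,
	      "drops must start a cache line");
static_assert(sizeof(struct tail_aligned) % MODEL_CACHE_BYTES == 0,
	      "struct size is a multiple of the line size");
```

On a 64-bit build this model grows from 24 bytes to 128, with 48 bytes of padding before drops and 60 after it; with MODEL_CACHE_BYTES set to 128 the two holes become 112 and 124 bytes.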
* Re: pull-request: can 2016-12-07
From: David Miller @ 2016-12-07 17:07 UTC
To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <20161207095040.5003-1-mkl@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 7 Dec 2016 10:50:39 +0100
> Andrey Konovalov triggered a warning in the CAN RAW layer, which is
> fixed by a patch by me.
Pulled, thanks Marc.