* Re: [PATCH net-next] net: phy: use network device in phy_print_status
From: Joe Perches @ 2014-01-22 18:49 UTC (permalink / raw)
To: Florian Fainelli; +Cc: netdev, davem
In-Reply-To: <1390415553-4961-1-git-send-email-f.fainelli@gmail.com>
On Wed, 2014-01-22 at 10:32 -0800, Florian Fainelli wrote:
> phy_print_status() currently uses dev_name(&phydev->dev) which will
> usually result in printing something along those lines for Device Tree
> aware drivers:
>
> libphy: f0b60000.etherne:0a - Link is Down
> libphy: f0ba0000.etherne:00 - Link is Up - 1000/Full
>
> This is not terribly useful for network administrators or users since we
> expect a network interface name to be able to correlate link events with
> interfaces. Update phy_print_status() to use netdev_info() with
> phydev->attached_dev which is the backing network device for our PHY
> device.
[]
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
[]
> @@ -45,12 +45,11 @@
> void phy_print_status(struct phy_device *phydev)
> {
> if (phydev->link) {
> - pr_info("%s - Link is Up - %d/%s\n",
> - dev_name(&phydev->dev),
> + netdev_info(phydev->attached_dev, "- Link is Up - %d/%s\n",
> phydev->speed,
> DUPLEX_FULL == phydev->duplex ? "Full" : "Half");
> } else {
> - pr_info("%s - Link is Down\n", dev_name(&phydev->dev));
> + netdev_info(phydev->attached_dev, "- Link is Down\n");
> }
The leading "- "'s now aren't useful and these could be
if (phydev->link)
netdev_info(phydev->attached_dev, "Link is Up - %d/%s\n",
phydev->speed,
phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
else
netdev_info(phydev->attached_dev, "Link is Down\n");
Is is possible that phydev->attached_dev is NULL?
It may also be possible to slightly improve the output by
reducing the number of 0's in the speed block and emitting
Kb/Mb/Gb (or Kbps/Mbps/Gbps) and maybe flow control if it's
available.
^ permalink raw reply
* Re: [Xen-devel] [PATCH v4] xen/grant-table: Avoid m2p_override during mapping
From: Zoltan Kiss @ 2014-01-22 18:36 UTC (permalink / raw)
To: Stefano Stabellini
Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
jonathan.davies
In-Reply-To: <alpine.DEB.2.02.1401221618340.21510@kaball.uk.xensource.com>
On 22/01/14 16:39, Stefano Stabellini wrote:
> On Tue, 21 Jan 2014, Zoltan Kiss wrote:
>> @@ -121,7 +125,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
>> pfn = m2p_find_override_pfn(mfn, ~0);
>> }
>>
>> - /*
>> + /*
>
> Spurious change?
It removes a stray space from the original code. Not necessary, but if
it's there, I think we can keep it.
>> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>> index 2ae8699..0060178 100644
>> --- a/arch/x86/xen/p2m.c
>> +++ b/arch/x86/xen/p2m.c
>> @@ -872,15 +872,13 @@ static unsigned long mfn_hash(unsigned long mfn)
>>
>> /* Add an MFN override for a particular page */
>> int m2p_add_override(unsigned long mfn, struct page *page,
>> - struct gnttab_map_grant_ref *kmap_op)
>> + struct gnttab_map_grant_ref *kmap_op, unsigned long pfn)
>
> Do we really need to add another additional parameter to
> m2p_add_override?
> I would just let m2p_add_override and m2p_remove_override call
> page_to_pfn again. It is not that expensive.
Yes, because that page_to_pfn can return something different. That's why
the v2 patches failed.
>> @@ -933,20 +924,15 @@ int m2p_add_override(unsigned long mfn, struct page *page,
>> }
>> EXPORT_SYMBOL_GPL(m2p_add_override);
>> int m2p_remove_override(struct page *page,
>> - struct gnttab_map_grant_ref *kmap_op)
>> + struct gnttab_map_grant_ref *kmap_op,
>> + unsigned long pfn,
>> + unsigned long mfn)
>
> Same here
Same as above.
>> @@ -927,8 +931,18 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>> } else {
>> mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
>> }
>> - ret = m2p_add_override(mfn, pages[i], kmap_ops ?
>> - &kmap_ops[i] : NULL);
>> + pfn = page_to_pfn(pages[i]);
>> +
>> + WARN_ON(PagePrivate(pages[i]));
>> + SetPagePrivate(pages[i]);
>> + set_page_private(pages[i], mfn);
>> +
>> + pages[i]->index = pfn_to_mfn(pfn);
>> + if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
>> + return -ENOMEM;
>
> goto out
And ret = -ENOMEM
>
>
>> + if (m2p_override)
>> + ret = m2p_add_override(mfn, pages[i], kmap_ops ?
>> + &kmap_ops[i] : NULL, pfn);
>> if (ret)
>> goto out;
>> }
>> @@ -937,17 +951,34 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>> if (lazy)
>> arch_leave_lazy_mmu_mode();
>>
>> - return ret;
>> + return 0;
>
> We are loosing the error code possibly returned by m2p_add_override and
> the previous check.
I'll fix that. Also in unmap.
return ret;
>> @@ -958,17 +989,32 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
>> set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
>> INVALID_P2M_ENTRY);
>> }
>> - return ret;
>> + return 0;
>> }
>>
>> - if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
>> + if (m2p_override &&
>> + !in_interrupt() &&
>> + paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
>> arch_enter_lazy_mmu_mode();
>> lazy = true;
>> }
>>
>> for (i = 0; i < count; i++) {
>> - ret = m2p_remove_override(pages[i], kmap_ops ?
>> - &kmap_ops[i] : NULL);
>> + pfn = page_to_pfn(pages[i]);
>> + mfn = get_phys_to_machine(pfn);
>> + if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
>> + return -EINVAL;
>
> goto out
And ret = -EINVAL
The above are the the result of the fact that I've based this originally
on 3.10, where the out label haven't existed. I'll send the next version
when the tests pass.
Zoli
^ permalink raw reply
* [net-next PATCH v2 1/1] drivers: net: cpsw: enable promiscuous mode support
From: Mugunthan V N @ 2014-01-22 18:33 UTC (permalink / raw)
To: netdev; +Cc: davem, linux-omap, Mugunthan V N
Enable promiscuous mode support for CPSW.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
drivers/net/ethernet/ti/cpsw.c | 105 ++++++++++++++++++++++++++++---------
drivers/net/ethernet/ti/cpsw_ale.c | 16 ++++++
drivers/net/ethernet/ti/cpsw_ale.h | 2 +
3 files changed, 98 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e8bb77d..bde63e3 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -541,14 +541,93 @@ static inline int cpsw_get_slave_port(struct cpsw_priv *priv, u32 slave_num)
return slave_num;
}
+static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
+{
+ struct cpsw_priv *priv = netdev_priv(ndev);
+ struct cpsw_ale *ale = priv->ale;
+ int i;
+
+ if (priv->data.dual_emac) {
+ bool flag = false;
+
+ /* Enabling promiscuous mode for one interface will be
+ * common for both the interface as the interface shares
+ * the same hardware resource.
+ */
+ for (i = 0; i <= priv->data.slaves; i++)
+ if (priv->slaves[i].ndev->flags & IFF_PROMISC)
+ flag = true;
+
+ if (!enable && flag) {
+ enable = true;
+ dev_err(&ndev->dev, "promiscuity not disabled as the other interface is still in promiscuity mode\n");
+ }
+
+ if (enable) {
+ /* Enable Bypass */
+ cpsw_ale_control_set(ale, 0, ALE_BYPASS, 1);
+
+ dev_dbg(&ndev->dev, "promiscuity enabled\n");
+ } else {
+ /* Disable Bypass */
+ cpsw_ale_control_set(ale, 0, ALE_BYPASS, 0);
+ dev_dbg(&ndev->dev, "promiscuity disabled\n");
+ }
+ } else {
+ if (enable) {
+ unsigned long timeout = jiffies + HZ;
+
+ /* Disable Learn for all ports */
+ for (i = 0; i <= priv->data.slaves; i++) {
+ cpsw_ale_control_set(ale, i,
+ ALE_PORT_NOLEARN, 1);
+ cpsw_ale_control_set(ale, i,
+ ALE_PORT_NO_SA_UPDATE, 1);
+ }
+
+ /* Clear All Untouched entries */
+ cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);
+ do {
+ cpu_relax();
+ if (cpsw_ale_control_get(ale, 0, ALE_AGEOUT))
+ break;
+ } while (time_after(timeout, jiffies));
+ cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);
+
+ /* Clear all mcast from ALE */
+ cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS <<
+ priv->host_port);
+
+ /* Flood All Unicast Packets to Host port */
+ cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
+ dev_dbg(&ndev->dev, "promiscuity enabled\n");
+ } else {
+ /* Flood All Unicast Packets to Host port */
+ cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 0);
+
+ /* Enable Learn for all ports */
+ for (i = 0; i <= priv->data.slaves; i++) {
+ cpsw_ale_control_set(ale, i,
+ ALE_PORT_NOLEARN, 0);
+ cpsw_ale_control_set(ale, i,
+ ALE_PORT_NO_SA_UPDATE, 0);
+ }
+ dev_dbg(&ndev->dev, "promiscuity disabled\n");
+ }
+ }
+}
+
static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
{
struct cpsw_priv *priv = netdev_priv(ndev);
if (ndev->flags & IFF_PROMISC) {
/* Enable promiscuous mode */
- dev_err(priv->dev, "Ignoring Promiscuous mode\n");
+ cpsw_set_promiscious(ndev, true);
return;
+ } else {
+ /* Disable promiscuous mode */
+ cpsw_set_promiscious(ndev, false);
}
/* Clear all mcast from ALE */
@@ -1257,29 +1336,6 @@ fail:
return NETDEV_TX_BUSY;
}
-static void cpsw_ndo_change_rx_flags(struct net_device *ndev, int flags)
-{
- /*
- * The switch cannot operate in promiscuous mode without substantial
- * headache. For promiscuous mode to work, we would need to put the
- * ALE in bypass mode and route all traffic to the host port.
- * Subsequently, the host will need to operate as a "bridge", learn,
- * and flood as needed. For now, we simply complain here and
- * do nothing about it :-)
- */
- if ((flags & IFF_PROMISC) && (ndev->flags & IFF_PROMISC))
- dev_err(&ndev->dev, "promiscuity ignored!\n");
-
- /*
- * The switch cannot filter multicast traffic unless it is configured
- * in "VLAN Aware" mode. Unfortunately, VLAN awareness requires a
- * whole bunch of additional logic that this driver does not implement
- * at present.
- */
- if ((flags & IFF_ALLMULTI) && !(ndev->flags & IFF_ALLMULTI))
- dev_err(&ndev->dev, "multicast traffic cannot be filtered!\n");
-}
-
#ifdef CONFIG_TI_CPTS
static void cpsw_hwtstamp_v1(struct cpsw_priv *priv)
@@ -1575,7 +1631,6 @@ static const struct net_device_ops cpsw_netdev_ops = {
.ndo_open = cpsw_ndo_open,
.ndo_stop = cpsw_ndo_stop,
.ndo_start_xmit = cpsw_ndo_start_xmit,
- .ndo_change_rx_flags = cpsw_ndo_change_rx_flags,
.ndo_set_mac_address = cpsw_ndo_set_mac_address,
.ndo_do_ioctl = cpsw_ndo_ioctl,
.ndo_validate_addr = eth_validate_addr,
diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 63e9819..7f89306 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -477,6 +477,14 @@ static const struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = {
.port_shift = 0,
.bits = 1,
},
+ [ALE_P0_UNI_FLOOD] = {
+ .name = "port0_unicast_flood",
+ .offset = ALE_CONTROL,
+ .port_offset = 0,
+ .shift = 8,
+ .port_shift = 0,
+ .bits = 1,
+ },
[ALE_VLAN_NOLEARN] = {
.name = "vlan_nolearn",
.offset = ALE_CONTROL,
@@ -573,6 +581,14 @@ static const struct ale_control_info ale_controls[ALE_NUM_CONTROLS] = {
.port_shift = 0,
.bits = 1,
},
+ [ALE_PORT_NO_SA_UPDATE] = {
+ .name = "no_source_update",
+ .offset = ALE_PORTCTL,
+ .port_offset = 4,
+ .shift = 5,
+ .port_shift = 0,
+ .bits = 1,
+ },
[ALE_PORT_MCAST_LIMIT] = {
.name = "mcast_limit",
.offset = ALE_PORTCTL,
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index 30daa12..de409c3 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -34,6 +34,7 @@ enum cpsw_ale_control {
ALE_ENABLE,
ALE_CLEAR,
ALE_AGEOUT,
+ ALE_P0_UNI_FLOOD,
ALE_VLAN_NOLEARN,
ALE_NO_PORT_VLAN,
ALE_OUI_DENY,
@@ -47,6 +48,7 @@ enum cpsw_ale_control {
ALE_PORT_DROP_UNTAGGED,
ALE_PORT_DROP_UNKNOWN_VLAN,
ALE_PORT_NOLEARN,
+ ALE_PORT_NO_SA_UPDATE,
ALE_PORT_UNKNOWN_VLAN_MEMBER,
ALE_PORT_UNKNOWN_MCAST_FLOOD,
ALE_PORT_UNKNOWN_REG_MCAST_FLOOD,
--
1.9.rc0
^ permalink raw reply related
* [PATCH net-next] net: phy: use network device in phy_print_status
From: Florian Fainelli @ 2014-01-22 18:32 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
phy_print_status() currently uses dev_name(&phydev->dev) which will
usually result in printing something along those lines for Device Tree
aware drivers:
libphy: f0b60000.etherne:0a - Link is Down
libphy: f0ba0000.etherne:00 - Link is Up - 1000/Full
This is not terribly useful for network administrators or users since we
expect a network interface name to be able to correlate link events with
interfaces. Update phy_print_status() to use netdev_info() with
phydev->attached_dev which is the backing network device for our PHY
device.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/phy/phy.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 19c9eca..8acf57f 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -45,12 +45,11 @@
void phy_print_status(struct phy_device *phydev)
{
if (phydev->link) {
- pr_info("%s - Link is Up - %d/%s\n",
- dev_name(&phydev->dev),
+ netdev_info(phydev->attached_dev, "- Link is Up - %d/%s\n",
phydev->speed,
DUPLEX_FULL == phydev->duplex ? "Full" : "Half");
} else {
- pr_info("%s - Link is Down\n", dev_name(&phydev->dev));
+ netdev_info(phydev->attached_dev, "- Link is Down\n");
}
}
EXPORT_SYMBOL(phy_print_status);
--
1.8.3.2
^ permalink raw reply related
* [PATCH v2] bridge: Fix crash with vlan filtering and tcpdump
From: Vlad Yasevich @ 2014-01-22 18:24 UTC (permalink / raw)
To: netdev; +Cc: Vlad Yasevich, Toshiaki Makita
When the vlan filtering is enabled on the bridge, but
the filter is not configured on the bridge device itself,
running tcpdump on the bridge device will result in a
an Oops with NULL pointer dereference. The reason
is that br_pass_frame_up() will bypass the vlan
check because promisc flag is set. It will then try
to get the table pointer and process the packet based
on the table. Since the table pointer is NULL, we oops.
Catch this special condition in br_handle_vlan().
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
---
Changes since v1:
- Do not use a BUG or BUG_ON as it is possible to trigger the
false BUG condition when thrashing the vlan_enable toggle.
Instead just drop the skb.
net/bridge/br_input.c | 11 ++++++-----
net/bridge/br_vlan.c | 14 ++++++++++++++
2 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7e73c32..67fbea0 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -29,6 +29,7 @@ static int br_pass_frame_up(struct sk_buff *skb)
struct net_device *indev, *brdev = BR_INPUT_SKB_CB(skb)->brdev;
struct net_bridge *br = netdev_priv(brdev);
struct br_cpu_netstats *brstats = this_cpu_ptr(br->stats);
+ struct net_port_vlans *pv;
u64_stats_update_begin(&brstats->syncp);
brstats->rx_packets++;
@@ -39,18 +40,18 @@ static int br_pass_frame_up(struct sk_buff *skb)
* packet is allowed except in promisc modue when someone
* may be running packet capture.
*/
+ pv = br_get_vlan_info(br);
if (!(brdev->flags & IFF_PROMISC) &&
- !br_allowed_egress(br, br_get_vlan_info(br), skb)) {
+ !br_allowed_egress(br, pv, skb)) {
kfree_skb(skb);
return NET_RX_DROP;
}
- skb = br_handle_vlan(br, br_get_vlan_info(br), skb);
- if (!skb)
- return NET_RX_DROP;
-
indev = skb->dev;
skb->dev = brdev;
+ skb = br_handle_vlan(br, pv, skb);
+ if (!skb)
+ return NET_RX_DROP;
return NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
netif_receive_skb);
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index af5ebd1..abc841c 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -144,6 +144,20 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
if (!br->vlan_enabled)
goto out;
+ /* Vlan filter table must be configured at this point. The
+ * only exception is the bridge is set in promisc mode and the
+ * packet is destined for the bridge device. In this case
+ * pass the packet as is.
+ */
+ if (!pv) {
+ if ((br->dev->flags & IFF_PROMISC) && skb->dev == br->dev)
+ goto out;
+ else {
+ kfree_skb_list(skb);
+ return NULL;
+ }
+ }
+
/* At this point, we know that the frame was filtered and contains
* a valid vlan id. If the vlan id is set in the untagged bitmap,
* send untagged; otherwise, send taged.
--
1.8.4.2
^ permalink raw reply related
* Re: [PATCH] gss_krb5: use lcm from kernel lib
From: J. Bruce Fields @ 2014-01-22 18:12 UTC (permalink / raw)
To: Luis Henriques
Cc: Trond Myklebust, David S. Miller,
linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20140119215040.GA7978@hercules>
On Sun, Jan 19, 2014 at 09:50:51PM +0000, Luis Henriques wrote:
> Replace hardcoded lowest common multiple algorithm by the lcm()
> function in kernel lib.
Looks OK to me. Applying for 3.14 if Trond hasn't already picked it up.
--b.
>
> Signed-off-by: Luis Henriques <luis.henriques-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> ---
> net/sunrpc/auth_gss/gss_krb5_keys.c | 17 ++++-------------
> 1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/net/sunrpc/auth_gss/gss_krb5_keys.c b/net/sunrpc/auth_gss/gss_krb5_keys.c
> index 76e42e6..a16c5b6 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_keys.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_keys.c
> @@ -59,6 +59,7 @@
> #include <linux/crypto.h>
> #include <linux/sunrpc/gss_krb5.h>
> #include <linux/sunrpc/xdr.h>
> +#include <linux/lcm.h>
>
> #ifdef RPC_DEBUG
> # define RPCDBG_FACILITY RPCDBG_AUTH
> @@ -72,7 +73,7 @@
> static void krb5_nfold(u32 inbits, const u8 *in,
> u32 outbits, u8 *out)
> {
> - int a, b, c, lcm;
> + unsigned long ulcm;
> int byte, i, msbit;
>
> /* the code below is more readable if I make these bytes
> @@ -82,17 +83,7 @@ static void krb5_nfold(u32 inbits, const u8 *in,
> outbits >>= 3;
>
> /* first compute lcm(n,k) */
> -
> - a = outbits;
> - b = inbits;
> -
> - while (b != 0) {
> - c = b;
> - b = a%b;
> - a = c;
> - }
> -
> - lcm = outbits*inbits/a;
> + ulcm = lcm(inbits, outbits);
>
> /* now do the real work */
>
> @@ -101,7 +92,7 @@ static void krb5_nfold(u32 inbits, const u8 *in,
>
> /* this will end up cycling through k lcm(k,n)/k times, which
> is correct */
> - for (i = lcm-1; i >= 0; i--) {
> + for (i = ulcm-1; i >= 0; i--) {
> /* compute the msbit in k which gets added into this byte */
> msbit = (
> /* first, start with the msbit in the first,
> --
> 1.8.3.2
>
> Cheers,
> --
> Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] net/openvswitch: Remove the skb encapsulation mark after decapsulation
From: Jesse Gross @ 2014-01-22 18:02 UTC (permalink / raw)
To: Or Gerlitz
Cc: Or Gerlitz, David Miller, netdev, Joseph Gasparakis,
Pravin B Shelar
In-Reply-To: <CAJZOPZ+oRD=76JUELo51yZTTvjnc=4HYy0EPWtR1HH1CY870cA@mail.gmail.com>
On Wed, Jan 22, 2014 at 9:23 AM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
> On Wed, Jan 22, 2014 at 7:02 PM, Jesse Gross <jesse@nicira.com> wrote:
>> On Wed, Jan 22, 2014 at 7:15 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>>> We must unset the skb encapsulation mark before injecting the
>>> decapsulated packet into ovs for the rest of its journey.
>>>
>>> This follows the practice set by 0afb166 "vxlan: Add capability of Rx
>>> checksum offload for inner packet" and the overall idea behind the
>>> skb encapsulation field.
>>>
>>> Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
>>> Cc: Pravin B Shelar <pshelar@nicira.com>
>>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>
>> This should be in the common decapsulation code. It doesn't make sense
>> to do this here when we set the layer pointers, encap bit, etc. in
>> common code on transmit.
>
> well that's a bit problematic, since the code in the vxlan driver vxlan_rcv()
> which has the potential to be common refers to vxlan->dev-> which is
> irrelevant for ovs, thoughts?
Well, as I said before, I don't really see the value in the
NETIF_F_RXCSUM flag on a VXLAN device but in any case you could break
that if statement in half and move the part that doesn't refer to
vxlan->dev into common code.
^ permalink raw reply
* Re: Fwd: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Tom Herbert @ 2014-01-22 18:00 UTC (permalink / raw)
To: Zhi Yong Wu
Cc: Ben Hutchings, Stefan Hajnoczi, Linux Netdev List, Eric Dumazet,
David S. Miller, Zhi Yong Wu, Michael S. Tsirkin, Rusty Russell,
Jason Wang
In-Reply-To: <CAEH94LiqVpXUp3myHvTPVHaqrDpK2V6ii+w8Gip61PA+QF2apw@mail.gmail.com>
>> 1. The aRFS interface for the guest to specify which virtual queue to
>> receive a packet on is fairly straight forward.
>> 2. To hook into RFS, we need to match the virtual queue to the real
>> CPU it will processed on, and then program the RFS table for that flow
>> and CPU.
>> 3. NIC aRFS keys off the RFS tables so it can program the HW with the
>> correct queue for the CPU.
> Does anyone have time to make one conclusion for this discussion? in
> particular, how will rx packet be steered up the stack from guest
> virtio_net driver, virtio_net NIC, vhost_net, tun driver, host network
> stack, to physical NIC with more details?
> What is the role of each path units? otherwise this discussion wont
> get any meanful result, which is not what we expect.
>
Working code outweighs theoretical discussion :-). I think you started
on a good track with original patches, and I believe the tun path
should work pretty well (some performance numbers would still be good
to validate). Seems like there's enough hooks in the virtio_net path
to implement something meaningful and maybe get some numbers (maybe
ignore NIC aRFS in the first pass).
Tom
>>
>>> Stefan
>
>
>
> --
> Regards,
>
> Zhi Yong Wu
^ permalink raw reply
* Re: [net-next PATCH 1/1] drivers: net: cpsw: enable promiscuous mode support
From: Mugunthan V N @ 2014-01-22 17:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-omap
In-Reply-To: <20140121.181137.840971123475894969.davem@davemloft.net>
On Wednesday 22 January 2014 07:41 AM, David Miller wrote:
> From: Mugunthan V N <mugunthanvnm@ti.com>
> Date: Mon, 20 Jan 2014 17:38:38 +0530
>
>> + if (!enable && ((priv->slaves[0].ndev->flags & IFF_PROMISC) ||
>> + (priv->slaves[1].ndev->flags & IFF_PROMISC))) {
> This assumption that there are exactly 2 slaves is not valid.
>
> Use the appropriate for_each_slave() et al. abstractions to access
> the slaves.
for_each_slave cannot be used here as in dual EMAC, callback will be
called only once for that particular slave. So I will use a for loop
with the slave count so that it can be utilized for any no of slaves.
Regards
Mugunthan V N
^ permalink raw reply
* [PATCH v2] net: Correctly sync addresses from multiple sources to single device
From: Vlad Yasevich @ 2014-01-22 17:54 UTC (permalink / raw)
To: netdev
Cc: Vlad Yasevich, Andrey Dmitrov, Alexandra N. Kossovsky,
Konstantin Ushakov
In-Reply-To: <52DFD32D.4060805@oktetlabs.ru>
When we have multiple devices attempting to sync the same address
to a single destination, each device should be permitted to sync
it once. To accomplish this, pass the 'sync_cnt' of the source
address when adding the addresss to the lower device. 'sync_cnt'
tracks how many time a given address has been succefully synced.
This way, we know that if the 'sync_cnt' passed in is 0, we should
sync this address.
Also, turn 'synced' member back into the counter as was originally
done in
commit 4543fbefe6e06a9e40d9f2b28d688393a299f079.
net: count hw_addr syncs so that unsync works properly.
It tracks how many time a given address has been added via a
'sync' operation. For every successfull 'sync' the counter is
incremented, and for ever 'unsync', the counter is decremented.
This makes sure that the address will be properly removed from
the the lower device when all the upper devices have removed it.
Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
CC: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
CC: Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
---
net/core/dev_addr_lists.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index ec40a84..9974f48 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -38,7 +38,7 @@ static int __hw_addr_create_ex(struct netdev_hw_addr_list *list,
ha->type = addr_type;
ha->refcount = 1;
ha->global_use = global;
- ha->synced = sync;
+ ha->synced = sync ? 1 : 0;
ha->sync_cnt = 0;
list_add_tail_rcu(&ha->list, &list->list);
list->count++;
@@ -48,7 +48,8 @@ static int __hw_addr_create_ex(struct netdev_hw_addr_list *list,
static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
const unsigned char *addr, int addr_len,
- unsigned char addr_type, bool global, bool sync)
+ unsigned char addr_type, bool global, bool sync,
+ int sync_count)
{
struct netdev_hw_addr *ha;
@@ -66,10 +67,10 @@ static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
ha->global_use = true;
}
if (sync) {
- if (ha->synced)
+ if (ha->synced && sync_count)
return -EEXIST;
else
- ha->synced = true;
+ ha->synced++;
}
ha->refcount++;
return 0;
@@ -84,7 +85,8 @@ static int __hw_addr_add(struct netdev_hw_addr_list *list,
const unsigned char *addr, int addr_len,
unsigned char addr_type)
{
- return __hw_addr_add_ex(list, addr, addr_len, addr_type, false, false);
+ return __hw_addr_add_ex(list, addr, addr_len, addr_type, false, false,
+ 0);
}
static int __hw_addr_del_entry(struct netdev_hw_addr_list *list,
@@ -101,7 +103,7 @@ static int __hw_addr_del_entry(struct netdev_hw_addr_list *list,
ha->global_use = false;
if (sync)
- ha->synced = false;
+ ha->synced--;
if (--ha->refcount)
return 0;
@@ -139,7 +141,7 @@ static int __hw_addr_sync_one(struct netdev_hw_addr_list *to_list,
int err;
err = __hw_addr_add_ex(to_list, ha->addr, addr_len, ha->type,
- false, true);
+ false, true, ha->sync_cnt);
if (err && err != -EEXIST)
return err;
@@ -676,7 +678,7 @@ static int __dev_mc_add(struct net_device *dev, const unsigned char *addr,
netif_addr_lock_bh(dev);
err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
- NETDEV_HW_ADDR_T_MULTICAST, global, false);
+ NETDEV_HW_ADDR_T_MULTICAST, global, false, 0);
if (!err)
__dev_set_rx_mode(dev);
netif_addr_unlock_bh(dev);
--
1.8.4.2
^ permalink raw reply related
* oops in tcp_xmit_retransmit_queue
From: Venkat Venkatsubra @ 2014-01-22 17:32 UTC (permalink / raw)
To: netdev; +Cc: davem
We hit this crash in tcp_xmit_retransmit_queue.
@ BUG: unable to handle kernel NULL pointer dereference at (null)
@ IP: [<ffffffff813fe792>] tcp_xmit_retransmit_queue+0x21e/0x25d
@ .
@ Call Trace:
@ <IRQ>
@ [<ffffffff813f9b25>] tcp_ack+0x1662/0x168d
@ [<ffffffff813fbba5>] ? tcp_init_tso_segs+0x3a/0x51
@ [<ffffffff813f9fbf>] ? tcp_validate_incoming+0x69/0x296
@ [<ffffffff813fb1da>] tcp_rcv_established+0x4db/0x566
@ [<ffffffff81401244>] tcp_v4_do_rcv+0x196/0x352
@ [<ffffffff8105e007>] ? local_bh_enable+0x12/0x14
@ [<ffffffff81402696>] tcp_v4_rcv+0x459/0x6d0
@ [<ffffffff81043309>] ? test_tsk_thread_flag+0x12/0x14
@ [<ffffffff8104536b>] ? select_idle_sibling+0x3a/0xe7
@ [<ffffffff813e6b66>] ip_local_deliver_finish+0x152/0x1fa
@ [<ffffffff813e6f61>] ip_local_deliver+0x72/0x7d
@ [<ffffffff813e6992>] ip_rcv_finish+0x372/0x38c
@ [<ffffffff813f341f>] ? tcp_gro_receive+0x7e/0x1e5
@ [<ffffffff813e6eb0>] ip_rcv+0x2a2/0x2e1
@ [<ffffffff813c11ab>] __netif_receive_skb+0x41b/0x440
@ [<ffffffff813c1219>] netif_receive_skb+0x49/0x50
@ [<ffffffff813c12b5>] napi_skb_finish+0x2b/0x42
@ [<ffffffff813c172e>] napi_gro_receive+0x2f/0x34
@ [<ffffffffa017b5e8>] igb_poll+0x808/0xb78 [igb]
@ [<ffffffff8104505f>] ? __enqueue_entity+0x79/0x7b
@ [<ffffffff813c42d9>] net_rx_action+0xc6/0x1cd
@ [<ffffffff8105e8c1>] __do_softirq+0xd7/0x19e
@ [<ffffffff810aee90>] ? handle_IRQ_event+0x10a/0x120
@ [<ffffffff81012eec>] call_softirq+0x1c/0x30
@ [<ffffffff81014695>] do_softirq+0x46/0x89
@ [<ffffffff8105e746>] irq_exit+0x3b/0x7a
@ [<ffffffff8145b5d1>] do_IRQ+0x99/0xb0
@ [<ffffffff81012713>] ret_from_intr+0x0/0x11
@ <EOI>
@ [<ffffffff810199d2>] ? mwait_idle+0x74/0x7f
@ [<ffffffff810199c5>] ? mwait_idle+0x67/0x7f
@ [<ffffffff81010d6f>] ? cpu_idle+0xa5/0xd4
@ [<ffffffff81450f2f>] ? start_secondary+0x1fd/0x23c
@ .
@ RIP [<ffffffff813fe792>] tcp_xmit_retransmit_queue+0x21e/0x25d
tp->retransmit_skb_hint is non-NULL. retransmit_skb_hint->next is NULL.
It crashes while walking through this list:
tcp_for_write_queue_from(skb, sk) {
__u8 sacked = TCP_SKB_CB(skb)->sacked;
retransmit_skb_hint is pointing to a seq# range that is quite before tp->snd_una.
Both "seq" and "end_seq" of tcp_skb_cb of retransmit_skb_hint are
before tp->snd_una. Looks like tp->retransmit_skb_hint is either not unset
in some path or gets set when it should not be.
Some more info of the Customer's environment:
-sack is enabled
-this occurred on 2.6.32-400.1.1.el5uek which is based on linux-2.6.32
-is not re-creatable
Is this a known problem that has been fixed after 2.6.32 ?
Thanks.
Venkat
^ permalink raw reply
* Re: [net-next PATCH 1/1] drivers: net: cpsw: enable promiscuous mode support
From: Mugunthan V N @ 2014-01-22 17:23 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-omap
In-Reply-To: <20140121.181137.840971123475894969.davem@davemloft.net>
On Wednesday 22 January 2014 07:41 AM, David Miller wrote:
> From: Mugunthan V N <mugunthanvnm@ti.com>
> Date: Mon, 20 Jan 2014 17:38:38 +0530
>
>> + if (!enable && ((priv->slaves[0].ndev->flags & IFF_PROMISC) ||
>> + (priv->slaves[1].ndev->flags & IFF_PROMISC))) {
> This assumption that there are exactly 2 slaves is not valid.
>
> Use the appropriate for_each_slave() et al. abstractions to access
> the slaves.
Will fix this in next version.
Regards
Mugunthan V N
^ permalink raw reply
* Re: [PATCH] net/openvswitch: Remove the skb encapsulation mark after decapsulation
From: Or Gerlitz @ 2014-01-22 17:23 UTC (permalink / raw)
To: Jesse Gross
Cc: Or Gerlitz, David Miller, netdev, Joseph Gasparakis,
Pravin B Shelar
In-Reply-To: <CAEP_g=9_8aPyve-obKCLo9ByFPvSNEtNBpaUKCw7X8Tcc3U1yA@mail.gmail.com>
On Wed, Jan 22, 2014 at 7:02 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Wed, Jan 22, 2014 at 7:15 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> We must unset the skb encapsulation mark before injecting the
>> decapsulated packet into ovs for the rest of its journey.
>>
>> This follows the practice set by 0afb166 "vxlan: Add capability of Rx
>> checksum offload for inner packet" and the overall idea behind the
>> skb encapsulation field.
>>
>> Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
>> Cc: Pravin B Shelar <pshelar@nicira.com>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>
> This should be in the common decapsulation code. It doesn't make sense
> to do this here when we set the layer pointers, encap bit, etc. in
> common code on transmit.
well that's a bit problematic, since the code in the vxlan driver vxlan_rcv()
which has the potential to be common refers to vxlan->dev-> which is
irrelevant for ovs, thoughts?
^ permalink raw reply
* Re: [PATCH] net/openvswitch: Remove the skb encapsulation mark after decapsulation
From: Jesse Gross @ 2014-01-22 17:02 UTC (permalink / raw)
To: Or Gerlitz; +Cc: David Miller, netdev, Joseph Gasparakis, Pravin B Shelar
In-Reply-To: <1390403705-28690-1-git-send-email-ogerlitz@mellanox.com>
On Wed, Jan 22, 2014 at 7:15 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> We must unset the skb encapsulation mark before injecting the
> decapsulated packet into ovs for the rest of its journey.
>
> This follows the practice set by 0afb166 "vxlan: Add capability of Rx
> checksum offload for inner packet" and the overall idea behind the
> skb encapsulation field.
>
> Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
> Cc: Pravin B Shelar <pshelar@nicira.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
This should be in the common decapsulation code. It doesn't make sense
to do this here when we set the layer pointers, encap bit, etc. in
common code on transmit.
^ permalink raw reply
* Re: [Xen-devel] [PATCH v4] xen/grant-table: Avoid m2p_override during mapping
From: Stefano Stabellini @ 2014-01-22 16:39 UTC (permalink / raw)
To: Zoltan Kiss
Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
jonathan.davies
In-Reply-To: <1390335755-29328-1-git-send-email-zoltan.kiss@citrix.com>
On Tue, 21 Jan 2014, Zoltan Kiss wrote:
> The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
> for blkback and future netback patches it just cause a lock contention, as
> those pages never go to userspace. Therefore this series does the following:
> - the original functions were renamed to __gnttab_[un]map_refs, with a new
> parameter m2p_override
> - based on m2p_override either they follow the original behaviour, or just set
> the private flag and call set_phys_to_machine
> - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
> m2p_override false
> - a new function gnttab_[un]map_refs_userspace provides the old behaviour
>
> v2:
> - move the storing of the old mfn in page->index to gnttab_map_refs
> - move the function header update to a separate patch
>
> v3:
> - a new approach to retain old behaviour where it needed
> - squash the patches into one
>
> v4:
> - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
> - clear page->private before doing anything with the page, so m2p_find_override
> won't race with this
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> Suggested-by: David Vrabel <david.vrabel@citrix.com>
> ---
> arch/x86/include/asm/xen/page.h | 12 +++--
> arch/x86/xen/p2m.c | 25 ++--------
> drivers/block/xen-blkback/blkback.c | 15 +++---
> drivers/xen/gntdev.c | 13 +++--
> drivers/xen/grant-table.c | 90 +++++++++++++++++++++++++++++------
> include/xen/grant_table.h | 8 +++-
> 6 files changed, 107 insertions(+), 56 deletions(-)
>
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index b913915..68a1438 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -49,10 +49,14 @@ extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
> extern unsigned long set_phys_range_identity(unsigned long pfn_s,
> unsigned long pfn_e);
>
> -extern int m2p_add_override(unsigned long mfn, struct page *page,
> - struct gnttab_map_grant_ref *kmap_op);
> +extern int m2p_add_override(unsigned long mfn,
> + struct page *page,
> + struct gnttab_map_grant_ref *kmap_op,
> + unsigned long pfn);
> extern int m2p_remove_override(struct page *page,
> - struct gnttab_map_grant_ref *kmap_op);
> + struct gnttab_map_grant_ref *kmap_op,
> + unsigned long pfn,
> + unsigned long mfn);
> extern struct page *m2p_find_override(unsigned long mfn);
> extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
>
> @@ -121,7 +125,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
> pfn = m2p_find_override_pfn(mfn, ~0);
> }
>
> - /*
> + /*
Spurious change?
> * pfn is ~0 if there are no entries in the m2p for mfn or if the
> * entry doesn't map back to the mfn and m2p_override doesn't have a
> * valid entry for it.
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 2ae8699..0060178 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -872,15 +872,13 @@ static unsigned long mfn_hash(unsigned long mfn)
>
> /* Add an MFN override for a particular page */
> int m2p_add_override(unsigned long mfn, struct page *page,
> - struct gnttab_map_grant_ref *kmap_op)
> + struct gnttab_map_grant_ref *kmap_op, unsigned long pfn)
Do we really need to add another additional parameter to
m2p_add_override?
I would just let m2p_add_override and m2p_remove_override call
page_to_pfn again. It is not that expensive.
> {
> unsigned long flags;
> - unsigned long pfn;
> unsigned long uninitialized_var(address);
> unsigned level;
> pte_t *ptep = NULL;
>
> - pfn = page_to_pfn(page);
> if (!PageHighMem(page)) {
> address = (unsigned long)__va(pfn << PAGE_SHIFT);
> ptep = lookup_address(address, &level);
> @@ -888,13 +886,6 @@ int m2p_add_override(unsigned long mfn, struct page *page,
> "m2p_add_override: pfn %lx not mapped", pfn))
> return -EINVAL;
> }
> - WARN_ON(PagePrivate(page));
> - SetPagePrivate(page);
> - set_page_private(page, mfn);
> - page->index = pfn_to_mfn(pfn);
> -
> - if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
> - return -ENOMEM;
>
> if (kmap_op != NULL) {
> if (!PageHighMem(page)) {
> @@ -933,20 +924,15 @@ int m2p_add_override(unsigned long mfn, struct page *page,
> }
> EXPORT_SYMBOL_GPL(m2p_add_override);
> int m2p_remove_override(struct page *page,
> - struct gnttab_map_grant_ref *kmap_op)
> + struct gnttab_map_grant_ref *kmap_op,
> + unsigned long pfn,
> + unsigned long mfn)
Same here
> {
> unsigned long flags;
> - unsigned long mfn;
> - unsigned long pfn;
> unsigned long uninitialized_var(address);
> unsigned level;
> pte_t *ptep = NULL;
>
> - pfn = page_to_pfn(page);
> - mfn = get_phys_to_machine(pfn);
> - if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
> - return -EINVAL;
> -
> if (!PageHighMem(page)) {
> address = (unsigned long)__va(pfn << PAGE_SHIFT);
> ptep = lookup_address(address, &level);
> @@ -959,10 +945,7 @@ int m2p_remove_override(struct page *page,
> spin_lock_irqsave(&m2p_override_lock, flags);
> list_del(&page->lru);
> spin_unlock_irqrestore(&m2p_override_lock, flags);
> - WARN_ON(!PagePrivate(page));
> - ClearPagePrivate(page);
>
> - set_phys_to_machine(pfn, page->index);
> if (kmap_op != NULL) {
> if (!PageHighMem(page)) {
> struct multicall_space mcs;
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 6620b73..875025f 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -285,8 +285,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
>
> if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
> !rb_next(&persistent_gnt->node)) {
> - ret = gnttab_unmap_refs(unmap, NULL, pages,
> - segs_to_unmap);
> + ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
> BUG_ON(ret);
> put_free_pages(blkif, pages, segs_to_unmap);
> segs_to_unmap = 0;
> @@ -321,8 +320,7 @@ static void unmap_purged_grants(struct work_struct *work)
> pages[segs_to_unmap] = persistent_gnt->page;
>
> if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> - ret = gnttab_unmap_refs(unmap, NULL, pages,
> - segs_to_unmap);
> + ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
> BUG_ON(ret);
> put_free_pages(blkif, pages, segs_to_unmap);
> segs_to_unmap = 0;
> @@ -330,7 +328,7 @@ static void unmap_purged_grants(struct work_struct *work)
> kfree(persistent_gnt);
> }
> if (segs_to_unmap > 0) {
> - ret = gnttab_unmap_refs(unmap, NULL, pages, segs_to_unmap);
> + ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
> BUG_ON(ret);
> put_free_pages(blkif, pages, segs_to_unmap);
> }
> @@ -670,15 +668,14 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
> GNTMAP_host_map, pages[i]->handle);
> pages[i]->handle = BLKBACK_INVALID_HANDLE;
> if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
> - ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
> - invcount);
> + ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
> BUG_ON(ret);
> put_free_pages(blkif, unmap_pages, invcount);
> invcount = 0;
> }
> }
> if (invcount) {
> - ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
> + ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
> BUG_ON(ret);
> put_free_pages(blkif, unmap_pages, invcount);
> }
> @@ -740,7 +737,7 @@ again:
> }
>
> if (segs_to_map) {
> - ret = gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map);
> + ret = gnttab_map_refs(map, pages_to_gnt, segs_to_map);
> BUG_ON(ret);
> }
>
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index e41c79c..e652c0e 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -284,8 +284,10 @@ static int map_grant_pages(struct grant_map *map)
> }
>
> pr_debug("map %d+%d\n", map->index, map->count);
> - err = gnttab_map_refs(map->map_ops, use_ptemod ? map->kmap_ops : NULL,
> - map->pages, map->count);
> + err = gnttab_map_refs_userspace(map->map_ops,
> + use_ptemod ? map->kmap_ops : NULL,
> + map->pages,
> + map->count);
> if (err)
> return err;
>
> @@ -315,9 +317,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
> }
> }
>
> - err = gnttab_unmap_refs(map->unmap_ops + offset,
> - use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
> - pages);
> + err = gnttab_unmap_refs_userspace(map->unmap_ops + offset,
> + use_ptemod ? map->kmap_ops + offset : NULL,
> + map->pages + offset,
> + pages);
> if (err)
> return err;
>
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index aa846a4..1f97fa0 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -880,15 +880,17 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
> }
> EXPORT_SYMBOL_GPL(gnttab_batch_copy);
>
> -int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> +int __gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> - struct page **pages, unsigned int count)
> + struct page **pages, unsigned int count,
> + bool m2p_override)
> {
> int i, ret;
> bool lazy = false;
> pte_t *pte;
> - unsigned long mfn;
> + unsigned long mfn, pfn;
>
> + BUG_ON(kmap_ops && !m2p_override);
> ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map_ops, count);
> if (ret)
> return ret;
> @@ -907,10 +909,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
> map_ops[i].dev_bus_addr >> PAGE_SHIFT);
> }
> - return ret;
> + return 0;
> }
>
> - if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
> + if (m2p_override &&
> + !in_interrupt() &&
> + paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
> arch_enter_lazy_mmu_mode();
> lazy = true;
> }
> @@ -927,8 +931,18 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> } else {
> mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
> }
> - ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> - &kmap_ops[i] : NULL);
> + pfn = page_to_pfn(pages[i]);
> +
> + WARN_ON(PagePrivate(pages[i]));
> + SetPagePrivate(pages[i]);
> + set_page_private(pages[i], mfn);
> +
> + pages[i]->index = pfn_to_mfn(pfn);
> + if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
> + return -ENOMEM;
goto out
> + if (m2p_override)
> + ret = m2p_add_override(mfn, pages[i], kmap_ops ?
> + &kmap_ops[i] : NULL, pfn);
> if (ret)
> goto out;
> }
> @@ -937,17 +951,34 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> if (lazy)
> arch_leave_lazy_mmu_mode();
>
> - return ret;
> + return 0;
We are loosing the error code possibly returned by m2p_add_override and
the previous check.
> +}
> +
> +int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> + struct page **pages, unsigned int count)
> +{
> + return __gnttab_map_refs(map_ops, NULL, pages, count, false);
> }
> EXPORT_SYMBOL_GPL(gnttab_map_refs);
>
> -int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> +int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
> + struct gnttab_map_grant_ref *kmap_ops,
> + struct page **pages, unsigned int count)
> +{
> + return __gnttab_map_refs(map_ops, kmap_ops, pages, count, true);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_map_refs_userspace);
> +
> +int __gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> - struct page **pages, unsigned int count)
> + struct page **pages, unsigned int count,
> + bool m2p_override)
> {
> int i, ret;
> bool lazy = false;
> + unsigned long pfn, mfn;
>
> + BUG_ON(kmap_ops && !m2p_override);
> ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
> if (ret)
> return ret;
> @@ -958,17 +989,32 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
> INVALID_P2M_ENTRY);
> }
> - return ret;
> + return 0;
> }
>
> - if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
> + if (m2p_override &&
> + !in_interrupt() &&
> + paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
> arch_enter_lazy_mmu_mode();
> lazy = true;
> }
>
> for (i = 0; i < count; i++) {
> - ret = m2p_remove_override(pages[i], kmap_ops ?
> - &kmap_ops[i] : NULL);
> + pfn = page_to_pfn(pages[i]);
> + mfn = get_phys_to_machine(pfn);
> + if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
> + return -EINVAL;
goto out
> + set_page_private(pages[i], INVALID_P2M_ENTRY);
> + WARN_ON(!PagePrivate(pages[i]));
> + ClearPagePrivate(pages[i]);
> + set_phys_to_machine(pfn, pages[i]->index);
> + if (m2p_override)
> + ret = m2p_remove_override(pages[i],
> + kmap_ops ?
> + &kmap_ops[i] : NULL,
> + pfn,
> + mfn);
> if (ret)
> goto out;
> }
> @@ -977,10 +1023,24 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> if (lazy)
> arch_leave_lazy_mmu_mode();
>
> - return ret;
> + return 0;
We are loosing the error code possibly returned by m2p_remove_override
and the two previous checks.
> +}
> +
> +int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *map_ops,
> + struct page **pages, unsigned int count)
> +{
> + return __gnttab_unmap_refs(map_ops, NULL, pages, count, false);
> }
> EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
>
> +int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *map_ops,
> + struct gnttab_map_grant_ref *kmap_ops,
> + struct page **pages, unsigned int count)
> +{
> + return __gnttab_unmap_refs(map_ops, kmap_ops, pages, count, true);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_unmap_refs_userspace);
> +
> static unsigned nr_status_frames(unsigned nr_grant_frames)
> {
> BUG_ON(grefs_per_grant_frame == 0);
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 694dcaf..9a919b1 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -184,11 +184,15 @@ unsigned int gnttab_max_grant_frames(void);
> #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
>
> int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> - struct gnttab_map_grant_ref *kmap_ops,
> struct page **pages, unsigned int count);
> +int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
> + struct gnttab_map_grant_ref *kmap_ops,
> + struct page **pages, unsigned int count);
> int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> - struct gnttab_map_grant_ref *kunmap_ops,
> struct page **pages, unsigned int count);
> +int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *unmap_ops,
> + struct gnttab_map_grant_ref *kunmap_ops,
> + struct page **pages, unsigned int count);
>
> /* Perform a batch of grant map/copy operations. Retry every batch slot
> * for which the hypervisor returns GNTST_eagain. This is typically due
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
^ permalink raw reply
* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
From: Eric Dumazet @ 2014-01-22 16:20 UTC (permalink / raw)
To: Sabrina Dubroca; +Cc: netdev, hannes, gaofeng
In-Reply-To: <1390404908-3914-1-git-send-email-sd@queasysnail.net>
On Wed, 2014-01-22 at 16:35 +0100, Sabrina Dubroca wrote:
>
> ---
> net/ipv6/addrconf.c | 22 ++++++++++++++++++----
> 1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..b2eb653 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
> }
> #endif
>
> +struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
> + const struct in6_addr *addr,
> + bool anycast)
> +{
> + struct net *net = dev_net(idev->dev);
> + struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);
This allocates a dst
> +
> + if (rt == NULL)
> + return addrconf_dst_alloc(idev, addr, anycast);
> + else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
> + ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
> + (!anycast && rt->rt6i_flags & RTF_LOCAL))))
> + return addrconf_dst_alloc(idev, addr, anycast);
But nothing seems to release it
It seems your patch lacks ip6_rt_put(rt);
> +
> + return ERR_PTR(-EEXIST);
> +}
> +
^ permalink raw reply
* Re: [patch net-next 2/5] rtnetlink: put "BOND" into nl attribute names which are related to bonding
From: Nicolas Dichtel @ 2014-01-22 15:57 UTC (permalink / raw)
To: Jiri Pirko, netdev
Cc: davem, fubar, vfalico, andy, sfeldma, stephen, vyasevic,
john.r.fastabend
In-Reply-To: <1390377957-31466-3-git-send-email-jiri@resnulli.us>
Le 22/01/2014 09:05, Jiri Pirko a écrit :
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
You didn't tell it in the commit log, but you've also fix the API
to add the UNSPEC attribute (IFLA_BOND_SLAVE_UNSPEC), nice!
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> drivers/net/bonding/bond_netlink.c | 12 ++++++------
> include/uapi/linux/if_link.h | 19 ++++++++++---------
> net/core/rtnetlink.c | 16 ++++++++--------
> 3 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
> index 21c6488..dd786a3 100644
> --- a/drivers/net/bonding/bond_netlink.c
> +++ b/drivers/net/bonding/bond_netlink.c
> @@ -27,27 +27,27 @@ int bond_get_slave(struct net_device *slave_dev, struct sk_buff *skb)
> struct slave *slave = bond_slave_get_rtnl(slave_dev);
> const struct aggregator *agg;
>
> - if (nla_put_u8(skb, IFLA_SLAVE_STATE, bond_slave_state(slave)))
> + if (nla_put_u8(skb, IFLA_BOND_SLAVE_STATE, bond_slave_state(slave)))
> goto nla_put_failure;
>
> - if (nla_put_u8(skb, IFLA_SLAVE_MII_STATUS, slave->link))
> + if (nla_put_u8(skb, IFLA_BOND_SLAVE_MII_STATUS, slave->link))
> goto nla_put_failure;
>
> - if (nla_put_u32(skb, IFLA_SLAVE_LINK_FAILURE_COUNT,
> + if (nla_put_u32(skb, IFLA_BOND_SLAVE_LINK_FAILURE_COUNT,
> slave->link_failure_count))
> goto nla_put_failure;
>
> - if (nla_put(skb, IFLA_SLAVE_PERM_HWADDR,
> + if (nla_put(skb, IFLA_BOND_SLAVE_PERM_HWADDR,
> slave_dev->addr_len, slave->perm_hwaddr))
> goto nla_put_failure;
>
> - if (nla_put_u16(skb, IFLA_SLAVE_QUEUE_ID, slave->queue_id))
> + if (nla_put_u16(skb, IFLA_BOND_SLAVE_QUEUE_ID, slave->queue_id))
> goto nla_put_failure;
>
> if (slave->bond->params.mode == BOND_MODE_8023AD) {
> agg = SLAVE_AD_INFO(slave).port.aggregator;
> if (agg)
> - if (nla_put_u16(skb, IFLA_SLAVE_AD_AGGREGATOR_ID,
> + if (nla_put_u16(skb, IFLA_BOND_SLAVE_AD_AGGREGATOR_ID,
> agg->aggregator_identifier))
> goto nla_put_failure;
> }
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index ba2f3bf..1f30b85 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -144,7 +144,7 @@ enum {
> IFLA_NUM_RX_QUEUES,
> IFLA_CARRIER,
> IFLA_PHYS_PORT_ID,
> - IFLA_SLAVE,
> + IFLA_BOND_SLAVE,
> __IFLA_MAX
> };
>
> @@ -370,16 +370,17 @@ enum {
> #define IFLA_BOND_AD_INFO_MAX (__IFLA_BOND_AD_INFO_MAX - 1)
>
> enum {
> - IFLA_SLAVE_STATE,
> - IFLA_SLAVE_MII_STATUS,
> - IFLA_SLAVE_LINK_FAILURE_COUNT,
> - IFLA_SLAVE_PERM_HWADDR,
> - IFLA_SLAVE_QUEUE_ID,
> - IFLA_SLAVE_AD_AGGREGATOR_ID,
> - __IFLA_SLAVE_MAX,
> + IFLA_BOND_SLAVE_UNSPEC,
> + IFLA_BOND_SLAVE_STATE,
> + IFLA_BOND_SLAVE_MII_STATUS,
> + IFLA_BOND_SLAVE_LINK_FAILURE_COUNT,
> + IFLA_BOND_SLAVE_PERM_HWADDR,
> + IFLA_BOND_SLAVE_QUEUE_ID,
> + IFLA_BOND_SLAVE_AD_AGGREGATOR_ID,
> + __IFLA_BOND_SLAVE_MAX,
> };
>
> -#define IFLA_SLAVE_MAX (__IFLA_SLAVE_MAX - 1)
> +#define IFLA_BOND_SLAVE_MAX (__IFLA_BOND_SLAVE_MAX - 1)
>
> /* SR-IOV virtual function management section */
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 4f85de7..cace149 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -725,13 +725,13 @@ static size_t rtnl_bond_slave_size(const struct net_device *dev)
> {
> struct net_device *bond;
> size_t slave_size =
> - nla_total_size(sizeof(struct nlattr)) + /* IFLA_SLAVE */
> - nla_total_size(1) + /* IFLA_SLAVE_STATE */
> - nla_total_size(1) + /* IFLA_SLAVE_MII_STATUS */
> - nla_total_size(4) + /* IFLA_SLAVE_LINK_FAILURE_COUNT */
> - nla_total_size(MAX_ADDR_LEN) + /* IFLA_SLAVE_PERM_HWADDR */
> - nla_total_size(2) + /* IFLA_SLAVE_QUEUE_ID */
> - nla_total_size(2) + /* IFLA_SLAVE_AD_AGGREGATOR_ID */
> + nla_total_size(sizeof(struct nlattr)) + /* IFLA_BOND_SLAVE */
> + nla_total_size(1) + /* IFLA_BOND_SLAVE_STATE */
> + nla_total_size(1) + /* IFLA_BOND_SLAVE_MII_STATUS */
> + nla_total_size(4) + /* IFLA_BOND_SLAVE_LINK_FAILURE_COUNT */
> + nla_total_size(MAX_ADDR_LEN) + /* IFLA_BOND_SLAVE_PERM_HWADDR */
> + nla_total_size(2) + /* IFLA_BOND_SLAVE_QUEUE_ID */
> + nla_total_size(2) + /* IFLA_BOND_SLAVE_AD_AGGREGATOR_ID */
> 0;
>
> if (netif_is_bond_slave((struct net_device *)dev)) {
> @@ -883,7 +883,7 @@ static size_t rtnl_bond_slave_fill(struct sk_buff *skb, struct net_device *dev)
> if (!bond || !bond->netdev_ops->ndo_get_slave)
> return 0;
>
> - nest = nla_nest_start(skb, IFLA_SLAVE);
> + nest = nla_nest_start(skb, IFLA_BOND_SLAVE);
> if (!nest)
> return -EMSGSIZE;
>
>
^ permalink raw reply
* [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
From: Sabrina Dubroca @ 2014-01-22 15:35 UTC (permalink / raw)
To: netdev; +Cc: hannes, gaofeng, Sabrina Dubroca
This patch solves bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951
> Bug: No reply after loopback down-up
>
> kernel:3.13.0-rc4
>
> I did a test like this:
> On host1:
> 1)ifconfig eth0 add fd00::3
> On host2:
> 1)ifconfig eth0 add fd00::4
> 2)ping6 fd00::3
>
> Everything went OK.
>
> Then:
> On host1:
> 1)ifconfig lo down;ifconfig lo up
> On hsot2:
> 1)ping6 fd00::3
>
> There was no reply from host1!
Reply from Hannes:
> This was first fixed with 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f
> ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
> and broken again in a881ae1f625c599b460cc8f8a7fcb1c438f699ad ("ipv6:
> don't call addrconf_dst_alloc again when enable lo")
This patch first reverts commit a881ae1f625, since from what I can
see, it actually stops all local routes from being added back, instead
of just the sit ones. Which is actually good, since even with a normal
interface (tried with e1000 in qemu), when I reverted a881ae1f625, I
could get the "unregister_netdevice" problem when doing rmmod e1000:
unregister_netdevice: waiting for ens3 to become free. Usage count = 1
(repeated until I removed the local routes manually)
The patch then adds a function that detects if the local route for an
IP address has been re-created before init_loopback is called. That
way, we avoid the usage count problem, but if we do a down-up on
loopback, the necessary routes are created.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
I'm sending this as a RFC, because I'm not sure the way I detect if
the route has been added before is good.
It works in my tests (below). Is the call to rt6_lookup ok? Or is
there a better way to do that? The flags checked are the ones set by
addrconf_dst_alloc, so that if the route returned by rt6_lookup isn't
the local route for this IP address, it should be ignored and a new
local route created.
I've put addrconf_dst_alloc_once right above init_loopback in
addrconf.c, but it could go somewhere else, or be included directly in
init_loopback.
All suggestions and comments are welcome.
Thanks,
Sabrina
---
Here are the tests. They all fail with the unregister_netdevice
problem if I simply revert the a881ae1f625, but everything looks fine
with the patch below: no unregister_netdevice problem, and all the
local routes are back, ping works.
---
#############
# sit tests #
#############
# setup sit
ip tunnel add sit1 mode sit remote 10.0.1.10 local 10.0.1.24 ttl 10
ip link set sit1 up
ip a a fd00::1/64 dev sit1
ip r a ::/0 dev sit1
# test 1
ifconfig lo down ; ifconfig lo up
ip tunnel del sit1
# test 2
# before: setup sit tunnel again
ifconfig lo down ; ifconfig ens3 down ; ifconfig sit1 down ; ifconfig sit1 up
ip a a fd00::1/64 dev sit1
./ens3.sh
ifconfig lo up
ip tunnel del sit1
###############
# e1000 tests #
###############
#########
# ens3.sh
ifconfig ens3 up
ip a a fec0::3456/64 dev ens3
#########
./ens3.sh
# test 1
ifconfig lo down ; ifconfig lo up
rmmod e1000
# test 2
# before: setup things again
ifconfig lo down ; ifconfig ens3 down
./ens3.sh
ifconfig lo up
rmmod e1000
---
net/ipv6/addrconf.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4b6b720..b2eb653 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
}
#endif
+struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
+ const struct in6_addr *addr,
+ bool anycast)
+{
+ struct net *net = dev_net(idev->dev);
+ struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);
+
+ if (rt == NULL)
+ return addrconf_dst_alloc(idev, addr, anycast);
+ else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
+ ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
+ (!anycast && rt->rt6i_flags & RTF_LOCAL))))
+ return addrconf_dst_alloc(idev, addr, anycast);
+
+ return ERR_PTR(-EEXIST);
+}
+
static void init_loopback(struct net_device *dev)
{
struct inet6_dev *idev;
@@ -2611,10 +2628,7 @@ static void init_loopback(struct net_device *dev)
if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
continue;
- if (sp_ifa->rt)
- continue;
-
- sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);
+ sp_rt = addrconf_dst_alloc_once(idev, &sp_ifa->addr, false);
/* Failure cases are ignored */
if (!IS_ERR(sp_rt)) {
--
1.8.5.3
^ permalink raw reply related
* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Hannes Frederic Sowa @ 2014-01-22 15:32 UTC (permalink / raw)
To: Dan Ballard, Lennart Poettering, kay.sievers, Arnd Bergmann,
David S. Miller, Eric Dumazet, Eliezer Tamir, Neil Horman,
Li Zefan, linux-arch, linux-kernel, netdev
In-Reply-To: <20140122152036.GB7269@order.stressinduktion.org>
On Wed, Jan 22, 2014 at 04:20:36PM +0100, Hannes Frederic Sowa wrote:
> On Wed, Jan 22, 2014 at 07:11:20AM -0800, Dan Ballard wrote:
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 5393b4b..1ff69d1 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -915,6 +915,10 @@ set_rcvbuf:
> > sk->sk_max_pacing_rate);
> > break;
> >
> > + case SO_MAX_DGRAM_QLEN:
> > + sk->sk_max_ack_backlog = val;
> > + break;
> > +
>
> Shouldn't the backlog be capped for unprivileged users to some configurable
> value? I even think that max_dgram_qlen should be the upper bound.
>
> I guess it is not that serious as socket read accounting does account all
> packets which sit in the backlog queue.
Just a follow-up:
sk_max_ack_backlog is also responsible for limiting the af_unix
dgram queues. Currently there is no socket accounting for the read
side of those unix dgram sockets. I tried to fix this once here,
http://patchwork.ozlabs.org/patch/231032/, but until that is done we
depend on max_dgram_qlen to limit those queues at all.
I hope I can get back to this patch anytime soon, as it solves the problem
that a bidirectional protocol ping-ponging with a dgram server socket
and not fetching its messages from the backlog queue can bring a server
to halt because it doesn't have any send space on the socket anymore.
Greetings,
Hannes
^ permalink raw reply
* RE: [PATCH] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: David Laight @ 2014-01-22 15:30 UTC (permalink / raw)
To: 'Vlad Yasevich', 'Matija Glavinic Pecotic',
linux-sctp@vger.kernel.org
Cc: Alexander Sverdlin, netdev@vger.kernel.org
In-Reply-To: <52DFDB1E.7070701@gmail.com>
From: Vlad Yasevich
...
> > IIRC the 'size' taken of the socket buffer is the skb's 'true size'
> > and that includes any padding before and after the actual rx data.
> > For short packets the driver may have copied the data into a smaller
> > skb, for long packets it is likely to be more than that of a full
> > length ethernet packet.
> > In either case it can be significantly more than sizeof(sk_buff)
> > (190?) plus the size of the actual data.
>
> SCTP currently doesn't support GRO, so each packet is limited to
> ethernet packet plus sk_buff overhead.
The ethernet driver might still pass up a 2k buffer, or even a 4k one.
Especially if it supports GRO for TCP.
> What throws a real monkey
> wrench into this whole accounting business is SCTP bundling. If you
> bundle multiple messages into the single packet, accounting for it
> is a mess.
How about dividing the 'truesize' by the reference count?
(except that isn't quite right...)
I assume there are multiple skb headers but only one data buffer?
At least the chunks are all for the same connection so end up on
one socket (except I remember some other horrid stuff for datagrams).
> > I'm also not sure that continuously removing 'credit' is a good idea.
> > I've done a lot of comms protocol code, removing credit and 'window
> > slamming' acks are not good ideas.
>
> This patch doesn't continuously remove 'credit'. It advertises the
> closest approximation of the window that we support and computes it
> the same way as we do for Initial Window (available sk_rcvbuff >> 1).
> As the receiver drains the receive queue, available buffer will increase
> and the available window will grow.
Let's assume the socket buffer size is 100k.
We advertise a window of 50k.
We now receive 100 bytes of data, the remote has 49900 window left.
The 'truesize' is something like 190+64(ish)+100+tail_pad, say 400.
Socket buffer space is reduced to 99600 and any ack would give 49800.
So we have reduced the window by 100 bytes.
With that much space it probably doesn't matter much.
However if the connection is receive limited then the remote system
will have a second packet in flight that uses the rest of the window.
We then receive an in-sequence but out-of-window packet that refers
to window that we had previously given to the remote system.
I don't know what the sctp (or tcp) code does with such packets.
In my experience it is best to treat them as valid data (unless
there are larger free memory issues) and ack them at some point.
Hopefully the rules of the underlying protocol let you do this!
If you discard these packets then every packet gets sent twice
(or even more often if the data is very short).
David
^ permalink raw reply
* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Daniel Borkmann @ 2014-01-22 15:32 UTC (permalink / raw)
To: Dan Ballard
Cc: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
linux-kernel, netdev, Hannes Frederic Sowa
In-Reply-To: <13e8aad86903481261581de7c29444e3@mindstab.net>
On 01/22/2014 04:11 PM, Dan Ballard wrote:
> Provides a new option for setsockopt SO_MAX_DGRAM_QLEN that sets and
> gets a socket specific max datagram queue length. Currently each socket
> has one but it's only ever initialized from
> /proc/sys/net/unix/max_dgram_qlen and then never adjustable later. Now
> each socket can have it individually tweaked during it's life.
>
> Signed-off-by: Dan Ballard <dan@mindstab.net>
> ---
> include/uapi/asm-generic/socket.h | 2 ++
> net/core/sock.c | 7 +++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 38f14d0..f8c3e6b 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -80,4 +80,6 @@
>
> #define SO_MAX_PACING_RATE 47
>
> +#define SO_MAX_DGRAM_QLEN 48
> +
This needs to be added in more than just asm-generic, e.g.
have a look at SO_MAX_PACING_RATE or SO_BPF_EXTENSIONS.
Also you might need to rebase to current net-next head and
maybe describe use cases more in-depth; next to what Hannes
just commented.
> #endif /* __ASM_GENERIC_SOCKET_H */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5393b4b..1ff69d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -915,6 +915,10 @@ set_rcvbuf:
> sk->sk_max_pacing_rate);
> break;
>
> + case SO_MAX_DGRAM_QLEN:
> + sk->sk_max_ack_backlog = val;
> + break;
> +
> default:
> ret = -ENOPROTOOPT;
> break;
> @@ -1182,6 +1186,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
> v.val = sk->sk_max_pacing_rate;
> break;
>
> + case SO_MAX_DGRAM_QLEN:
> + v.val = sk->sk_max_ack_backlog;
> + break;
> default:
> return -ENOPROTOOPT;
> }
^ permalink raw reply
* Re: [PATCH] net: Correctly sync addresses from multiple sources to single device
From: Vlad Yasevich @ 2014-01-22 15:30 UTC (permalink / raw)
To: David Miller; +Cc: netdev, andrey.dmitrov
In-Reply-To: <20140121.145355.1406240839568386348.davem@davemloft.net>
On 01/21/2014 05:53 PM, David Miller wrote:
> From: Vlad Yasevich <vyasevic@redhat.com>
> Date: Fri, 17 Jan 2014 14:52:12 -0500
>
>> When we have multiple devices attempting to sync the same address
>> to a single destination, each device should be permitted to sync
>> it once. To accomplish this, pass the sync count of the source
>> address to __hw_addr_add_ex(). 'sync_cnt' tracks how many time
>> a given address has been successfully synced. If the address
>> is found in the destination list, but the 'sync_cnt' of the source
>> is 0, then this address has not been synced from this interface
>> and we need to allow the sync operation to succeed thus incrementing
>> reference counts.
>>
>> Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
>> CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>
> I applied this to net-next since 3.13 just got released, and it doesn't
> compile.
>
> net/core/dev_addr_lists.c: In function ‘__hw_addr_sync_one’:
> net/core/dev_addr_lists.c:144:26: error: ‘struct netdev_hw_addr’ has no member named ‘sync_count’
>
That does it... Time to add a hook to format_patch to check for dirty
index.
-vlad
^ permalink raw reply
* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Eric Dumazet @ 2014-01-22 15:30 UTC (permalink / raw)
To: Dan Ballard
Cc: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
linux-kernel, netdev
In-Reply-To: <13e8aad86903481261581de7c29444e3@mindstab.net>
On Wed, 2014-01-22 at 07:11 -0800, Dan Ballard wrote:
> Provides a new option for setsockopt SO_MAX_DGRAM_QLEN that sets and
> gets a socket specific max datagram queue length. Currently each socket
> has one but it's only ever initialized from
> /proc/sys/net/unix/max_dgram_qlen and then never adjustable later. Now
> each socket can have it individually tweaked during it's life.
>
Your patch suffers from many problems.
It breaks listen()
It doesn't compile on a lot of arches.
sk_max_ack_backlog is an unsigned short.
max_dgram_qlen is used, but only for Unix sockets at this time
Try : "git grep -n max_dgram_qlen" for details
> Signed-off-by: Dan Ballard <dan@mindstab.net>
> ---
> include/uapi/asm-generic/socket.h | 2 ++
> net/core/sock.c | 7 +++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/uapi/asm-generic/socket.h
> b/include/uapi/asm-generic/socket.h
> index 38f14d0..f8c3e6b 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -80,4 +80,6 @@
>
> #define SO_MAX_PACING_RATE 47
>
> +#define SO_MAX_DGRAM_QLEN 48
> +
> #endif /* __ASM_GENERIC_SOCKET_H */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5393b4b..1ff69d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -915,6 +915,10 @@ set_rcvbuf:
> sk->sk_max_pacing_rate);
> break;
>
> + case SO_MAX_DGRAM_QLEN:
> + sk->sk_max_ack_backlog = val;
> + break;
> +
> default:
> ret = -ENOPROTOOPT;
> break;
> @@ -1182,6 +1186,9 @@ int sock_getsockopt(struct socket *sock, int
> level, int optname,
> v.val = sk->sk_max_pacing_rate;
> break;
>
> + case SO_MAX_DGRAM_QLEN:
> + v.val = sk->sk_max_ack_backlog;
> + break;
> default:
> return -ENOPROTOOPT;
> }
^ permalink raw reply
* Re: [PATCH net-next] bridge: Remove unnecessary vlan_put_tag in br_handle_vlan
From: Vlad Yasevich @ 2014-01-22 15:22 UTC (permalink / raw)
To: Toshiaki Makita, David S . Miller, Stephen Hemminger, netdev
In-Reply-To: <1390350577-4122-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
On 01/21/2014 07:29 PM, Toshiaki Makita wrote:
> br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing
> port is the bridge device.
> This is unnecessary because __netif_receive_skb_core() can handle skbs
> with HW accelerated vlan tag. In current implementation,
> __netif_receive_skb_core() needs to extract the vlan tag embedded in skb
> data. This could cause low network performance especially when receiving
> frames at a high frame rate on the bridge device.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
-vlad
> ---
> net/bridge/br_vlan.c | 21 ---------------------
> 1 file changed, 21 deletions(-)
>
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 7ffc801..4ca4d0a 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -151,27 +151,6 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
> br_vlan_get_tag(skb, &vid);
> if (test_bit(vid, pv->untagged_bitmap))
> skb = br_vlan_untag(skb);
> - else {
> - /* Egress policy says "send tagged". If output device
> - * is the bridge, we need to add the VLAN header
> - * ourselves since we'll be going through the RX path.
> - * Sending to ports puts the frame on the TX path and
> - * we let dev_hard_start_xmit() add the header.
> - */
> - if (skb->protocol != htons(ETH_P_8021Q) &&
> - pv->port_idx == 0) {
> - /* vlan_put_tag expects skb->data to point to
> - * mac header.
> - */
> - skb_push(skb, ETH_HLEN);
> - skb = __vlan_put_tag(skb, skb->vlan_proto, skb->vlan_tci);
> - if (!skb)
> - goto out;
> - /* put skb->data back to where it was */
> - skb_pull(skb, ETH_HLEN);
> - skb->vlan_tci = 0;
> - }
> - }
>
> out:
> return skb;
>
^ permalink raw reply
* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Hannes Frederic Sowa @ 2014-01-22 15:20 UTC (permalink / raw)
To: Dan Ballard
Cc: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
linux-kernel, netdev
In-Reply-To: <13e8aad86903481261581de7c29444e3@mindstab.net>
On Wed, Jan 22, 2014 at 07:11:20AM -0800, Dan Ballard wrote:
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5393b4b..1ff69d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -915,6 +915,10 @@ set_rcvbuf:
> sk->sk_max_pacing_rate);
> break;
>
> + case SO_MAX_DGRAM_QLEN:
> + sk->sk_max_ack_backlog = val;
> + break;
> +
Shouldn't the backlog be capped for unprivileged users to some configurable
value? I even think that max_dgram_qlen should be the upper bound.
I guess it is not that serious as socket read accounting does account all
packets which sit in the backlog queue.
Greetings,
Hannes
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox