Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: linux-3.14-rc1 & PACKET_QDISC_BYPASS : slow_path warning
From: Felix Fietkau @ 2014-02-04 14:48 UTC (permalink / raw)
  To: Mathias Kretschmer, Daniel Borkmann
  Cc: netdev, linux-wireless@vger.kernel.org, Jesper Dangaard Brouer
In-Reply-To: <52F0FA96.7090903@fokus.fraunhofer.de>

On 2014-02-04 15:35, Mathias Kretschmer wrote:
> On 02/04/2014 03:25 PM, Daniel Borkmann wrote:
>> On 02/04/2014 02:13 PM, Mathias Kretschmer wrote:
>>> On 02/04/2014 01:56 PM, Daniel Borkmann wrote:
>>>> On 02/03/2014 11:47 PM, Mathias Kretschmer wrote:
>> ...
>>>>> we are developing a wired/wireless MPLS switch. Currently the data plane runs in
>>>>> user space using PF_PACKET sockets via RX_RING/TX_RING.
>>>>>
>>>>> We had hoped to test the PACKET_QDISC_BYPASS option since this seems to be the
>>>>> proper optimization for our purposes.
>>>>>
>>>>> Unfortunately, we're seeing a 'slow path' warning for every packet that is being
>>>>> sent out. With PACKET_QDISC_BYPASS disabled, no warnings are dumped. Hardware is
>>>>> an older AMD Geode LX embedded board (ALiX).
>>>>>
>>>>> BTW, this happens while sending via a wireless (802.11) adhoc interface. Hence, it
>>>>> might be an interaction with the ieee80211 sub system.
>>>>
>>>> Hm, so the WARN_ON() is triggered inside ath9k driver in relation to 802.11 QoS,
>>>> and came in from commit 066dae93bdf ("ath9k: rework tx queue selection and fix
>>>> queue stopping/waking"). We did the stress testing of that option for PF_PACKET
>>>> on 10Gbit/s NICs. Seems to me you might be running into the same issue when using
>>>> pktgen as it randomly or per round-robin selects tx queues as well? Not entirely
>>>> sure how necessary this WARN_ON() is though, Felix? I think QDISC_BYPASS might not
>>>> be the best option in your case, perhaps you will run into increased power usage
>>>> in your NIC as a side-effect?
>>>
>>> I'm not familiar with the exact implementation details, but from the description
>>> of this option, it seems to me that this is exactly what one would want to use if
>>> the goal is to send an Ethernet frame out on a particular interface without any
>>> further processing by the kernel.
>>>
>>> Why would this increase the power usage on the NIC ?  Due to a higher achievable
>>> packet rate ?  That would be acceptable :)
>>
>> I'm not too familiar with the ieee80211 sub system, so I let Felix answer side
>> effects and if actually the WARN_ON() is needed. ;) PACKET_QDISC_BYPASS is, as
>> documented, designed for advanced pktgen resp. traffic generator like scenarios
>> where you just sort of "brute force" packets to your NIC to stress test a remote
>> machine for further analysis. I don't think it's very useful in your scenario
>> when you have a wired/wireless MPLS switch, you rather might want to buffer/queue
>> and therefore use qdisc layer instead.
> 
> Hm, I was hoping/assuming that we still get to use hardware queues, if provided by 
> the driver. The main goal was to avoid any further PF_PACKET framework overhead.
> 
> If the WARN_ON() issue gets solved, we will revisit this option and evaluate its 
> applicability.
The reason for the WARN_ON is probably either the .ndo_select_queue call
is not run, or its queue selection result is changed before the frame
hits the driver's tx call.
This call sets both the queue and the TID (similar to 802.1d tag), which
makes it into the packet via 802.11e (WMM, QoS).
It is important to the driver that the TID is in sync with the queue
selection, if that is not the case, then pending frame counters can get
messed up.
If you really want to bypass qdisc, make sure that at least
ndo_select_queue is called before passing the frame to the netdev.

- Felix

^ permalink raw reply

* [PATCH net-next] tipc: explicitly include core.h in addr.h
From: andreas.bofjall @ 2014-02-04 14:49 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion

From: Andreas Bofjäll <andreas.bofjall@ericsson.com>

The inline functions in addr.h uses tipc_own_addr which is exported by
core.h, but addr.h never actually includes it. It works because it is
explicitly included where this is used, but it looks a bit strange.

Include core.h in addr.h explicitly to make the dependency clearer.

Signed-off-by: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/addr.h |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/addr.h b/net/tipc/addr.h
index 60b00ab..a74acf9 100644
--- a/net/tipc/addr.h
+++ b/net/tipc/addr.h
@@ -37,6 +37,8 @@
 #ifndef _TIPC_ADDR_H
 #define _TIPC_ADDR_H
 
+#include "core.h"
+
 #define TIPC_ZONE_MASK		0xff000000u
 #define TIPC_CLUSTER_MASK	0xfffff000u
 
-- 
1.7.9.5


------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

^ permalink raw reply related

* Re: [PATCH] net:cpsw: Pass unhandled ioctl's on to generic phy ioctl
From: Sørensen, Stefan @ 2014-02-04 15:08 UTC (permalink / raw)
  To: ben@decadent.org.uk
  Cc: davem@davemloft.net, netdev@vger.kernel.org, mugunthanvnm@ti.com
In-Reply-To: <1391511050.3003.21.camel@deadeye.wl.decadent.org.uk>

On Tue, 2014-02-04 at 10:50 +0000, Ben Hutchings wrote:
> > This patch allows the use of a generic timestamping phy connected
> > to the cpsw if CPTS support is not enabled.
> 
> What if CPTS support is enabled in the driver, but this particular
> machine doesn't have it and uses a timestamping PHY instead?

That would not work, the CPTS will grab the SIOC{G,S}HWTSTAMP. I'm not
sure how that could be configured at runtime, other than a private
ethtool flag.

Also it seem as the situation with a timestamping MAC and a timestamping
PHY could deliver bogus ethtool timestamping info, as it will come from
the PHY if available, but the timestamping will be handled by the MAC.

> >  	case SIOCGMIIPHY:
> >  		data->phy_id = priv->slaves[slave_no].phy->addr;
> 
> It looks like this existing code is broken, as the phy pointer can be
> NULL!

Right, but that code can go away since SIOCGMIIPHY is also handled by
phy_mii_ioctl. I will delete that case in the next version.

> > +	return phy_mii_ioctl(priv->slaves[slave_no].phy, req, cmd);
> 
> This presumably also enables SIOC{G,S}MIIREG, but you didn't mention
> that.

Yes, I will add that to the commit log in the next version.

Stefan

^ permalink raw reply

* Re: [PATCH] dp83640: Support a configurable number of periodic outputs
From: Sørensen, Stefan @ 2014-02-04 15:24 UTC (permalink / raw)
  To: ben@decadent.org.uk; +Cc: netdev@vger.kernel.org, richardcochran@gmail.com
In-Reply-To: <1391511238.3003.25.camel@deadeye.wl.decadent.org.uk>

On Tue, 2014-02-04 at 10:53 +0000, Ben Hutchings wrote:
> > Modules parameters are surely easiest (for the developer), but perhaps
> > the time has come for the "right way."
> > 
> > Currently there is no interface for configuring the GPIOs used by PHC
> > devices, and last year I was going to add the GPIOs to the igb
> > driver. The conclusion of the discussion was that module parameters
> > are bad for this, but ethtool is good.
> > 
> >   http://www.spinics.net/lists/netdev/msg237692.html
> > 
> > [ I did not follow through to come up with an ethtool way of configuring
> >   the igb pins. ]
> > 
> > Even though it is more work, I think the way forward is to invent a
> > way to let the user configure this kind of thing via ethtool. Would
> > you care to take a stab at this?
> 
> It seems to me that this is board configuration, not user configuration,
> so it should be defined in DT or ACPI or whatever.
> 
> Are there boards which expose multiple GPIOs for users to plug together
> PPS signals?

I would definitely prefer the DT approach. On all the platforms I have
used, the GPIO configuration has been set in stone by the HW. But there
might be some dev boards or the like that has them configurable.

Stefan

^ permalink raw reply

* [PATCH net] bnx2x: fix L2-GRE TCP issues
From: Dmitry Kravkov @ 2014-02-04 15:43 UTC (permalink / raw)
  To: davem, netdev; +Cc: mschmidt, Dmitry Kravkov, Ariel Elior

When configuring GRE tunnel using OVS, tcp stream is distributed over
all RSS queues which may cause TCP reordering. It happens since OVS
uses L2GRE protocol when kernel gre uses IPGRE.
Patch defaults gre tunnel to L2GRE which allows proper RSS for L2GRE
packets and (implicitly) disables RSS for IPGRE traffic.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 17d1689..bfc58d4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -936,7 +936,7 @@ static inline int bnx2x_func_start(struct bnx2x *bp)
 	else /* CHIP_IS_E1X */
 		start_params->network_cos_mode = FW_WRR;
 
-	start_params->gre_tunnel_mode = IPGRE_TUNNEL;
+	start_params->gre_tunnel_mode = L2GRE_TUNNEL;
 	start_params->gre_tunnel_rss = GRE_INNER_HEADERS_RSS;
 
 	return bnx2x_func_state_change(bp, &func_params);
-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH] dp83640: Support a configurable number of periodic outputs
From: Richard Cochran @ 2014-02-04 15:43 UTC (permalink / raw)
  To: Sørensen, Stefan; +Cc: ben@decadent.org.uk, netdev@vger.kernel.org
In-Reply-To: <1391527464.7871.15.camel@e37108.spectralink.com>

On Tue, Feb 04, 2014 at 03:24:27PM +0000, Sørensen, Stefan wrote:
> 
> I would definitely prefer the DT approach. On all the platforms I have
> used, the GPIO configuration has been set in stone by the HW. But there
> might be some dev boards or the like that has them configurable.

I think DT is fine for the initial configuration, but the ethtool way
will still be needed in at least two cases.

1. Platform has no DT at all.
2. User wants to change the function at run time.

Thanks,
Richard

^ permalink raw reply

* [PATCH] net: irda: ep7211-sir: Remove driver
From: Alexander Shiyan @ 2014-02-04 15:43 UTC (permalink / raw)
  To: netdev; +Cc: Samuel Ortiz, Alexander Shiyan

This patch removes old and unsupported CLPS711X IrDA driver.
Support for IrDA for CLPS711X serial port now provided by commit
4a33f1f59abd (serial: clps711x: Add support for N_IRDA line
discipline), so IrDA-mode can be turned ON with "irattach" tool
through "irtty" driver.

Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
---
 drivers/net/irda/Kconfig      |  7 -----
 drivers/net/irda/Makefile     |  1 -
 drivers/net/irda/ep7211-sir.c | 70 -------------------------------------------
 3 files changed, 78 deletions(-)
 delete mode 100644 drivers/net/irda/ep7211-sir.c

diff --git a/drivers/net/irda/Kconfig b/drivers/net/irda/Kconfig
index 2dc82f1..3da44d5 100644
--- a/drivers/net/irda/Kconfig
+++ b/drivers/net/irda/Kconfig
@@ -210,13 +210,6 @@ config KINGSUN_DONGLE
 	  To compile it as a module, choose M here: the module will be called
 	  kingsun-sir.
 
-config EP7211_DONGLE
-	tristate "Cirrus Logic clps711x I/R support"
-	depends on IRTTY_SIR && ARCH_CLPS711X && IRDA
-	help
-	  Say Y here if you want to build support for the Cirrus logic
-	  EP7211 chipset's infrared module.
-
 config KSDAZZLE_DONGLE
 	tristate "KingSun Dazzle IrDA-USB dongle"
 	depends on IRDA && USB
diff --git a/drivers/net/irda/Makefile b/drivers/net/irda/Makefile
index dfc6453..be8ab5b 100644
--- a/drivers/net/irda/Makefile
+++ b/drivers/net/irda/Makefile
@@ -35,7 +35,6 @@ obj-$(CONFIG_MCP2120_DONGLE)	+= mcp2120-sir.o
 obj-$(CONFIG_ACT200L_DONGLE)	+= act200l-sir.o
 obj-$(CONFIG_MA600_DONGLE)	+= ma600-sir.o
 obj-$(CONFIG_TOIM3232_DONGLE)	+= toim3232-sir.o
-obj-$(CONFIG_EP7211_DONGLE)	+= ep7211-sir.o
 obj-$(CONFIG_KINGSUN_DONGLE)	+= kingsun-sir.o
 obj-$(CONFIG_KSDAZZLE_DONGLE)	+= ksdazzle-sir.o
 obj-$(CONFIG_KS959_DONGLE)    	+= ks959-sir.o
diff --git a/drivers/net/irda/ep7211-sir.c b/drivers/net/irda/ep7211-sir.c
deleted file mode 100644
index 5fe1f4d..0000000
--- a/drivers/net/irda/ep7211-sir.c
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * IR port driver for the Cirrus Logic CLPS711X processors
- *
- * Copyright 2001, Blue Mug Inc.  All rights reserved.
- * Copyright 2007, Samuel Ortiz <samuel@sortiz.org>
- */
-
-#include <linux/module.h>
-#include <linux/platform_device.h>
-
-#include <mach/hardware.h>
-
-#include "sir-dev.h"
-
-static int clps711x_dongle_open(struct sir_dev *dev)
-{
-	unsigned int syscon;
-
-	/* Turn on the SIR encoder. */
-	syscon = clps_readl(SYSCON1);
-	syscon |= SYSCON1_SIREN;
-	clps_writel(syscon, SYSCON1);
-
-	return 0;
-}
-
-static int clps711x_dongle_close(struct sir_dev *dev)
-{
-	unsigned int syscon;
-
-	/* Turn off the SIR encoder. */
-	syscon = clps_readl(SYSCON1);
-	syscon &= ~SYSCON1_SIREN;
-	clps_writel(syscon, SYSCON1);
-
-	return 0;
-}
-
-static struct dongle_driver clps711x_dongle = {
-	.owner		= THIS_MODULE,
-	.driver_name	= "EP7211 IR driver",
-	.type		= IRDA_EP7211_DONGLE,
-	.open		= clps711x_dongle_open,
-	.close		= clps711x_dongle_close,
-};
-
-static int clps711x_sir_probe(struct platform_device *pdev)
-{
-	return irda_register_dongle(&clps711x_dongle);
-}
-
-static int clps711x_sir_remove(struct platform_device *pdev)
-{
-	return irda_unregister_dongle(&clps711x_dongle);
-}
-
-static struct platform_driver clps711x_sir_driver = {
-	.driver	= {
-		.name	= "sir-clps711x",
-		.owner	= THIS_MODULE,
-	},
-	.probe	= clps711x_sir_probe,
-	.remove	= clps711x_sir_remove,
-};
-module_platform_driver(clps711x_sir_driver);
-
-MODULE_AUTHOR("Samuel Ortiz <samuel@sortiz.org>");
-MODULE_DESCRIPTION("EP7211 IR dongle driver");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS("irda-dongle-13"); /* IRDA_EP7211_DONGLE */
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH] fdtable: Avoid triggering OOMs from alloc_fdmem
From: Eric Dumazet @ 2014-02-04 16:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, linux-fsdevel, netdev, linux-mm,
	David Rientjes
In-Reply-To: <87r47jsb2p.fsf@xmission.com>

On Mon, 2014-02-03 at 21:26 -0800, Eric W. Biederman wrote:
> Recently due to a spike in connections per second memcached on 3
> separate boxes triggered the OOM killer from accept.  At the time the
> OOM killer was triggered there was 4GB out of 36GB free in zone 1. The
> problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
> hold a bitmap, and there was sufficient fragmentation that the largest
> page available was 8KiB.
> 
> I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
> but I do agree that order 3 allocations are very likely to succeed.
> 
> There are always pathologies where order > 0 allocations can fail when
> there are copious amounts of free memory available.  Using the pigeon
> hole principle it is easy to show that it requires 1 page more than 50%
> of the pages being free to guarantee an order 1 (8KiB) allocation will
> succeed, 1 page more than 75% of the pages being free to guarantee an
> order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
> the pages being free to guarantee an order 3 allocate will succeed.
> 
> A server churning memory with a lot of small requests and replies like
> memcached is a common case that if anything can will skew the odds
> against large pages being available.
> 
> Therefore let's not give external applications a practical way to kill
> linux server applications, and specify __GFP_NORETRY to the kmalloc in
> alloc_fdmem.  Unless I am misreading the code and by the time the code
> reaches should_alloc_retry in __alloc_pages_slowpath (where
> __GFP_NORETRY becomes signification).  We have already tried everything
> reasonable to allocate a page and the only thing left to do is wait.  So
> not waiting and falling back to vmalloc immediately seems like the
> reasonable thing to do even if there wasn't a chance of triggering the
> OOM killer.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/file.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index 771578b33fb6..db25c2bdfe46 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -34,7 +34,7 @@ static void *alloc_fdmem(size_t size)
>  	 * vmalloc() if the allocation size will be considered "large" by the VM.
>  	 */
>  	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>  		if (data != NULL)
>  			return data;
>  	}

Hi Eric

I wrote yesterday a similar patch adding __GFP_NORETRY in following
paths. I feel that alloc_fdmem() is only a part of the problem ;)

What do you think, should we merge our changes or have distinct
patches ?

diff --git a/net/core/sock.c b/net/core/sock.c
index 0c127dcdf6a8..5b6a9431b017 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 			while (order) {
 				if (npages >= 1 << order) {
 					page = alloc_pages(sk->sk_allocation |
-							   __GFP_COMP | __GFP_NOWARN,
+							   __GFP_COMP |
+							   __GFP_NOWARN |
+							   __GFP_NORETRY,
 							   order);
 					if (page)
 						goto fill_page;
@@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
 		gfp_t gfp = prio;
 
 		if (order)
-			gfp |= __GFP_COMP | __GFP_NOWARN;
+			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
 		pfrag->page = alloc_pages(gfp, order);
 		if (likely(pfrag->page)) {
 			pfrag->offset = 0;






--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* Re: [PATCH net-next 2/2] stmmac: add extended set multicast filter & devicetree options
From: Vince Bridgers @ 2014-02-04 17:10 UTC (permalink / raw)
  To: Giuseppe CAVALLARO
  Cc: devicetree, netdev, robh+dt, pawel.moll, mark.rutland,
	ijc+devicetree, Kumar Gala, Dinh Nguyen, Rayagond Kokatanur
In-Reply-To: <52F0B99E.6090302@st.com>

On Tue, Feb 4, 2014 at 3:57 AM, Giuseppe CAVALLARO
<peppe.cavallaro@st.com> wrote:
> On 1/31/2014 9:15 PM, Vince Bridgers wrote:
>>
>> The synopsys EMAC can be configured for different numbers of multicast
>> hash
>> bins and perfect filter entries at device creation time and there's no way
>> to query this configuration information at runtime. As a result, a
>> devicetree
>> parameter is required in order for the driver to program these filters
>> correctly for a particular device instance. This patch extends the current
>> driver by providing a different multicast filter programming function if
>> different than the currently supported 64 multicast hash bins and 32
>> perfect unicast addresses. This patch is required to correct multicast
>> filtering for the Altera Cyclone V SOC.
>>
>> Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>
>> ---
>>   drivers/net/ethernet/stmicro/stmmac/common.h       |   42 +++--
>>   drivers/net/ethernet/stmicro/stmmac/dwmac1000.h    |    5 +-
>>   .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |  161
>> +++++++++++++++++---
>>   .../net/ethernet/stmicro/stmmac/dwmac100_core.c    |   29 ++--
>>   .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |    6 +-
>>   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   36 ++---
>>   .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   51 +++++++
>>   include/linux/stmmac.h                             |    2 +
>>   8 files changed, 261 insertions(+), 71 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h
>> b/drivers/net/ethernet/stmicro/stmmac/common.h
>> index 7834a39..ca07afe 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/common.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/common.h
>> @@ -294,6 +294,8 @@ struct dma_features {
>>
>>   #define JUMBO_LEN             9000
>>
>> +#define GMAC_MAX_PERFECT_ADDRESSES 32
>> +
>>   struct stmmac_desc_ops {
>>         /* DMA RX descriptor ring initialization */
>>         void (*init_rx_desc) (struct dma_desc *p, int disable_rx_ic, int
>> mode,
>> @@ -368,34 +370,37 @@ struct stmmac_dma_ops {
>>         void (*rx_watchdog) (void __iomem *ioaddr, u32 riwt);
>>   };
>>
>> +struct mac_device_info;
>> +
>>   struct stmmac_ops {
>>         /* MAC core initialization */
>> -       void (*core_init) (void __iomem *ioaddr, int mtu);
>> +       void (*core_init)(struct mac_device_info *hw, int mtu);
>>         /* Enable and verify that the IPC module is supported */
>> -       int (*rx_ipc) (void __iomem *ioaddr);
>> +       int (*rx_ipc)(struct mac_device_info *hw);
>>         /* Dump MAC registers */
>> -       void (*dump_regs) (void __iomem *ioaddr);
>> +       void (*dump_regs)(struct mac_device_info *hw);
>>         /* Handle extra events on specific interrupts hw dependent */
>> -       int (*host_irq_status) (void __iomem *ioaddr,
>> +       int (*host_irq_status)(struct mac_device_info *hw,
>>                                 struct stmmac_extra_stats *x);
>>         /* Multicast filter setting */
>> -       void (*set_filter) (struct net_device *dev, int id);
>> +       void (*set_filter)(struct mac_device_info *hw,
>> +                           struct net_device *dev);
>>         /* Flow control setting */
>> -       void (*flow_ctrl) (void __iomem *ioaddr, unsigned int duplex,
>> +       void (*flow_ctrl)(struct mac_device_info *hw, unsigned int duplex,
>>                            unsigned int fc, unsigned int pause_time);
>>         /* Set power management mode (e.g. magic frame) */
>> -       void (*pmt) (void __iomem *ioaddr, unsigned long mode);
>> +       void (*pmt)(struct mac_device_info *hw, unsigned long mode);
>>         /* Set/Get Unicast MAC addresses */
>> -       void (*set_umac_addr) (void __iomem *ioaddr, unsigned char *addr,
>> +       void (*set_umac_addr)(struct mac_device_info *hw, unsigned char
>> *addr,
>>                                unsigned int reg_n);
>> -       void (*get_umac_addr) (void __iomem *ioaddr, unsigned char *addr,
>> +       void (*get_umac_addr)(struct mac_device_info *hw, unsigned char
>> *addr,
>>                                unsigned int reg_n);
>> -       void (*set_eee_mode) (void __iomem *ioaddr);
>> -       void (*reset_eee_mode) (void __iomem *ioaddr);
>> -       void (*set_eee_timer) (void __iomem *ioaddr, int ls, int tw);
>> -       void (*set_eee_pls) (void __iomem *ioaddr, int link);
>> -       void (*ctrl_ane) (void __iomem *ioaddr, bool restart);
>> -       void (*get_adv) (void __iomem *ioaddr, struct rgmii_adv *adv);
>> +       void (*set_eee_mode)(struct mac_device_info *hw);
>> +       void (*reset_eee_mode)(struct mac_device_info *hw);
>> +       void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
>> +       void (*set_eee_pls)(struct mac_device_info *hw, int link);
>> +       void (*ctrl_ane)(struct mac_device_info *hw, bool restart);
>> +       void (*get_adv)(struct mac_device_info *hw, struct rgmii_adv
>> *adv);
>>   };
>>
>>   struct stmmac_hwtimestamp {
>> @@ -447,9 +452,14 @@ struct mac_device_info {
>>         struct mii_regs mii;    /* MII register Addresses */
>>         struct mac_link link;
>>         unsigned int synopsys_uid;
>> +       void __iomem *pcsr;     /* vpointer to device CSRs */
>> +       int multicast_filter_bins;
>> +       int unicast_filter_entries;
>> +       int mcast_bits_log2;
>>   };
>>
>> -struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr);
>> +struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr,
>> +                                       int mcbins, int
>> perfect_uc_entries);
>>   struct mac_device_info *dwmac100_setup(void __iomem *ioaddr);
>>
>>   void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6],
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
>> index f37d90f..40b8533 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
>> @@ -87,7 +87,6 @@ enum power_event {
>>                                 (reg * 8))
>>   #define GMAC_ADDR_LOW(reg)    (((reg > 15) ? 0x00000804 : 0x00000044) +
>> \
>>                                 (reg * 8))
>> -#define GMAC_MAX_PERFECT_ADDRESSES     32
>>
>>   /* PCS registers (AN/TBI/SGMII/RGMII) offset */
>>   #define GMAC_AN_CTRL  0x000000c0      /* AN control */
>> @@ -130,6 +129,8 @@ enum power_event {
>>   #define GMAC_CONTROL_2K 0x08000000    /* IEEE 802.3as 2K packets */
>>   #define GMAC_CONTROL_TC       0x01000000      /* Transmit Conf. in
>> RGMII/SGMII */
>>   #define GMAC_CONTROL_WD       0x00800000      /* Disable Watchdog on
>> receive */
>> +
>> +/* GMAC Configuration defines */
>>   #define GMAC_CONTROL_JD       0x00400000      /* Jabber disable */
>>   #define GMAC_CONTROL_BE       0x00200000      /* Frame Burst Enable */
>>   #define GMAC_CONTROL_JE       0x00100000      /* Jumbo frame */
>> @@ -262,5 +263,7 @@ enum rtc_control {
>>   #define GMAC_MMC_TX_INTR   0x108
>>   #define GMAC_MMC_RX_CSUM_OFFLOAD   0x208
>>
>> +#define GMAC_EXTHASH_BASE  0x500
>> +
>>   extern const struct stmmac_dma_ops dwmac1000_dma_ops;
>>   #endif /* __DWMAC1000_H__ */
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> index b3e148e..44db9fb 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> @@ -26,14 +26,15 @@
>>     Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>>
>> *******************************************************************************/
>>
>> +#include <linux/io.h>
>>   #include <linux/crc32.h>
>> -#include <linux/slab.h>
>>   #include <linux/ethtool.h>
>> -#include <asm/io.h>
>> +#include <linux/slab.h>
>>   #include "dwmac1000.h"
>>
>> -static void dwmac1000_core_init(void __iomem *ioaddr, int mtu)
>> +static void dwmac1000_core_init(struct mac_device_info *hw, int mtu)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value = readl(ioaddr + GMAC_CONTROL);
>>         value |= GMAC_CORE_INIT;
>>         if (mtu > 1500)
>> @@ -52,8 +53,9 @@ static void dwmac1000_core_init(void __iomem *ioaddr,
>> int mtu)
>>   #endif
>>   }
>>
>> -static int dwmac1000_rx_ipc_enable(void __iomem *ioaddr)
>> +static int dwmac1000_rx_ipc_enable(struct mac_device_info *hw)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value = readl(ioaddr + GMAC_CONTROL);
>>
>>         value |= GMAC_CONTROL_IPC;
>> @@ -64,8 +66,9 @@ static int dwmac1000_rx_ipc_enable(void __iomem *ioaddr)
>>         return !!(value & GMAC_CONTROL_IPC);
>>   }
>>
>> -static void dwmac1000_dump_regs(void __iomem *ioaddr)
>> +static void dwmac1000_dump_regs(struct mac_device_info *hw)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         int i;
>>         pr_info("\tDWMAC1000 regs (base addr = 0x%p)\n", ioaddr);
>>
>> @@ -76,21 +79,113 @@ static void dwmac1000_dump_regs(void __iomem *ioaddr)
>>         }
>>   }
>>
>> -static void dwmac1000_set_umac_addr(void __iomem *ioaddr, unsigned char
>> *addr,
>> +static void dwmac1000_set_umac_addr(struct mac_device_info *hw,
>> +                                   unsigned char *addr,
>>                                     unsigned int reg_n)
>>   {
>> -       stmmac_set_mac_addr(ioaddr, addr, GMAC_ADDR_HIGH(reg_n),
>> +       stmmac_set_mac_addr(hw->pcsr, addr, GMAC_ADDR_HIGH(reg_n),
>>                             GMAC_ADDR_LOW(reg_n));
>>   }
>>
>> -static void dwmac1000_get_umac_addr(void __iomem *ioaddr, unsigned char
>> *addr,
>> +static void dwmac1000_get_umac_addr(struct mac_device_info *hw,
>> +                                   unsigned char *addr,
>>                                     unsigned int reg_n)
>>   {
>> -       stmmac_get_mac_addr(ioaddr, addr, GMAC_ADDR_HIGH(reg_n),
>> +       stmmac_get_mac_addr(hw->pcsr, addr, GMAC_ADDR_HIGH(reg_n),
>>                             GMAC_ADDR_LOW(reg_n));
>>   }
>>
>> -static void dwmac1000_set_filter(struct net_device *dev, int id)
>> +static void dwmac1000_set_extmchash(void __iomem *ioaddr, u32
>> *mcfilterbits,
>> +                                   int numhashregs)
>> +{
>> +       int regs;
>> +       for (regs = 0; regs < numhashregs; regs++)
>> +               writel(mcfilterbits[regs],
>> +                      ioaddr + GMAC_EXTHASH_BASE + regs * 4);
>> +}
>> +
>> +static void dwmac1000_set_filterex(struct mac_device_info *hw,
>> +                                  struct net_device *dev)
>> +{
>> +       void __iomem *ioaddr = (void __iomem *)dev->base_addr;
>> +       unsigned int value = 0;
>> +       unsigned int perfect_addr_number;
>> +       u32 mc_filter[8];
>> +
>> +       pr_debug("%s: # mcasts %d, # unicast %d\n", __func__,
>> +                netdev_mc_count(dev), netdev_uc_count(dev));
>> +
>> +       if (dev->flags & IFF_PROMISC) {
>> +               value = GMAC_FRAME_FILTER_PR;
>> +       } else if (dev->flags & IFF_ALLMULTI) {
>> +               value = GMAC_FRAME_FILTER_PM;   /* pass all multi */
>> +       } else if (!netdev_mc_empty(dev)) {
>> +               struct netdev_hw_addr *ha;
>> +
>> +               memset(mc_filter, 0, sizeof(mc_filter));
>> +
>> +               /* Hash filter for multicast */
>> +               value = GMAC_FRAME_FILTER_HMC;
>> +
>> +               netdev_for_each_mc_addr(ha, dev) {
>> +                       /* The upper n bits of the calculated CRC are used
>> to
>> +                        * index the contents of the hash table depending
>> +                        * on the particular core's multicast hash size
>> +                        * configured through Synopsys Core Consultant
>> +                        */
>> +                       int bit_nr = bitrev32(~crc32_le(~0, ha->addr,
>> +                                             ETH_ALEN)) >>
>> +                                            (32 - hw->mcast_bits_log2);
>> +
>> +                       /* The most significant bit determines the
>> register to
>> +                        * use (H/L) while the other 5 bits determine the
>> bit
>> +                        * within the register.
>> +                        */
>> +                       mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
>> +               }
>> +               if (hw->mcast_bits_log2 == 6) {
>> +                       writel(mc_filter[0], ioaddr + GMAC_HASH_LOW);
>> +                       writel(mc_filter[1], ioaddr + GMAC_HASH_HIGH);
>> +               } else if (hw->mcast_bits_log2 == 7) {
>> +                       dwmac1000_set_extmchash(ioaddr, mc_filter, 4);
>> +               } else if (hw->mcast_bits_log2 == 8) {
>> +                       dwmac1000_set_extmchash(ioaddr, mc_filter, 8);
>> +               } else {
>> +                       pr_debug("STMMAC: err in setting mulitcast
>> filter\n");
>> +               }
>> +       }
>> +
>> +       /* Extra 16 regs are available in cores newer than the 3.40. */
>> +       if (hw->synopsys_uid > DWMAC_CORE_3_40)
>> +               perfect_addr_number = hw->unicast_filter_entries;
>> +       else
>> +               perfect_addr_number = hw->unicast_filter_entries / 2;
>
>
> can you check if this is safe enough? I mean if we pass a setting that
> could generate problems in case of old Synopsys chips.

I was going by the existing code and what I could get out of the
Synopsys 10/100/1000 DwMAC Databook, but will check with Synopsys to
verify what I believe to be true. I posted a support request to
Synopsys asking for the best way to know about the number of available
unicast filter entries by the core version number and copied you on
the request. I'll be sure to forward any valuable technical
information from the conversation, and update the code accordingly
with my next submission.

>
>
>> +
>> +       /* Handle multiple unicast addresses (perfect filtering) */
>> +       if (netdev_uc_count(dev) > perfect_addr_number)
>> +               /* Switch to promiscuous mode if more than 16 addrs
>> +                * are required
>> +                */
>> +               value |= GMAC_FRAME_FILTER_PR;
>> +       else {
>> +               int reg = 1;
>> +               struct netdev_hw_addr *ha;
>> +
>> +               netdev_for_each_uc_addr(ha, dev) {
>> +                       dwmac1000_set_umac_addr(hw, ha->addr, reg);
>> +                       reg++;
>> +               }
>> +       }
>> +
>> +#ifdef FRAME_FILTER_DEBUG
>> +       /* Enable Receive all mode (to debug filtering_fail errors) */
>> +       value |= GMAC_FRAME_FILTER_RA;
>> +#endif
>> +       writel(value, ioaddr + GMAC_FRAME_FILTER);
>> +}
>> +
>> +static void dwmac1000_set_filter(struct mac_device_info *hw,
>> +                                struct net_device *dev)
>>   {
>>         void __iomem *ioaddr = (void __iomem *)dev->base_addr;
>>         unsigned int value = 0;
>> @@ -130,7 +225,7 @@ static void dwmac1000_set_filter(struct net_device
>> *dev, int id)
>>         }
>>
>>         /* Extra 16 regs are available in cores newer than the 3.40. */
>> -       if (id > DWMAC_CORE_3_40)
>> +       if (hw->synopsys_uid > DWMAC_CORE_3_40)
>>                 perfect_addr_number = GMAC_MAX_PERFECT_ADDRESSES;
>>         else
>>                 perfect_addr_number = GMAC_MAX_PERFECT_ADDRESSES / 2;
>> @@ -146,7 +241,7 @@ static void dwmac1000_set_filter(struct net_device
>> *dev, int id)
>>                 struct netdev_hw_addr *ha;
>>
>>                 netdev_for_each_uc_addr(ha, dev) {
>> -                       dwmac1000_set_umac_addr(ioaddr, ha->addr, reg);
>> +                       dwmac1000_set_umac_addr(hw, ha->addr, reg);
>>                         reg++;
>>                 }
>>         }
>> @@ -162,9 +257,10 @@ static void dwmac1000_set_filter(struct net_device
>> *dev, int id)
>>                  readl(ioaddr + GMAC_HASH_HIGH), readl(ioaddr +
>> GMAC_HASH_LOW));
>>   }
>>
>> -static void dwmac1000_flow_ctrl(void __iomem *ioaddr, unsigned int
>> duplex,
>> +static void dwmac1000_flow_ctrl(struct mac_device_info *hw, unsigned int
>> duplex,
>>                                 unsigned int fc, unsigned int pause_time)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         unsigned int flow = 0;
>>
>>         pr_debug("GMAC Flow-Control:\n");
>> @@ -185,8 +281,9 @@ static void dwmac1000_flow_ctrl(void __iomem *ioaddr,
>> unsigned int duplex,
>>         writel(flow, ioaddr + GMAC_FLOW_CTRL);
>>   }
>>
>> -static void dwmac1000_pmt(void __iomem *ioaddr, unsigned long mode)
>> +static void dwmac1000_pmt(struct mac_device_info *hw, unsigned long mode)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         unsigned int pmt = 0;
>>
>>         if (mode & WAKE_MAGIC) {
>> @@ -201,9 +298,10 @@ static void dwmac1000_pmt(void __iomem *ioaddr,
>> unsigned long mode)
>>         writel(pmt, ioaddr + GMAC_PMT);
>>   }
>>
>> -static int dwmac1000_irq_status(void __iomem *ioaddr,
>> +static int dwmac1000_irq_status(struct mac_device_info *hw,
>>                                 struct stmmac_extra_stats *x)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 intr_status = readl(ioaddr + GMAC_INT_STATUS);
>>         int ret = 0;
>>
>> @@ -268,8 +366,9 @@ static int dwmac1000_irq_status(void __iomem *ioaddr,
>>         return ret;
>>   }
>>
>> -static void dwmac1000_set_eee_mode(void __iomem *ioaddr)
>> +static void dwmac1000_set_eee_mode(struct mac_device_info *hw)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value;
>>
>>         /* Enable the link status receive on RGMII, SGMII ore SMII
>> @@ -281,8 +380,9 @@ static void dwmac1000_set_eee_mode(void __iomem
>> *ioaddr)
>>         writel(value, ioaddr + LPI_CTRL_STATUS);
>>   }
>>
>> -static void dwmac1000_reset_eee_mode(void __iomem *ioaddr)
>> +static void dwmac1000_reset_eee_mode(struct mac_device_info *hw)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value;
>>
>>         value = readl(ioaddr + LPI_CTRL_STATUS);
>> @@ -290,8 +390,9 @@ static void dwmac1000_reset_eee_mode(void __iomem
>> *ioaddr)
>>         writel(value, ioaddr + LPI_CTRL_STATUS);
>>   }
>>
>> -static void dwmac1000_set_eee_pls(void __iomem *ioaddr, int link)
>> +static void dwmac1000_set_eee_pls(struct mac_device_info *hw, int link)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value;
>>
>>         value = readl(ioaddr + LPI_CTRL_STATUS);
>> @@ -304,8 +405,9 @@ static void dwmac1000_set_eee_pls(void __iomem
>> *ioaddr, int link)
>>         writel(value, ioaddr + LPI_CTRL_STATUS);
>>   }
>>
>> -static void dwmac1000_set_eee_timer(void __iomem *ioaddr, int ls, int tw)
>> +static void dwmac1000_set_eee_timer(struct mac_device_info *hw, int ls,
>> int tw)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         int value = ((tw & 0xffff)) | ((ls & 0x7ff) << 16);
>>
>>         /* Program the timers in the LPI timer control register:
>> @@ -318,8 +420,9 @@ static void dwmac1000_set_eee_timer(void __iomem
>> *ioaddr, int ls, int tw)
>>         writel(value, ioaddr + LPI_TIMER_CTRL);
>>   }
>>
>> -static void dwmac1000_ctrl_ane(void __iomem *ioaddr, bool restart)
>> +static void dwmac1000_ctrl_ane(struct mac_device_info *hw, bool restart)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value;
>>
>>         value = readl(ioaddr + GMAC_AN_CTRL);
>> @@ -332,8 +435,9 @@ static void dwmac1000_ctrl_ane(void __iomem *ioaddr,
>> bool restart)
>>         writel(value, ioaddr + GMAC_AN_CTRL);
>>   }
>>
>> -static void dwmac1000_get_adv(void __iomem *ioaddr, struct rgmii_adv
>> *adv)
>> +static void dwmac1000_get_adv(struct mac_device_info *hw, struct
>> rgmii_adv *adv)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value = readl(ioaddr + GMAC_ANE_ADV);
>>
>>         if (value & GMAC_ANE_FD)
>> @@ -353,7 +457,7 @@ static void dwmac1000_get_adv(void __iomem *ioaddr,
>> struct rgmii_adv *adv)
>>         adv->lp_pause = (value & GMAC_ANE_PSE) >> GMAC_ANE_PSE_SHIFT;
>>   }
>>
>> -static const struct stmmac_ops dwmac1000_ops = {
>> +static struct stmmac_ops dwmac1000_ops = {
>>         .core_init = dwmac1000_core_init,
>>         .rx_ipc = dwmac1000_rx_ipc_enable,
>>         .dump_regs = dwmac1000_dump_regs,
>> @@ -371,7 +475,8 @@ static const struct stmmac_ops dwmac1000_ops = {
>>         .get_adv = dwmac1000_get_adv,
>>   };
>>
>> -struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr)
>> +struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
>> +                                       int perfect_uc_entries)
>>   {
>>         struct mac_device_info *mac;
>>         u32 hwid = readl(ioaddr + GMAC_VERSION);
>> @@ -380,6 +485,16 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>> *ioaddr)
>>         if (!mac)
>>                 return NULL;
>>
>> +       mac->pcsr = ioaddr;
>> +       mac->multicast_filter_bins = mcbins;
>> +       mac->unicast_filter_entries = perfect_uc_entries;
>> +       mac->mcast_bits_log2 = 0;
>> +       if (mac->multicast_filter_bins)
>> +               mac->mcast_bits_log2 = ilog2(mac->multicast_filter_bins);
>> +
>> +       if (mac->mcast_bits_log2 != 64)
>> +               dwmac1000_ops.set_filter = dwmac1000_set_filterex;
>> +
>>         mac->mac = &dwmac1000_ops;
>>         mac->dma = &dwmac1000_dma_ops;
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> index 2ff767b..3ee3ab5 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> @@ -28,12 +28,13 @@
>>     Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>>
>> *******************************************************************************/
>>
>> -#include <linux/crc32.h>
>>   #include <asm/io.h>
>> +#include <linux/crc32.h>
>>   #include "dwmac100.h"
>>
>> -static void dwmac100_core_init(void __iomem *ioaddr, int mtu)
>> +static void dwmac100_core_init(struct mac_device_info *hw, int mtu)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         u32 value = readl(ioaddr + MAC_CONTROL);
>>
>>         writel((value | MAC_CORE_INIT), ioaddr + MAC_CONTROL);
>> @@ -43,8 +44,9 @@ static void dwmac100_core_init(void __iomem *ioaddr, int
>> mtu)
>>   #endif
>>   }
>>
>> -static void dwmac100_dump_mac_regs(void __iomem *ioaddr)
>> +static void dwmac100_dump_mac_regs(struct mac_device_info *hw)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         pr_info("\t----------------------------------------------\n"
>>                 "\t  DWMAC 100 CSR (base addr = 0x%p)\n"
>>                 "\t----------------------------------------------\n",
>> ioaddr);
>> @@ -66,30 +68,35 @@ static void dwmac100_dump_mac_regs(void __iomem
>> *ioaddr)
>>                 readl(ioaddr + MAC_VLAN2));
>>   }
>>
>> -static int dwmac100_rx_ipc_enable(void __iomem *ioaddr)
>> +static int dwmac100_rx_ipc_enable(struct mac_device_info *hw)
>>   {
>>         return 0;
>>   }
>>
>> -static int dwmac100_irq_status(void __iomem *ioaddr,
>> +static int dwmac100_irq_status(struct mac_device_info *hw,
>>                                struct stmmac_extra_stats *x)
>>   {
>>         return 0;
>>   }
>>
>> -static void dwmac100_set_umac_addr(void __iomem *ioaddr, unsigned char
>> *addr,
>> +static void dwmac100_set_umac_addr(struct mac_device_info *hw,
>> +                                  unsigned char *addr,
>>                                    unsigned int reg_n)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         stmmac_set_mac_addr(ioaddr, addr, MAC_ADDR_HIGH, MAC_ADDR_LOW);
>>   }
>>
>> -static void dwmac100_get_umac_addr(void __iomem *ioaddr, unsigned char
>> *addr,
>> +static void dwmac100_get_umac_addr(struct mac_device_info *hw,
>> +                                  unsigned char *addr,
>>                                    unsigned int reg_n)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         stmmac_get_mac_addr(ioaddr, addr, MAC_ADDR_HIGH, MAC_ADDR_LOW);
>>   }
>>
>> -static void dwmac100_set_filter(struct net_device *dev, int id)
>> +static void dwmac100_set_filter(struct mac_device_info *hw,
>> +                               struct net_device *dev)
>>   {
>>         void __iomem *ioaddr = (void __iomem *)dev->base_addr;
>>         u32 value = readl(ioaddr + MAC_CONTROL);
>> @@ -137,9 +144,10 @@ static void dwmac100_set_filter(struct net_device
>> *dev, int id)
>>         writel(value, ioaddr + MAC_CONTROL);
>>   }
>>
>> -static void dwmac100_flow_ctrl(void __iomem *ioaddr, unsigned int duplex,
>> +static void dwmac100_flow_ctrl(struct mac_device_info *hw, unsigned int
>> duplex,
>>                                unsigned int fc, unsigned int pause_time)
>>   {
>> +       void __iomem *ioaddr = hw->pcsr;
>>         unsigned int flow = MAC_FLOW_CTRL_ENABLE;
>>
>>         if (duplex)
>> @@ -148,7 +156,7 @@ static void dwmac100_flow_ctrl(void __iomem *ioaddr,
>> unsigned int duplex,
>>   }
>>
>>   /* No PMT module supported on ST boards with this Eth chip. */
>> -static void dwmac100_pmt(void __iomem *ioaddr, unsigned long mode)
>> +static void dwmac100_pmt(struct mac_device_info *hw, unsigned long mode)
>>   {
>>         return;
>>   }
>> @@ -175,6 +183,7 @@ struct mac_device_info *dwmac100_setup(void __iomem
>> *ioaddr)
>>
>>         pr_info("\tDWMAC100\n");
>>
>> +       mac->pcsr = ioaddr;
>>         mac->mac = &dwmac100_ops;
>>         mac->dma = &dwmac100_dma_ops;
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
>> index c5f9cb8..e679fa6 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
>> @@ -262,7 +262,7 @@ static int stmmac_ethtool_getsettings(struct
>> net_device *dev,
>>
>>                 /* Get and convert ADV/LP_ADV from the HW AN registers */
>>                 if (priv->hw->mac->get_adv)
>> -                       priv->hw->mac->get_adv(priv->ioaddr, &adv);
>> +                       priv->hw->mac->get_adv(priv->hw, &adv);
>>                 else
>>                         return -EOPNOTSUPP;     /* should never happen
>> indeed */
>>
>> @@ -352,7 +352,7 @@ static int stmmac_ethtool_setsettings(struct
>> net_device *dev,
>>
>>                         spin_lock(&priv->lock);
>>                         if (priv->hw->mac->ctrl_ane)
>> -                               priv->hw->mac->ctrl_ane(priv->ioaddr, 1);
>> +                               priv->hw->mac->ctrl_ane(priv->hw, 1);
>>                         spin_unlock(&priv->lock);
>>                 }
>>
>> @@ -471,7 +471,7 @@ stmmac_set_pauseparam(struct net_device *netdev,
>>                 if (netif_running(netdev))
>>                         ret = phy_start_aneg(phy);
>>         } else
>> -               priv->hw->mac->flow_ctrl(priv->ioaddr, phy->duplex,
>> +               priv->hw->mac->flow_ctrl(priv->hw, phy->duplex,
>>                                          priv->flow_ctrl, priv->pause);
>>         spin_unlock(&priv->lock);
>>         return ret;
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index d93aa87..aaa14b2 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -233,7 +233,7 @@ static void stmmac_enable_eee_mode(struct stmmac_priv
>> *priv)
>>         /* Check and enter in LPI mode */
>>         if ((priv->dirty_tx == priv->cur_tx) &&
>>             (priv->tx_path_in_lpi_mode == false))
>> -               priv->hw->mac->set_eee_mode(priv->ioaddr);
>> +               priv->hw->mac->set_eee_mode(priv->hw);
>>   }
>>
>>   /**
>> @@ -244,7 +244,7 @@ static void stmmac_enable_eee_mode(struct stmmac_priv
>> *priv)
>>    */
>>   void stmmac_disable_eee_mode(struct stmmac_priv *priv)
>>   {
>> -       priv->hw->mac->reset_eee_mode(priv->ioaddr);
>> +       priv->hw->mac->reset_eee_mode(priv->hw);
>>         del_timer_sync(&priv->eee_ctrl_timer);
>>         priv->tx_path_in_lpi_mode = false;
>>   }
>> @@ -298,12 +298,12 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
>>                         priv->eee_ctrl_timer.expires =
>> STMMAC_LPI_T(eee_timer);
>>                         add_timer(&priv->eee_ctrl_timer);
>>
>> -                       priv->hw->mac->set_eee_timer(priv->ioaddr,
>> +                       priv->hw->mac->set_eee_timer(priv->hw,
>>
>> STMMAC_DEFAULT_LIT_LS,
>>                                                      priv->tx_lpi_timer);
>>                 } else
>>                         /* Set HW EEE according to the speed */
>> -                       priv->hw->mac->set_eee_pls(priv->ioaddr,
>> +                       priv->hw->mac->set_eee_pls(priv->hw,
>>                                                    priv->phydev->link);
>>
>>                 pr_info("stmmac: Energy-Efficient Ethernet
>> initialized\n");
>> @@ -678,7 +678,7 @@ static void stmmac_adjust_link(struct net_device *dev)
>>                 }
>>                 /* Flow Control operation */
>>                 if (phydev->pause)
>> -                       priv->hw->mac->flow_ctrl(priv->ioaddr,
>> phydev->duplex,
>> +                       priv->hw->mac->flow_ctrl(priv->hw, phydev->duplex,
>>                                                  fc, pause_time);
>>
>>                 if (phydev->speed != priv->speed) {
>> @@ -1519,8 +1519,7 @@ static int stmmac_get_hw_features(struct stmmac_priv
>> *priv)
>>   static void stmmac_check_ether_addr(struct stmmac_priv *priv)
>>   {
>>         if (!is_valid_ether_addr(priv->dev->dev_addr)) {
>> -               priv->hw->mac->get_umac_addr((void __iomem *)
>> -                                            priv->dev->base_addr,
>> +               priv->hw->mac->get_umac_addr(priv->hw,
>>                                              priv->dev->dev_addr, 0);
>>                 if (!is_valid_ether_addr(priv->dev->dev_addr))
>>                         eth_hw_addr_random(priv->dev);
>> @@ -1617,14 +1616,14 @@ static int stmmac_hw_setup(struct net_device *dev)
>>         }
>>
>>         /* Copy the MAC addr into the HW  */
>> -       priv->hw->mac->set_umac_addr(priv->ioaddr, dev->dev_addr, 0);
>> +       priv->hw->mac->set_umac_addr(priv->hw, dev->dev_addr, 0);
>>
>>         /* If required, perform hw setup of the bus. */
>>         if (priv->plat->bus_setup)
>>                 priv->plat->bus_setup(priv->ioaddr);
>>
>>         /* Initialize the MAC Core */
>> -       priv->hw->mac->core_init(priv->ioaddr, dev->mtu);
>> +       priv->hw->mac->core_init(priv->hw, dev->mtu);
>>
>>         /* Enable the MAC Rx/Tx */
>>         stmmac_set_mac(priv->ioaddr, true);
>> @@ -1650,7 +1649,7 @@ static int stmmac_hw_setup(struct net_device *dev)
>>
>>         /* Dump DMA/MAC registers */
>>         if (netif_msg_hw(priv)) {
>> -               priv->hw->mac->dump_regs(priv->ioaddr);
>> +               priv->hw->mac->dump_regs(priv->hw);
>>                 priv->hw->dma->dump_regs(priv->ioaddr);
>>         }
>>         priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS;
>> @@ -1665,7 +1664,7 @@ static int stmmac_hw_setup(struct net_device *dev)
>>         }
>>
>>         if (priv->pcs && priv->hw->mac->ctrl_ane)
>> -               priv->hw->mac->ctrl_ane(priv->ioaddr, 0);
>> +               priv->hw->mac->ctrl_ane(priv->hw, 0);
>>
>>         return 0;
>>   }
>> @@ -2244,7 +2243,7 @@ static void stmmac_set_rx_mode(struct net_device
>> *dev)
>>         struct stmmac_priv *priv = netdev_priv(dev);
>>
>>         spin_lock(&priv->lock);
>> -       priv->hw->mac->set_filter(dev, priv->synopsys_id);
>> +       priv->hw->mac->set_filter(priv->hw, dev);
>>         spin_unlock(&priv->lock);
>>   }
>>
>> @@ -2334,8 +2333,7 @@ static irqreturn_t stmmac_interrupt(int irq, void
>> *dev_id)
>>
>>         /* To handle GMAC own interrupts */
>>         if (priv->plat->has_gmac) {
>> -               int status = priv->hw->mac->host_irq_status((void __iomem
>> *)
>> -
>> dev->base_addr,
>> +               int status = priv->hw->mac->host_irq_status(priv->hw,
>>
>> &priv->xstats);
>>                 if (unlikely(status)) {
>>                         /* For LPI we need to save the tx status */
>> @@ -2619,7 +2617,9 @@ static int stmmac_hw_init(struct stmmac_priv *priv)
>>         /* Identify the MAC HW device */
>>         if (priv->plat->has_gmac) {
>>                 priv->dev->priv_flags |= IFF_UNICAST_FLT;
>> -               mac = dwmac1000_setup(priv->ioaddr);
>> +               mac = dwmac1000_setup(priv->ioaddr,
>> +                                     priv->plat->multicast_filter_bins,
>> +                                     priv->plat->unicast_filter_entries);
>>         } else {
>>                 mac = dwmac100_setup(priv->ioaddr);
>>         }
>> @@ -2668,7 +2668,7 @@ static int stmmac_hw_init(struct stmmac_priv *priv)
>>         /* To use alternate (extended) or normal descriptor structures */
>>         stmmac_selec_desc_mode(priv);
>>
>> -       ret = priv->hw->mac->rx_ipc(priv->ioaddr);
>> +       ret = priv->hw->mac->rx_ipc(priv->hw);
>>         if (!ret) {
>>                 pr_warn(" RX IPC Checksum Offload not configured.\n");
>>                 priv->plat->rx_coe = STMMAC_RX_COE_NONE;
>> @@ -2888,7 +2888,7 @@ int stmmac_suspend(struct net_device *ndev)
>>
>>         /* Enable Power down mode by programming the PMT regs */
>>         if (device_may_wakeup(priv->device)) {
>> -               priv->hw->mac->pmt(priv->ioaddr, priv->wolopts);
>> +               priv->hw->mac->pmt(priv->hw, priv->wolopts);
>>                 priv->irq_wake = 1;
>>         } else {
>>                 stmmac_set_mac(priv->ioaddr, false);
>> @@ -2917,7 +2917,7 @@ int stmmac_resume(struct net_device *ndev)
>>          * from another devices (e.g. serial console).
>>          */
>>         if (device_may_wakeup(priv->device)) {
>> -               priv->hw->mac->pmt(priv->ioaddr, 0);
>> +               priv->hw->mac->pmt(priv->hw, 0);
>>                 priv->irq_wake = 0;
>>         } else {
>>                 pinctrl_pm_select_default_state(priv->device);
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> index 5884a7d..4502cde 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> @@ -43,6 +43,42 @@ static const struct of_device_id stmmac_dt_ids[] = {
>>   };
>>   MODULE_DEVICE_TABLE(of, stmmac_dt_ids);
>>
>> +
>> +static int stmmac_validate_mcast_bins(int mcast_bins)
>> +{
>> +       int x = mcast_bins;
>> +       switch (x) {
>> +       case 0:
>> +       case HASH_TABLE_SIZE:
>> +       case 128:
>> +       case 256:
>> +               break;
>> +       default:
>> +               x = HASH_TABLE_SIZE;
>> +               pr_info("Hash table entries set to unexpected value %d",
>> +                       mcast_bins);
>> +               break;
>> +       }
>> +       return x;
>> +}
>> +
>> +static int stmmac_validate_ucast_entries(int ucast_entries)
>> +{
>> +       int x = ucast_entries;
>> +       switch (x) {
>> +       case 32:
>> +       case 64:
>> +       case 128:
>> +               break;
>> +       default:
>> +               x = 32;
>> +               pr_info("Unicast table entries set to unexpected value
>> %d\n",
>> +                       ucast_entries);
>> +               break;
>> +       }
>> +       return x;
>> +}
>> +
>>   #ifdef CONFIG_OF
>>   static int stmmac_probe_config_dt(struct platform_device *pdev,
>>                                   struct plat_stmmacenet_data *plat,
>> @@ -107,6 +143,13 @@ static int stmmac_probe_config_dt(struct
>> platform_device *pdev,
>>          */
>>         plat->maxmtu = JUMBO_LEN;
>>
>> +       /* Set default value for multicast hash bins */
>> +       plat->multicast_filter_bins = HASH_TABLE_SIZE;
>> +
>> +       /* Set default value for unicast filter entries */
>> +       plat->unicast_filter_entries = GMAC_MAX_PERFECT_ADDRESSES;
>> +
>> +
>>         /*
>>          * Currently only the properties needed on SPEAr600
>>          * are provided. All other properties should be added
>> @@ -123,6 +166,14 @@ static int stmmac_probe_config_dt(struct
>> platform_device *pdev,
>>                  * are clearly MTUs
>>                  */
>>                 of_property_read_u32(np, "max-frame-size", &plat->maxmtu);
>> +               of_property_read_u32(np, "snps,multicast-filter-bins",
>> +                                    &plat->multicast_filter_bins);
>> +               of_property_read_u32(np, "snps,perfect-filter-entries",
>> +                                    &plat->unicast_filter_entries);
>> +               plat->unicast_filter_entries =
>> stmmac_validate_ucast_entries(
>> +                                    plat->unicast_filter_entries);
>> +               plat->multicast_filter_bins = stmmac_validate_mcast_bins(
>> +                                    plat->multicast_filter_bins);
>
>
>
> Can this validation be done in main?

Yes, I will make this change for the next submission, along with any
comprehending any technical information from Synopsys on the unicast
filter.

>
>
>>                 plat->has_gmac = 1;
>>                 plat->pmt = 1;
>>         }
>> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
>> index 6f27d4f..cd63851 100644
>> --- a/include/linux/stmmac.h
>> +++ b/include/linux/stmmac.h
>> @@ -112,6 +112,8 @@ struct plat_stmmacenet_data {
>>         int riwt_off;
>>         int max_speed;
>>         int maxmtu;
>> +       int multicast_filter_bins;
>> +       int unicast_filter_entries;
>>         void (*fix_mac_speed)(void *priv, unsigned int speed);
>>         void (*bus_setup)(void __iomem *ioaddr);
>>         void *(*setup)(struct platform_device *pdev);
>>
>

Thanks,

ATB!

Vince

^ permalink raw reply

* Re: [PATCH V3] net/dt: Add support for overriding phy configuration from device tree
From: Grant Likely @ 2014-02-04 17:15 UTC (permalink / raw)
  To: Matthew Garrett, netdev; +Cc: devicetree, linux-kernel, kishon, Matthew Garrett
In-Reply-To: <1389999459-9483-1-git-send-email-matthew.garrett@nebula.com>

On Fri, 17 Jan 2014 17:57:39 -0500, Matthew Garrett <matthew.garrett@nebula.com> wrote:
> Some hardware may be broken in interesting and board-specific ways, such
> that various bits of functionality don't work. This patch provides a
> mechanism for overriding mii registers during init based on the contents of
> the device tree data, allowing board-specific fixups without having to
> pollute generic code.
> 
> Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>

I've got no problems in principle with the feature. I think it is a good
change. The binding itself feels too verbose to me. As an alternative,
how about the following:

	phy-mii-advertise = "10half", "!10full", "!100base4"

So it is a list of features. Prepending a '!' makes it disabled. And we
could add something like "nodefault" to clear the probed settings
entirely before listing the modes we want.

g.

> ---
> 
> V3: Break the main function out into some helper functions and store the
> values in some structs.
> 
>  Documentation/devicetree/bindings/net/phy.txt | 21 +++++++
>  drivers/net/phy/phy_device.c                  | 29 ++++++++-
>  drivers/of/of_net.c                           | 87 +++++++++++++++++++++++++++
>  include/linux/of_net.h                        | 12 ++++
>  4 files changed, 148 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> index 7cd18fb..fc50f02 100644
> --- a/Documentation/devicetree/bindings/net/phy.txt
> +++ b/Documentation/devicetree/bindings/net/phy.txt
> @@ -23,6 +23,21 @@ Optional Properties:
>    assume clause 22. The compatible list may also contain other
>    elements.
>  
> +The following properties may be added to either the phy node or the parent
> +ethernet device. If not present, the hardware defaults will be used.
> +
> +- phy-mii-advertise-10half: 1 to advertise half-duplex 10MBit, 0 to disable
> +- phy-mii-advertise-10full: 1 to advertise full-duplex 10MBit, 0 to disable
> +- phy-mii-advertise-100half: 1 to advertise half-duplex 100MBit, 0 to disable
> +- phy-mii-advertise-100full: 1 to advertise full-duplex 100MBit, 0 to disable
> +- phy-mii-advertise-100base4: 1 to advertise 100base4, 0 to disable
> +- phy-mii-advertise-1000half: 1 to advertise half-duplex 1000MBit, 0 to disable
> +- phy-mii-advertise-1000full: 1 to advertise full-duplex 1000MBit, 0 to disable
> +- phy-mii-manual-master: 1 to enable manual master/slave configuration, 0
> +  to disable manual master/slave configuration
> +- phy-mii-as-master: 1 to configure phy to act as master, 0 to configure phy
> +  to act as slave. Ignored if manual master/slave configuration is not enabled.
> +
>  Example:
>  
>  ethernet-phy@0 {
> @@ -32,4 +47,10 @@ ethernet-phy@0 {
>  	interrupts = <35 1>;
>  	reg = <0>;
>  	device_type = "ethernet-phy";
> +	// Disable advertising of full duplex 1000MBit
> +	phy-mii-advertise-1000full = <0>;
> +	// Force manual phy master/slave configuration
> +	phy-mii-manual-master = <1>;
> +	// Forcibly configure phy as slave
> +	phy-mii-as-master = <0>;
>  };
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index d6447b3..91793bc 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -33,6 +33,7 @@
>  #include <linux/mii.h>
>  #include <linux/ethtool.h>
>  #include <linux/phy.h>
> +#include <linux/of_net.h>
>  
>  #include <asm/io.h>
>  #include <asm/irq.h>
> @@ -497,6 +498,28 @@ void phy_disconnect(struct phy_device *phydev)
>  }
>  EXPORT_SYMBOL(phy_disconnect);
>  
> +int phy_override_from_of(struct phy_device *phydev)
> +{
> +	int reg, regval;
> +	u16 val, mask;
> +
> +	/* Check for phy register overrides from OF */
> +	for (reg = 0; reg < 16; reg++) {
> +		if (!of_get_mii_register(phydev, reg, &val, &mask)) {
> +			if (!mask)
> +				continue;
> +			regval = phy_read(phydev, reg);
> +			if (regval < 0)
> +				continue;
> +			regval &= ~mask;
> +			regval |= val;
> +			phy_write(phydev, reg, regval);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  int phy_init_hw(struct phy_device *phydev)
>  {
>  	int ret;
> @@ -508,7 +531,11 @@ int phy_init_hw(struct phy_device *phydev)
>  	if (ret < 0)
>  		return ret;
>  
> -	return phydev->drv->config_init(phydev);
> +	ret = phydev->drv->config_init(phydev);
> +	if (ret < 0)
> +		return ret;
> +
> +	return phy_override_from_of(phydev);
>  }
>  
>  /**
> diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
> index 8f9be2e..75751b7 100644
> --- a/drivers/of/of_net.c
> +++ b/drivers/of/of_net.c
> @@ -93,3 +93,90 @@ const void *of_get_mac_address(struct device_node *np)
>  	return NULL;
>  }
>  EXPORT_SYMBOL(of_get_mac_address);
> +
> +struct mii_override {
> +	char *prop;
> +	u32 supported;
> +	u16 value;
> +};
> +
> +static struct mii_override mii_advertise_override[] = {
> +	{ "phy-mii-advertise-10half", SUPPORTED_10baseT_Half,
> +	  ADVERTISE_10HALF },
> +	{ "phy-mii-advertise-10full", SUPPORTED_10baseT_Full,
> +	  ADVERTISE_10FULL },
> +	{ "phy-mii-advertise-100half", SUPPORTED_100baseT_Half,
> +	  ADVERTISE_100HALF },
> +	{ "phy-mii-advertise-100full", SUPPORTED_100baseT_Full,
> +	  ADVERTISE_100FULL },
> +	{ "phy-mii-advertise-100base4", 0, ADVERTISE_100BASE4 },
> +	{ NULL },
> +};
> +
> +static struct mii_override mii_gigabit_override[] = {
> +	{ "phy-mii-advertise-1000half", SUPPORTED_1000baseT_Half,
> +	  ADVERTISE_1000HALF },
> +	{ "phy-mii-advertise-1000full", SUPPORTED_1000baseT_Full,
> +	  ADVERTISE_1000FULL },
> +	{ "phy-mii-as-master", 0, CTL1000_AS_MASTER },
> +	{ "phy-mii-manual-master", 0, CTL1000_ENABLE_MASTER },
> +	{ NULL },
> +};
> +
> +static void mii_handle_override(struct mii_override *override_list,
> +				struct phy_device *phydev, u16 *val, u16 *mask)
> +{
> +	struct device *dev = &phydev->dev;
> +	struct device_node *np = dev->of_node;
> +	struct mii_override *override;
> +	u32 tmp;
> +
> +	if (!np && dev->parent->of_node)
> +		np = dev->parent->of_node;
> +
> +	if (!np)
> +		return;
> +
> +	for (override = &override_list[0]; override->prop != NULL; override++) {
> +		if (!of_property_read_u32(np, override->prop, &tmp)) {
> +			if (tmp) {
> +				*val |= override->value;
> +				phydev->advertising |= override->supported;
> +			} else {
> +				phydev->advertising &= ~(override->supported);
> +			}
> +
> +			*mask |= override->value;
> +		}
> +	}
> +
> +	return;
> +}
> +
> +/**
> + * Provide phy register overrides from the device tree. Some hardware may
> + * be broken in interesting and board-specific ways, so we want a mechanism
> + * for the board data to provide overrides for default values. This should be
> + * called during phy init.
> + */
> +int of_get_mii_register(struct phy_device *phydev, int reg, u16 *val,
> +			u16 *mask)
> +{
> +	*val = 0;
> +	*mask = 0;
> +
> +	switch (reg) {
> +	case MII_ADVERTISE:
> +		mii_handle_override(mii_advertise_override, phydev, val, mask);
> +		break;
> +
> +	case MII_CTRL1000:
> +		mii_handle_override(mii_gigabit_override, phydev, val, mask);
> +		break;
> +
> +	default:
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL(of_get_mii_register);
> diff --git a/include/linux/of_net.h b/include/linux/of_net.h
> index 34597c8..2e478bc 100644
> --- a/include/linux/of_net.h
> +++ b/include/linux/of_net.h
> @@ -7,10 +7,14 @@
>  #ifndef __LINUX_OF_NET_H
>  #define __LINUX_OF_NET_H
>  
> +#include <linux/phy.h>
> +
>  #ifdef CONFIG_OF_NET
>  #include <linux/of.h>
>  extern int of_get_phy_mode(struct device_node *np);
>  extern const void *of_get_mac_address(struct device_node *np);
> +extern int of_get_mii_register(struct phy_device *np, int reg, u16 *val,
> +			       u16 *mask);
>  #else
>  static inline int of_get_phy_mode(struct device_node *np)
>  {
> @@ -21,6 +25,14 @@ static inline const void *of_get_mac_address(struct device_node *np)
>  {
>  	return NULL;
>  }
> +static inline int of_get_mii_register(struct phy_device *np, int reg, u16 *val,
> +				      u16 *mask)
> +{
> +	*val = 0;
> +	*mask = 0;
> +
> +	return -EINVAL;
> +}
>  #endif
>  
>  #endif /* __LINUX_OF_NET_H */
> -- 
> 1.8.4.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH] fdtable: Avoid triggering OOMs from alloc_fdmem
From: Eric W. Biederman @ 2014-02-04 17:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, linux-kernel, linux-fsdevel, netdev, linux-mm,
	David Rientjes
In-Reply-To: <1391530721.4301.8.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Mon, 2014-02-03 at 21:26 -0800, Eric W. Biederman wrote:
>> Recently due to a spike in connections per second memcached on 3
>> separate boxes triggered the OOM killer from accept.  At the time the
>> OOM killer was triggered there was 4GB out of 36GB free in zone 1. The
>> problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
>> hold a bitmap, and there was sufficient fragmentation that the largest
>> page available was 8KiB.
>> 
>> I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
>> but I do agree that order 3 allocations are very likely to succeed.
>> 
>> There are always pathologies where order > 0 allocations can fail when
>> there are copious amounts of free memory available.  Using the pigeon
>> hole principle it is easy to show that it requires 1 page more than 50%
>> of the pages being free to guarantee an order 1 (8KiB) allocation will
>> succeed, 1 page more than 75% of the pages being free to guarantee an
>> order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
>> the pages being free to guarantee an order 3 allocate will succeed.
>> 
>> A server churning memory with a lot of small requests and replies like
>> memcached is a common case that if anything can will skew the odds
>> against large pages being available.
>> 
>> Therefore let's not give external applications a practical way to kill
>> linux server applications, and specify __GFP_NORETRY to the kmalloc in
>> alloc_fdmem.  Unless I am misreading the code and by the time the code
>> reaches should_alloc_retry in __alloc_pages_slowpath (where
>> __GFP_NORETRY becomes signification).  We have already tried everything
>> reasonable to allocate a page and the only thing left to do is wait.  So
>> not waiting and falling back to vmalloc immediately seems like the
>> reasonable thing to do even if there wasn't a chance of triggering the
>> OOM killer.
>> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  fs/file.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>> 
>> diff --git a/fs/file.c b/fs/file.c
>> index 771578b33fb6..db25c2bdfe46 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -34,7 +34,7 @@ static void *alloc_fdmem(size_t size)
>>  	 * vmalloc() if the allocation size will be considered "large" by the VM.
>>  	 */
>>  	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> -		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>>  		if (data != NULL)
>>  			return data;
>>  	}
>
> Hi Eric
>
> I wrote yesterday a similar patch adding __GFP_NORETRY in following
> paths. I feel that alloc_fdmem() is only a part of the problem ;)

These code paths below were triggering OOMs for you?

I looked and didn't see a path flying through the air.

> What do you think, should we merge our changes or have distinct
> patches ?

I don't know about merging changes but certainly looking at the issue
together sounds good.

My gut feel says if there is a code path that has __GFP_NOWARN and
because of PAGE_ALLOC_COSTLY_ORDER we loop forever then there is
something fishy going on.

I would love to hear some people who are more current on the mm
subsystem than I am chime in.  It might be that the darn fix is going to
be to teach __alloc_pages_slowpath to not loop forever, unless order == 0. 
I expect the worst offenders need to have __GFP_NORETRY added so the
knowledge of what is going on spreads, and so we can avoid the danger of
needing to retune the whole mm subsystem that changing
__alloc_pages_slowpath does.

The two code paths below certainly look good canidates for having
__GFP_NORETRY added to them.  The same issues I ran into with
alloc_fdmem are likely to show up there as well.

Eric



> diff --git a/net/core/sock.c b/net/core/sock.c
> index 0c127dcdf6a8..5b6a9431b017 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
>  			while (order) {
>  				if (npages >= 1 << order) {
>  					page = alloc_pages(sk->sk_allocation |
> -							   __GFP_COMP | __GFP_NOWARN,
> +							   __GFP_COMP |
> +							   __GFP_NOWARN |
> +							   __GFP_NORETRY,
>  							   order);
>  					if (page)
>  						goto fill_page;
> @@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
>  		gfp_t gfp = prio;
>  
>  		if (order)
> -			gfp |= __GFP_COMP | __GFP_NOWARN;
> +			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
>  		pfrag->page = alloc_pages(gfp, order);
>  		if (likely(pfrag->page)) {
>  			pfrag->offset = 0;
>
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 1/2] dts: Add bindings for multicast hash bins and perfect filter entries
From: Rob Herring @ 2014-02-04 17:25 UTC (permalink / raw)
  To: Vince Bridgers
  Cc: devicetree@vger.kernel.org, netdev, Giuseppe CAVALLARO,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	Dinh Nguyen, Rayagond Kokatanur
In-Reply-To: <1391199356-27226-2-git-send-email-vbridgers2013@gmail.com>

On Fri, Jan 31, 2014 at 2:15 PM, Vince Bridgers <vbridgers2013@gmail.com> wrote:
> This change adds bindings for the number of multicast hash bins and perfect
> filter entries supported by the Synopsys EMAC. The Synopsys EMAC core is
> configurable at device creation time, and can be configured for a different
> number of multicast hash bins and a different number of perfect filter entries.
> The device does not provide a way to query these parameters, therefore
> parameters are required. The Altera Cyclone V SOC has support for 256
> multicast hash bins and 128 perfect filter entries, and is different than
> what's currently provided in the stmmac driver.
>
> Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com>

Acked-by: Rob Herring <robh@kernel.org>

> ---
>  Documentation/devicetree/bindings/net/stmmac.txt |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
> index 9d92d42..dbf7498 100644
> --- a/Documentation/devicetree/bindings/net/stmmac.txt
> +++ b/Documentation/devicetree/bindings/net/stmmac.txt
> @@ -34,6 +34,10 @@ Optional properties:
>         reset phandle is given
>  - max-frame-size:      Maximum Transfer Unit (IEEE defined MTU), rather
>                         than the maximum frame size.
> +- snps,multicast-filter-bins:  Number of multicast filter hash bins
> +                               supported by this device instance
> +- snps,perfect-filter-entries: Number of perfect filter entries supported
> +                               by this device instance
>
>  Examples:
>
> @@ -46,4 +50,6 @@ Examples:
>                 mac-address = [000000000000]; /* Filled in by U-Boot */
>                 max-frame-size = <3800>;
>                 phy-mode = "gmii";
> +               snps,multicast-filter-bins = <256>;
> +               snps,perfect-filter-entries = <128>;
>         };
> --
> 1.7.9.5
>

^ permalink raw reply

* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Grant Likely @ 2014-02-04 17:26 UTC (permalink / raw)
  To: Sergei Shtylyov, Florian Fainelli, Rob Herring
  Cc: netdev, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
	linux-doc@vger.kernel.org
In-Reply-To: <52E988FA.8040807@cogentembedded.com>

On Thu, 30 Jan 2014 02:04:26 +0300, Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> wrote:
> Hello.
> 
> On 01/22/2014 12:19 AM, Sergei Shtylyov wrote:
> 
> >>>>>     I'm afraid that's too late, it has spread very far, so that
> >>>>> of_get_phy_mode() handles that property, not "phy-connection-type".
> 
> >>>> Uggg, I guess this is a case of a defacto standard then if the kernel
> >>>> doesn't even support it.
> 
> >>> Maybe I forgot to CC you on patch sent to Grant only, I sent a patch a
> >>> while ago for of_get_phy_mode() to look for both "phy-mode" and
> >>> "phy-connection-type" since the former has been a Linux invention, but
> >>> the latter is ePAPR specified.
> 
> >> Here is a link to the actual patch in question, not sure which tree
> >> Grant applied it to though:
> 
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1311.2/00048.html
> 
> >     It's not the patch mail, it's Grant's "applied" reply, patch is mangled in
> > this reply, and I couldn't follow the thread. Here's the actual patch mail:
> 
> > http://marc.info/?l=devicetree&m=138449662807254
> 
>     Florian, I didn't find this patch in Grant's official tree, so maybe you 
> should ask him where is the patch already?

Sorry, I accidentally dropped it. It will be in the next merge window.

g.

^ permalink raw reply

* Re: [bisected] Re: WARNING: at net/ipv4/devinet.c:1599
From: Cong Wang @ 2014-02-04 18:08 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David S. Miller, Jiri Pirko, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAMuHMdVCeFek2LX0_AnA8zg_yYhXvzw19rH8OsqbWWb-rb9w-A@mail.gmail.com>

On Tue, Feb 4, 2014 at 6:19 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Anyone with a clue?
>

Looks like we need:

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index ac2dff3..bdbf68b 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1443,7 +1443,8 @@ static size_t inet_nlmsg_size(void)
               + nla_total_size(4) /* IFA_LOCAL */
               + nla_total_size(4) /* IFA_BROADCAST */
               + nla_total_size(IFNAMSIZ) /* IFA_LABEL */
-              + nla_total_size(4);  /* IFA_FLAGS */
+              + nla_total_size(4)  /* IFA_FLAGS */
+              + nla_total_size(sizeof(struct ifa_cacheinfo)); /*
IFA_CACHEINFO */
 }

 static inline u32 cstamp_delta(unsigned long cstamp)

^ permalink raw reply related

* Re: [PATCH] fdtable: Avoid triggering OOMs from alloc_fdmem
From: Cong Wang @ 2014-02-04 18:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-fsdevel,
	netdev, linux-mm
In-Reply-To: <87r47jsb2p.fsf@xmission.com>

On Mon, Feb 3, 2014 at 9:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> diff --git a/fs/file.c b/fs/file.c
> index 771578b33fb6..db25c2bdfe46 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -34,7 +34,7 @@ static void *alloc_fdmem(size_t size)
>          * vmalloc() if the allocation size will be considered "large" by the VM.
>          */
>         if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -               void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
> +               void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>                 if (data != NULL)
>                         return data;
>         }

Or try again without __GFP_NORETRY like we do in nelink mmap?

diff --git a/fs/file.c b/fs/file.c
index 771578b..5c7a7b5 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -29,16 +29,20 @@ int sysctl_nr_open_max = 1024 * 1024; /* raised later */

 static void *alloc_fdmem(size_t size)
 {
+       void *data;
        /*
         * Very large allocations can stress page reclaim, so fall back to
         * vmalloc() if the allocation size will be considered "large"
by the VM.
         */
        if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-               void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+               data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
                if (data != NULL)
                        return data;
        }
-       return vmalloc(size);
+       data = vmalloc(size);
+       if (data != NULL)
+               return data;
+       return kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
 }

 static void free_fdmem(void *ptr)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* Re: [PATCH] ipv6: default route for link local address is not added while assigning a address
From: sohny thomas @ 2014-02-04 18:37 UTC (permalink / raw)
  To: nicolas.dichtel, Sohny Thomas, netdev, linux-kernel, yoshfuji,
	davem, kumuda
In-Reply-To: <52EFC324.1030102@6wind.com>

On Monday 03 February 2014 09:56 PM, Nicolas Dichtel wrote:
>> 4) make flush not remove the fe80::/64 address
>>
>>     Least favourable to me. I guess this also woud need iproute change
>>     and seems most difficult to do.
> Why using this command 'ip -6 route flush proto static' isn't possible?
Ok I tried this and it works fine ( i.e, leaves out removing Link Local 
route), but I think this is a workaround to a problem that occurs if 
anyone actually deletes a Link local route

> I think that we know what kind of route is added for these TAHI tests,
> hence
> it's better to remove only routes added manually (or by a routing daemon if
> it's the case).
> Removing kernel routes may hide bugs: imagine the kernel adds a wrong
> route,
> TAHI will not detect it.

^ permalink raw reply

* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Sergei Shtylyov @ 2014-02-04 18:40 UTC (permalink / raw)
  To: Grant Likely, Florian Fainelli, Rob Herring
  Cc: netdev, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
	Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
	linux-doc@vger.kernel.org
In-Reply-To: <20140204172602.01420C4050F@trevor.secretlab.ca>

Hello.

On 02/04/2014 08:26 PM, Grant Likely wrote:

>>>>>>>      I'm afraid that's too late, it has spread very far, so that
>>>>>>> of_get_phy_mode() handles that property, not "phy-connection-type".

>>>>>> Uggg, I guess this is a case of a defacto standard then if the kernel
>>>>>> doesn't even support it.

>>>>> Maybe I forgot to CC you on patch sent to Grant only, I sent a patch a
>>>>> while ago for of_get_phy_mode() to look for both "phy-mode" and
>>>>> "phy-connection-type" since the former has been a Linux invention, but
>>>>> the latter is ePAPR specified.

>>>> Here is a link to the actual patch in question, not sure which tree
>>>> Grant applied it to though:

>>>> http://lkml.indiana.edu/hypermail/linux/kernel/1311.2/00048.html

>>>      It's not the patch mail, it's Grant's "applied" reply, patch is mangled in
>>> this reply, and I couldn't follow the thread. Here's the actual patch mail:

>>> http://marc.info/?l=devicetree&m=138449662807254

>>      Florian, I didn't find this patch in Grant's official tree, so maybe you
>> should ask him where is the patch already?

> Sorry, I accidentally dropped it. It will be in the next merge window.

    Already saw it, thanks. Would that it was in 3.14 instead of course, so 
that I could use "phy-connection-type" in my binding...

> g.

WBR, Sergei


^ permalink raw reply

* Re: [PATCH] fdtable: Avoid triggering OOMs from alloc_fdmem
From: Eric Dumazet @ 2014-02-04 18:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, linux-kernel, linux-fsdevel, netdev, linux-mm,
	David Rientjes
In-Reply-To: <871tzirdwf.fsf@xmission.com>

On Tue, 2014-02-04 at 09:22 -0800, Eric W. Biederman wrote:

> The two code paths below certainly look good canidates for having
> __GFP_NORETRY added to them.  The same issues I ran into with
> alloc_fdmem are likely to show up there as well.

Yes, this is what I thought : a write into TCP socket should be more
frequent than the alloc_fdmem() case ;)

But then, maybe your workload was only using UDP ?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCHv1 net] xen-netfront: handle backend CLOSED without CLOSING
From: David Vrabel @ 2014-02-04 18:50 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, Boris Ostrovsky, netdev,
	David S. Miller

From: David Vrabel <david.vrabel@citrix.com>

Backend drivers shouldn't transistion to CLOSED unless the frontend is
CLOSED.  If a backend does transition to CLOSED too soon then the
frontend may not see the CLOSING state and will not properly shutdown.

So, treat an unexpected backend CLOSED state the same as CLOSING.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/xen-netfront.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ff04d4f..f9daa9e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1832,7 +1832,6 @@ static void netback_changed(struct xenbus_device *dev,
 	case XenbusStateReconfiguring:
 	case XenbusStateReconfigured:
 	case XenbusStateUnknown:
-	case XenbusStateClosed:
 		break;
 
 	case XenbusStateInitWait:
@@ -1847,6 +1846,10 @@ static void netback_changed(struct xenbus_device *dev,
 		netdev_notify_peers(netdev);
 		break;
 
+	case XenbusStateClosed:
+		if (dev->state == XenbusStateClosed)
+			break;
+		/* Missed the backend's CLOSING state -- fallthrough */
 	case XenbusStateClosing:
 		xenbus_frontend_closed(dev);
 		break;
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH] fdtable: Avoid triggering OOMs from alloc_fdmem
From: Eric W. Biederman @ 2014-02-04 18:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-fsdevel,
	netdev, linux-mm
In-Reply-To: <CAHA+R7OLnrujsinNhwVvZyJDz+BrTxYmw0gWeSSyq+dJ2LF9qg@mail.gmail.com>

Cong Wang <cwang@twopensource.com> writes:

> On Mon, Feb 3, 2014 at 9:26 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> diff --git a/fs/file.c b/fs/file.c
>> index 771578b33fb6..db25c2bdfe46 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -34,7 +34,7 @@ static void *alloc_fdmem(size_t size)
>>          * vmalloc() if the allocation size will be considered "large" by the VM.
>>          */
>>         if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> -               void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +               void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>>                 if (data != NULL)
>>                         return data;
>>         }
>
> Or try again without __GFP_NORETRY like we do in nelink mmap?

I think I would much rather keep the current semantics of return -ENOMEM
and keep the problem localized then trigger a box wide OOM thank you
very much.


Retrying the kmalloc without __GFP_NORETRY is pointless.  If you are in
the unlikely 0.01% of the time when the kmalloc fails it is almost
certainly going to fail again.  Writing out_of_memory() as kmalloc()
is pointless and very confusing.

The vmalloc won't fail unless you are on a 32bit box.  So it isn't a
case that anyone has to deal with in practice.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] fdtable: Avoid triggering OOMs from alloc_fdmem
From: Eric W. Biederman @ 2014-02-04 18:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, linux-kernel, linux-fsdevel, netdev, linux-mm,
	David Rientjes
In-Reply-To: <1391539464.10160.1.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Tue, 2014-02-04 at 09:22 -0800, Eric W. Biederman wrote:
>
>> The two code paths below certainly look good canidates for having
>> __GFP_NORETRY added to them.  The same issues I ran into with
>> alloc_fdmem are likely to show up there as well.
>
> Yes, this is what I thought : a write into TCP socket should be more
> frequent than the alloc_fdmem() case ;)
>
> But then, maybe your workload was only using UDP ?

As I have heard it described one tcp connection per small requestion,
and someone goofed and started creating new connections when the server
was bogged down.  But since all of the requests and replies were small I
don't expect even TCP would allocate more than a 4KiB page in that
worload.

I had oodles of 4KiB and 8KiB pages.  What size of memory allocation did
you see failing?  

Eric

^ permalink raw reply

* Re: [PATCH V3] net/dt: Add support for overriding phy configuration from device tree
From: Florian Fainelli @ 2014-02-04 19:01 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Matthew Garrett, netdev,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Kishon Vijay Abraham I
In-Reply-To: <1390145654.16433.102.camel-nDn/Rdv9kqW9Jme8/bJn5UCKIB8iOfG2tUK59QYPAWc@public.gmane.org>

2014-01-19 Ben Hutchings <ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org>:
> On Fri, 2014-01-17 at 17:57 -0500, Matthew Garrett wrote:
>> Some hardware may be broken in interesting and board-specific ways, such
>> that various bits of functionality don't work. This patch provides a
>> mechanism for overriding mii registers during init based on the contents of
>> the device tree data, allowing board-specific fixups without having to
>> pollute generic code.
> [...]
>> --- a/Documentation/devicetree/bindings/net/phy.txt
>> +++ b/Documentation/devicetree/bindings/net/phy.txt
>> @@ -23,6 +23,21 @@ Optional Properties:
>>    assume clause 22. The compatible list may also contain other
>>    elements.
>>
>> +The following properties may be added to either the phy node or the parent
>> +ethernet device. If not present, the hardware defaults will be used.
>> +
>> +- phy-mii-advertise-10half: 1 to advertise half-duplex 10MBit, 0 to disable
>> +- phy-mii-advertise-10full: 1 to advertise full-duplex 10MBit, 0 to disable
>> +- phy-mii-advertise-100half: 1 to advertise half-duplex 100MBit, 0 to disable
>> +- phy-mii-advertise-100full: 1 to advertise full-duplex 100MBit, 0 to disable
>> +- phy-mii-advertise-100base4: 1 to advertise 100base4, 0 to disable
>> +- phy-mii-advertise-1000half: 1 to advertise half-duplex 1000MBit, 0 to disable
>> +- phy-mii-advertise-1000full: 1 to advertise full-duplex 1000MBit, 0 to disable
>
> Are these really all needed?  Apparently there is a standard 'max-speed'
> property on Ethernet devices already, which I think is probably
> sufficient to express the actual restrictions of some boards.

This is what I think as well. Maybe there is the need for something
more fine-grained, although I really doubt it.

>
>> +- phy-mii-manual-master: 1 to enable manual master/slave configuration, 0
>> +  to disable manual master/slave configuration
> [...]
>
> Although the standard calls this 'manual', if it's set in the DT it
> won't really be a manual setting.  The description should probably
> clarify that.
>
> I think the name should include 'master-slave' or 'clock-role' rather
> than just 'master', as currently it suggests only the master role can be
> forced.

Right, and this would match what 802.3-2008, Section 22.2.4.3.7 mentions.

>
> Ben.
>
> --
> Ben Hutchings
> friends: People who know you well, but like you anyway.



-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCHv1 net] tg3: fix deadlock in tg3_change_mtu()
From: David Vrabel @ 2014-02-04 19:01 UTC (permalink / raw)
  To: netdev; +Cc: Nithin Nayak Sujir, Michael Chan, David Vrabel

5780 cards cannot have jumbo frames and TSO enabled together.  When
jumbo frames are enabled by setting the MTU, the TSO feature must be
cleared.  This is done indirectly by calling netdev_update_features()
which will call tg3_fix_features() to actually clear the flags.

netdev_update_features() will also trigger a new netlink message for
the feature change event which will result in a call to
tg3_get_stats64() which deadlocks on the tg3 lock.

Fix this by dropping the tg3 lock while calling
netdev_update_features().  This is safe as the device is down and the
hardware is stopped.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index e2ca03e..0b344c0 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -14077,17 +14077,13 @@ static inline void tg3_set_mtu(struct net_device *dev, struct tg3 *tp,
 	dev->mtu = new_mtu;
 
 	if (new_mtu > ETH_DATA_LEN) {
-		if (tg3_flag(tp, 5780_CLASS)) {
-			netdev_update_features(dev);
+		if (tg3_flag(tp, 5780_CLASS))
 			tg3_flag_clear(tp, TSO_CAPABLE);
-		} else {
+		else
 			tg3_flag_set(tp, JUMBO_RING_ENABLE);
-		}
 	} else {
-		if (tg3_flag(tp, 5780_CLASS)) {
+		if (tg3_flag(tp, 5780_CLASS))
 			tg3_flag_set(tp, TSO_CAPABLE);
-			netdev_update_features(dev);
-		}
 		tg3_flag_clear(tp, JUMBO_RING_ENABLE);
 	}
 }
@@ -14119,6 +14115,14 @@ static int tg3_change_mtu(struct net_device *dev, int new_mtu)
 
 	tg3_set_mtu(dev, tp, new_mtu);
 
+	tg3_full_unlock(tp);
+
+	/* Adjusting MTU may add/remove TSO feature. */
+	if (tg3_flag(tp, 5780_CLASS))
+		netdev_update_features(dev);
+
+	tg3_full_lock(tp, 0);
+
 	/* Reset PHY, otherwise the read DMA engine will be in a mode that
 	 * breaks all requests to 256 bytes.
 	 */
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH net-next v2] xen-netback: Rework rx_work_todo
From: Zoltan Kiss @ 2014-02-04 19:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies,
	David Miller
In-Reply-To: <20140120163814.GE11681@zion.uk.xensource.com>

On 20/01/14 16:38, Wei Liu wrote:
> On Wed, Jan 15, 2014 at 05:11:07PM +0000, Zoltan Kiss wrote:
>> The recent patch to fix receive side flow control (11b57f) solved the spinning
>> thread problem, however caused an another one. The receive side can stall, if:
>> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
>> - [INTERRUPT] interrupt happens, and sets rx_event to true
>> - [THREAD] then xenvif_kthread sets rx_event to false
>> - [THREAD] rx_work_todo doesn't return true anymore
>>
>> Also, if interrupt sent but there is still no room in the ring, it take quite a
>> long time until xenvif_rx_action realize it. This patch ditch that two variable,
>> and rework rx_work_todo. If the thread finds it can't fit more skb's into the
>> ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's
>> kept as 0. Then rx_work_todo will check if:
>> - there is something to send to the ring (like before)
>> - there is space for the topmost packet in the queue
>>
>> I think that's more natural and optimal thing to test than two bool which are
>> set somewhere else.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>
> Sorry for the delay.
>
> Paul, thanks for reviewing.
>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

Hi,

This patch haven't made it to net-next yet, maybe because the subject 
doesn't suggest that this is a bugfix. I suggest to apply it as soon as 
possible, otherwise netback will be quite broken.

Zoli

^ permalink raw reply

* Re: [bisected] Re: WARNING: at net/ipv4/devinet.c:1599
From: Geert Uytterhoeven @ 2014-02-04 19:20 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Jiri Pirko, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAHA+R7Omv++NCg=nocVsW+tM=Zv_qCNhX1JjtFTuCzfz9B3Cig@mail.gmail.com>

On Tue, Feb 4, 2014 at 7:08 PM, Cong Wang <cwang@twopensource.com> wrote:
> On Tue, Feb 4, 2014 at 6:19 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>>
>> Anyone with a clue?
>>
>
> Looks like we need:
>
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index ac2dff3..bdbf68b 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -1443,7 +1443,8 @@ static size_t inet_nlmsg_size(void)
>                + nla_total_size(4) /* IFA_LOCAL */
>                + nla_total_size(4) /* IFA_BROADCAST */
>                + nla_total_size(IFNAMSIZ) /* IFA_LABEL */
> -              + nla_total_size(4);  /* IFA_FLAGS */
> +              + nla_total_size(4)  /* IFA_FLAGS */
> +              + nla_total_size(sizeof(struct ifa_cacheinfo)); /*
> IFA_CACHEINFO */
>  }

Thanks for your suggestion, but it doesn't help :-(

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox