Netdev List
 help / color / mirror / Atom feed
* [PATCH net 3/4] selftests: mlxsw: qos_mc_aware: Tweak for min shaper
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel
In-Reply-To: <20181031095601.29846-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

Since the minimum shaper is now being enabled for MC TCs, it's
unreasonable to expect no UC traffic loss. Minimal min shaper value is
200Mbps, which is 20% of the 1Gbps that this test configures on egress.
To cover for glitches, tolerate up to 25% UC degradation under MC
overload.

Fixes: b5638d46c90a ("selftests: mlxsw: Add a test for UC behavior under MC flood")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
index 0150bb2741eb..a8fc36d670e1 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
@@ -311,7 +311,7 @@ test_mc_aware()
 			ret = 100 * ($ucth1 - $ucth2) / $ucth1
 			if (ret > 0) { ret } else { 0 }
 		    ")
-	check_err $(bc <<< "$deg > 10")
+	check_err $(bc <<< "$deg > 25")
 
 	local interval=$((d1 - d0))
 	local mc_ir=$(rate $u0 $u1 $interval)
-- 
2.17.2

^ permalink raw reply related

* [PATCH net 1/4] mlxsw: reg: QEEC: Add minimum shaper fields
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel
In-Reply-To: <20181031095601.29846-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

Add QEEC.mise (minimum shaper enable) and QEEC.min_shaper_rate to enable
configuration of minimum shaper.

Increase the QEEC length to 0x20 as well: that's the length that the
register has had for a long time now, but with the configurations that
mlxsw typically exercises, the firmware tolerated 0x1C-sized packets.
With mise=true however, FW rejects packets unless they have the full
required length.

Fixes: b9b7cee40579 ("mlxsw: reg: Add QoS ETS Element Configuration register")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 32cb6718bb17..db3d2790aeec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3284,7 +3284,7 @@ static inline void mlxsw_reg_qtct_pack(char *payload, u8 local_port,
  * Configures the ETS elements.
  */
 #define MLXSW_REG_QEEC_ID 0x400D
-#define MLXSW_REG_QEEC_LEN 0x1C
+#define MLXSW_REG_QEEC_LEN 0x20
 
 MLXSW_REG_DEFINE(qeec, MLXSW_REG_QEEC_ID, MLXSW_REG_QEEC_LEN);
 
@@ -3326,6 +3326,15 @@ MLXSW_ITEM32(reg, qeec, element_index, 0x04, 0, 8);
  */
 MLXSW_ITEM32(reg, qeec, next_element_index, 0x08, 0, 8);
 
+/* reg_qeec_mise
+ * Min shaper configuration enable. Enables configuration of the min
+ * shaper on this ETS element
+ * 0 - Disable
+ * 1 - Enable
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, qeec, mise, 0x0C, 31, 1);
+
 enum {
 	MLXSW_REG_QEEC_BYTES_MODE,
 	MLXSW_REG_QEEC_PACKETS_MODE,
@@ -3342,6 +3351,17 @@ enum {
  */
 MLXSW_ITEM32(reg, qeec, pb, 0x0C, 28, 1);
 
+/* The smallest permitted min shaper rate. */
+#define MLXSW_REG_QEEC_MIS_MIN	200000		/* Kbps */
+
+/* reg_qeec_min_shaper_rate
+ * Min shaper information rate.
+ * For CPU port, can only be configured for port hierarchy.
+ * When in bytes mode, value is specified in units of 1000bps.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, qeec, min_shaper_rate, 0x0C, 0, 28);
+
 /* reg_qeec_mase
  * Max shaper configuration enable. Enables configuration of the max
  * shaper on this ETS element.
-- 
2.17.2

^ permalink raw reply related

* [PATCH net 0/4] mlxsw: Enable minimum shaper on MC TCs
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel

Petr says:

An MC-aware mode was introduced in commit 7b8195306694 ("mlxsw:
spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode,
BUM traffic gets a special treatment by being assigned to a separate set
of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then
configured to strictly prioritize the lower-numbered ones. The intention
is to prevent BUM traffic from flooding the switch and push out all UC
traffic, which would otherwise happen, and instead give UC traffic
precedence.

However strictly prioritizing UC traffic has the effect that UC overload
pushes out all BUM traffic, such as legitimate ARP queries. These
packets are kept in queues for a while, but under sustained UC overload,
their lifetime eventually expires and these packets are dropped. That is
detrimental to network performance as well.

In this patchset, MC TCs (8..15) are configured with minimum shaper of
200Mbps (a minimum permitted value) to allow a trickle of necessary
control traffic to get through.

First in patch #1, the QEEC register is extended with fields necessary
to configure the minimum shaper.

In patch #2, minimum shaper is enabled on TCs 8..15.

In patches #3 and #4, first the MC-awareness test is tweaked to support
the minimum shaper, and then a new test is introduced to test that MC
traffic behaves well under UC overload.

Petr Machata (4):
  mlxsw: reg: QEEC: Add minimum shaper fields
  mlxsw: spectrum: Set minimum shaper on MC TCs
  selftests: mlxsw: qos_mc_aware: Tweak for min shaper
  selftests: mlxsw: qos_mc_aware: Add a test for UC awareness

 drivers/net/ethernet/mellanox/mlxsw/reg.h     | 22 ++++-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 25 +++++
 .../drivers/net/mlxsw/qos_mc_aware.sh         | 95 ++++++++++++++-----
 3 files changed, 117 insertions(+), 25 deletions(-)

-- 
2.17.2

^ permalink raw reply

* Re: [RFC PATCH v3 06/10] udp: cope with UDP GRO packet misdirection
From: Paolo Abeni @ 2018-10-31  9:54 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <0fbc6f0cea4b3a976a003a593c6365cb4e4a9e99.1540920083.git.pabeni@redhat.com>

On Tue, 2018-10-30 at 18:24 +0100, Paolo Abeni wrote:
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -406,17 +406,24 @@ static inline int copy_linear_skb(struct sk_buff *skb, int len, int off,
>  } while(0)
>  
>  #if IS_ENABLED(CONFIG_IPV6)
> -#define __UDPX_INC_STATS(sk, field)					\
> -do {									\
> -	if ((sk)->sk_family == AF_INET)					\
> -		__UDP_INC_STATS(sock_net(sk), field, 0);		\
> -	else								\
> -		__UDP6_INC_STATS(sock_net(sk), field, 0);		\
> -} while (0)
> +#define __UDPX_MIB(sk, ipv4)						\
> +({									\
> +	ipv4 ? (IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :	\
> +				 sock_net(sk)->mib.udp_statistics) :	\
> +		(IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_stats_in6 :	\
> +				 sock_net(sk)->mib.udp_stats_in6);	\
> +})
>  #else
> -#define __UDPX_INC_STATS(sk, field) __UDP_INC_STATS(sock_net(sk), field, 0)
> +#define __UDPX_MIB(sk, ipv4)						\
> +({									\
> +	IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :		\
> +			 sock_net(sk)->mib.udp_statistics;		\
> +})
>  #endif
>  
> +#define __UDPX_INC_STATS(sk, field) \
> +	__SNMP_INC_STATS(__UDPX_MIB(sk, (sk)->sk_family == AF_INET, field)
> +

This is broken (complains only if CONFIG_AF_RXRPC is set), will fix in
next iteration (thanks kbuildbot).

But I'd prefer to keep the above helper: it can be used in a follow-up
patch to cleanup a bit udp6_recvmsg().

>  #ifdef CONFIG_PROC_FS
>  struct udp_seq_afinfo {
>  	sa_family_t			family;
> @@ -450,4 +457,32 @@ DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
>  void udpv6_encap_enable(void);
>  #endif
>  
> +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> +					      struct sk_buff *skb)
> +{
> +	bool ipv4 = skb->protocol == htons(ETH_P_IP);

And this cause a compile warning when # CONFIG_IPV6 is not set, I will
fix in the next iteration (again thanks kbuildbot)

Cheers,

Paolo

^ permalink raw reply

* Re: [RFC PATCH 3/4] igb: add support for extended PHC gettime
From: Miroslav Lichvar @ 2018-10-31  9:39 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev, intel-wired-lan, Jacob Keller
In-Reply-To: <20181031022916.nuvbjegnn6bsqxss@localhost>

On Tue, Oct 30, 2018 at 07:29:16PM -0700, Richard Cochran wrote:
> On Fri, Oct 26, 2018 at 06:27:41PM +0200, Miroslav Lichvar wrote:
> > +static int igb_ptp_gettimex(struct ptp_clock_info *ptp,
> > +			    struct ptp_system_timestamp *sts)
> > +{
> > +	struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
> > +					       ptp_caps);
> > +	struct e1000_hw *hw = &igb->hw;
> > +	unsigned long flags;
> > +	u32 lo, hi;
> > +	u64 ns;
> > +
> > +	spin_lock_irqsave(&igb->tmreg_lock, flags);
> > +
> > +	/* 82576 doesn't have SYSTIMR */
> > +	if (igb->hw.mac.type == e1000_82576) {
> 
> Instead of if/then/else, can't you follow the pattern of providing
> different function flavors ...

I can. I was just trying to minimize the amount of triplicated code.
In the next version I'll add a patch to deprecate the old gettime
functions, as Jacob suggested, and replace them with the extended
versions, so the amount of code will not change that much.

Thanks,

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH 4/4] net: macb: Add support for suspend/resume with full power down
From: Harini Katakam @ 2018-10-31 16:56 UTC (permalink / raw)
  To: Claudiu Beznea
  Cc: Harini Katakam, Nicolas Ferre, David Miller, netdev, linux-kernel,
	Michal Simek, appanad
In-Reply-To: <f69c428d-565e-cf8e-de53-45850f2625cf@microchip.com>

Hi Claudiu,

> Hi Harini,
>
> I applied these patches on net-next/master cloned from [1], updated this
> moment, but I don't have a phy_dev member in struct macb. Maybe you wanted
> to use netdev->phydev ?
>
> Thank you,
> Claudiu Beznea
>
I apologize. Yes, I'm using netdev->phydev and it was uncommitted on my branch
@@ -4264,8 +4264,8 @@ static int __maybe_unused macb_suspend(struct device *dev)
                netif_device_detach(netdev);
                for (q = 0, queue = bp->queues; q < bp->num_queues;
++q, ++queue)
                        napi_disable(&queue->napi);
-               phy_stop(bp->phy_dev);
-               phy_suspend(bp->phy_dev);
+               phy_stop(netdev->phydev);
+               phy_suspend(netdev->phydev);
                spin_lock_irqsave(&bp->lock, flags);
                macb_reset_hw(bp);
                spin_unlock_irqrestore(&bp->lock, flags);
@@ -4300,9 +4300,9 @@ static int __maybe_unused macb_resume(struct device *dev)
                macb_writel(bp, NCR, MACB_BIT(MPE));
                for (q = 0, queue = bp->queues; q < bp->num_queues;
++q, ++queue)
                        napi_enable(&queue->napi);
-               phy_resume(bp->phy_dev);
-               phy_init_hw(bp->phy_dev);
-               phy_start(bp->phy_dev);
+               phy_resume(netdev->phydev);
+               phy_init_hw(netdev->phydev);
+               phy_start(netdev->phydev);
Will fix in next set.

Regards,
Harini

^ permalink raw reply

* Re: WARNING in rds_message_alloc_sgs
From: Santosh Shilimkar @ 2018-10-31 16:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: syzbot, linux-rdma, netdev, rds-devel, syzkaller-bugs, davem,
	linux-kernel
In-Reply-To: <20181031064220.GN3974@mtr-leonro.mtl.com>

On 10/30/2018 11:42 PM, Leon Romanovsky wrote:
> On Tue, Oct 30, 2018 at 12:38:02PM -0700, Santosh Shilimkar wrote:
>> On 10/30/2018 12:28 PM, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    6201f31a39f8 Add linux-next specific files for 20181030
>>> git tree:       linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1397d06d400000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2a22859d870756c1
>>> dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=26de17458aeda9d305d8
>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10bb52eb400000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=118bdfc5400000
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
>>>
>>> WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316
>>> rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
>>> Kernel panic - not syncing: panic_on_warn set ...
>> Looks like this kernel build has panic on warn enabled which
>> triggers panic for " WARN_ON(!nr_pages)" case. Will look into
>> it. Thanks !!
> 
> Please don't forget to remove user triggered WARN_ON.
> https://lwn.net/Articles/769365/
> "Greg Kroah-Hartman raised the problem of core kernel API code that will
> use WARN_ON_ONCE() to complain about bad usage; that will not generate
> the desired result if WARN_ON_ONCE() is configured to crash the machine.
> He was told that the code should just call pr_warn() instead, and that
> the called function should return an error in such situations. It was
> generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
> triggered from user space need to be fixed."
> 
OK. Thanks for the note !!

^ permalink raw reply

* Re: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem
From: Claudiu.Beznea @ 2018-10-31  7:45 UTC (permalink / raw)
  To: Tristram.Ha; +Cc: UNGLinuxDriver, netdev, davem, Nicolas.Ferre
In-Reply-To: <93AF473E2DA327428DE3D46B72B1E9FD411D09C8@CHN-SV-EXMX02.mchp-main.com>



On 30.10.2018 21:36, Tristram Ha - C24268 wrote:
>> Could you check on your side that applying this on top of your patch, your
>> scenario is still working?
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 1d86b4d5645a..e1347d6d1b50 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
>> struct net_device *ndev)
>>                 *skb = nskb;
>>         }
>>
>> -       if (padlen) {
>> -               if (padlen >= ETH_FCS_LEN)
>> -                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
>> -               else
>> -                       skb_trim(*skb, ETH_FCS_LEN - padlen);
>> -       }
>> +       if (padlen > ETH_FCS_LEN)
>> +               skb_put_zero(*skb, padlen - ETH_FCS_LEN);
> 
> I think it is okay but I need to check all paths are covered.

On my side I checked with pktgen generating packets with sizes starting
from 1-MTU. Same scripts I also used when I first implemented this but it
seems that your scenario wasn't covered. Please have a look and let me know
how it works on your side.

> 
>> It was reported in [1] that UDP checksum is offloaded to hardware no matter
>> the application previously computed it.
>>
>> The code should be executed only for packets that has checksum computed
>> by
>> applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to
>> not
>> recompute checksum for packets with checksum already computed. To do
>> so,
>> while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit
>> should
>> be set on buffer descriptor. But to do so, packets must have a minimum size
>> of 64 and FCS to be computed.
>>
>> The NETIF_F_HW_CSUM check was placed there because the issue
>> described in
>> [1] is reproducible because hardware checksum is enabled and overrides the
>> checksum provided by applications.
>>
>> [1] https://www.spinics.net/lists/netdev/msg505065.html
> 
> I understand the issue now.  It is weird that the transmit descriptor does not
> have direct control over turning on checksum generation or not, but it wastes
> 3 bits returning the error codes of such generation.  What can the driver do
> with such information?

Yep, from my POV it would have been nice if it could have been able to do
the pad an FCS even when hardware checksum is enabled and TX_NOCRC bit is
set in descriptor. For cases when hardware checksum is disable it seems
that it could handle it.

> 
> In my opinion then hardware transmit checksumming cannot be supported
> In Linux.
> 
>>> NETIF_F_SG is not enabled in the MAC I used, so enabling
>> NETIF_IF_HW_CSUM
>>> is rather pointless.  With the padding code the transmit throughput cannot
>> get
>>> higher than 100Mbps in a gigabit connection.
>>>
>>> I would recommend to add this option to disable manual padding in one of
>> those
>>> macb_config structures.
>>
>> In this way the user would have to know from the beginning what kind of
>> packets are used.
>>
> 
> The kernel already does a good job of calculating checksum.  Using hardware to
> do that does not improve performance much.
> 
> Alternative is to use ethtool to disable hardware tx checksum so that software can
> intentionally send wrong checksums.
> 

^ permalink raw reply

* Re: [PATCH net] net: dsa: microchip: initialize mutex before use
From: Pavel Machek @ 2018-10-31  7:42 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: David S. Miller, Andrew Lunn, Florian Fainelli, UNGLinuxDriver,
	netdev
In-Reply-To: <1540943149-26832-1-git-send-email-Tristram.Ha@microchip.com>

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

On Tue 2018-10-30 16:45:49, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Initialize mutex before use.  Avoid kernel complaint when
> CONFIG_DEBUG_LOCK_ALLOC is enabled.
> 
> Fixes: b987e98e50ab90e5 ("dsa: add DSA switch driver for Microchip KSZ9477")
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>

Acked-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* [PATCH iproute2-next v1 4/4] rdma: Document IB device renaming option
From: Leon Romanovsky @ 2018-10-31  7:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise
In-Reply-To: <20181031071758.6178-1-leon@kernel.org>

From: Leon Romanovsky <leonro@mellanox.com>

[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455
sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455
sys_image_guid 5254:00c0:fe12:3455

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 man/man8/rdma-dev.8 | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/man/man8/rdma-dev.8 b/man/man8/rdma-dev.8
index 461681b6..b2f9964a 100644
--- a/man/man8/rdma-dev.8
+++ b/man/man8/rdma-dev.8
@@ -22,6 +22,12 @@ rdmak-dev \- RDMA device configuration
 .B rdma dev show
 .RI "[ " DEV " ]"
 
+.ti -8
+.B rdma dev set
+.RI "[ " DEV " ]"
+.BR name
+.BR NEWNAME
+
 .ti -8
 .B rdma dev help
 
@@ -33,6 +39,8 @@ rdmak-dev \- RDMA device configuration
 - specifies the RDMA device to show.
 If this argument is omitted all devices are listed.
 
+.SS rdma dev set - rename rdma device
+
 .SH "EXAMPLES"
 .PP
 rdma dev
@@ -45,6 +53,11 @@ rdma dev show mlx5_3
 Shows the state of specified RDMA device.
 .RE
 .PP
+rdma dev set mlx5_3 name rdma_0
+.RS 4
+Renames the mlx5_3 device to be named rdma_0.
+.RE
+.PP
 
 .SH SEE ALSO
 .BR rdma (8),
-- 
2.14.4

^ permalink raw reply related

* [PATCH iproute2-next v1 3/4] rdma: Add an option to rename IB device interface
From: Leon Romanovsky @ 2018-10-31  7:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise
In-Reply-To: <20181031071758.6178-1-leon@kernel.org>

From: Leon Romanovsky <leonro@mellanox.com>

Enrich rdmatool with an option to rename IB devices,
the command interface follows Iproute2 convention:
"rdma dev set [OLD-DEVNAME] name NEW-DEVNAME"

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
---
 rdma/dev.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/rdma/dev.c b/rdma/dev.c
index e2eafe47..760b7fb3 100644
--- a/rdma/dev.c
+++ b/rdma/dev.c
@@ -14,6 +14,7 @@
 static int dev_help(struct rd *rd)
 {
 	pr_out("Usage: %s dev show [DEV]\n", rd->filename);
+	pr_out("       %s dev set [DEV] name DEVNAME\n", rd->filename);
 	return 0;
 }
 
@@ -240,17 +241,51 @@ static int dev_one_show(struct rd *rd)
 	return rd_exec_cmd(rd, cmds, "parameter");
 }
 
+static int dev_set_name(struct rd *rd)
+{
+	uint32_t seq;
+
+	if (rd_no_arg(rd)) {
+		pr_err("Please provide device new name.\n");
+		return -EINVAL;
+	}
+
+	rd_prepare_msg(rd, RDMA_NLDEV_CMD_SET,
+		       &seq, (NLM_F_REQUEST | NLM_F_ACK));
+	mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd->dev_idx);
+	mnl_attr_put_strz(rd->nlh, RDMA_NLDEV_ATTR_DEV_NAME, rd_argv(rd));
+
+	return rd_send_msg(rd);
+}
+
+static int dev_one_set(struct rd *rd)
+{
+	const struct rd_cmd cmds[] = {
+		{ NULL,		dev_help},
+		{ "name",	dev_set_name},
+		{ 0 }
+	};
+
+	return rd_exec_cmd(rd, cmds, "parameter");
+}
+
 static int dev_show(struct rd *rd)
 {
 	return rd_exec_dev(rd, dev_one_show);
 }
 
+static int dev_set(struct rd *rd)
+{
+	return rd_exec_require_dev(rd, dev_one_set);
+}
+
 int cmd_dev(struct rd *rd)
 {
 	const struct rd_cmd cmds[] = {
 		{ NULL,		dev_show },
 		{ "show",	dev_show },
 		{ "list",	dev_show },
+		{ "set",	dev_set },
 		{ "help",	dev_help },
 		{ 0 }
 	};
-- 
2.14.4

^ permalink raw reply related

* [PATCH iproute2-next v1 2/4] rdma: Introduce command execution helper with required device name
From: Leon Romanovsky @ 2018-10-31  7:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise
In-Reply-To: <20181031071758.6178-1-leon@kernel.org>

From: Leon Romanovsky <leonro@mellanox.com>

In contradiction to various show commands, the set command explicitly
requires to use device name as an argument. Provide new command
execution helper which enforces it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
---
 rdma/rdma.h  |  1 +
 rdma/utils.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/rdma/rdma.h b/rdma/rdma.h
index c3b7530b..547bb574 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -90,6 +90,7 @@ int cmd_link(struct rd *rd);
 int cmd_res(struct rd *rd);
 int rd_exec_cmd(struct rd *rd, const struct rd_cmd *c, const char *str);
 int rd_exec_dev(struct rd *rd, int (*cb)(struct rd *rd));
+int rd_exec_require_dev(struct rd *rd, int (*cb)(struct rd *rd));
 int rd_exec_link(struct rd *rd, int (*cb)(struct rd *rd), bool strict_port);
 void rd_free(struct rd *rd);
 int rd_set_arg_to_devname(struct rd *rd);
diff --git a/rdma/utils.c b/rdma/utils.c
index 4840bf22..61f4aeb1 100644
--- a/rdma/utils.c
+++ b/rdma/utils.c
@@ -577,6 +577,16 @@ out:
 	return ret;
 }
 
+int rd_exec_require_dev(struct rd *rd, int (*cb)(struct rd *rd))
+{
+	if (rd_no_arg(rd)) {
+		pr_err("Please provide device name.\n");
+		return -EINVAL;
+	}
+
+	return rd_exec_dev(rd, cb);
+}
+
 int rd_exec_cmd(struct rd *rd, const struct rd_cmd *cmds, const char *str)
 {
 	const struct rd_cmd *c;
-- 
2.14.4

^ permalink raw reply related

* [PATCH iproute2-next v1 1/4] rdma: Update kernel include file to support IB device renaming
From: Leon Romanovsky @ 2018-10-31  7:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise
In-Reply-To: <20181031071758.6178-1-leon@kernel.org>

From: Leon Romanovsky <leonro@mellanox.com>

Bring kernel header file changes upto commit 05d940d3a3ec
("RDMA/nldev: Allow IB device rename through RDMA netlink")

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
---
 rdma/include/uapi/rdma/rdma_netlink.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
index 6513fb89..e2228c09 100644
--- a/rdma/include/uapi/rdma/rdma_netlink.h
+++ b/rdma/include/uapi/rdma/rdma_netlink.h
@@ -227,8 +227,9 @@ enum rdma_nldev_command {
 	RDMA_NLDEV_CMD_UNSPEC,
 
 	RDMA_NLDEV_CMD_GET, /* can dump */
+	RDMA_NLDEV_CMD_SET,
 
-	/* 2 - 4 are free to use */
+	/* 3 - 4 are free to use */
 
 	RDMA_NLDEV_CMD_PORT_GET = 5, /* can dump */
 
-- 
2.14.4

^ permalink raw reply related

* [PATCH iproute2-next v1 0/4] rdma: IB device rename
From: Leon Romanovsky @ 2018-10-31  7:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Leon Romanovsky, netdev, RDMA mailing list, Stephen Hemminger,
	Steve Wise

From: Leon Romanovsky <leonro@mellanox.com>

Changelog:
v0->v1:
 * Added Steve's ROB on first three patches.
 * Updated manual
RFC->v0
 * Dropped RFC tag after kernel part was accepted
 * Updated commit message of first patch to include correct SHA1
-----------------------------------------------------------------------

Hi,

This is comprehensive part of kernel series posted earlier. The kernel
part is not accepted yet, so first patch will have different commit
message with different commit SHA1. This is why it is marked as RFC.

An example:

[leonro@server /]$ lspci |grep -i Ether
00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:09.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
[leonro@server /]$ sudo rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
[leonro@server /]$ sudo rdma dev set mlx5_0 name hfi1_0
[leonro@server /]$ sudo rdma dev
1: hfi1_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

Thanks

Leon Romanovsky (4):
  rdma: Update kernel include file to support IB device renaming
  rdma: Introduce command execution helper with required device name
  rdma: Add an option to rename IB device interface
  rdma: Document IB device renaming option

 man/man8/rdma-dev.8                   | 13 +++++++++++++
 rdma/dev.c                            | 35 +++++++++++++++++++++++++++++++++++
 rdma/include/uapi/rdma/rdma_netlink.h |  3 ++-
 rdma/rdma.h                           |  1 +
 rdma/utils.c                          | 10 ++++++++++
 5 files changed, 61 insertions(+), 1 deletion(-)

^ permalink raw reply

* Re: [PATCH 4/4] net: macb: Add support for suspend/resume with full power down
From: Claudiu.Beznea @ 2018-10-31 16:10 UTC (permalink / raw)
  To: harini.katakam, Nicolas.Ferre, davem
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux, appanad
In-Reply-To: <1540957223-30984-5-git-send-email-harini.katakam@xilinx.com>



On 31.10.2018 05:40, Harini Katakam wrote:
> When macb device is suspended and system is powered down, the clocks
> are removed and hence macb should be closed gracefully and restored
> upon resume. This patch does the same by switching off the net device,
> suspending phy and performing necessary cleanup of interrupts and BDs.
> Upon resume, all these are reinitialized again.
> 
> Reset of macb device is done only when GEM is not a wake device.
> Even when gem is a wake device, tx queues can be stopped and ptp device
> can be closed (tsu clock will be disabled in pm_runtime_suspend) as
> wake event detection has no dependency on this.
> 
> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
> ---
> Notes:
> I was unable to do a full macb_close/open in this patch as suggested
> because it was freeing and allocating the full RX/TX buffers and
> this time consuming, also leading to a crash when done continuously
> in stress tests. 
> 
>  drivers/net/ethernet/cadence/macb_main.c | 38 ++++++++++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 09cb4bb..0d1acb4 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -4247,16 +4247,33 @@ static int __maybe_unused macb_suspend(struct device *dev)
>  	struct platform_device *pdev = to_platform_device(dev);
>  	struct net_device *netdev = platform_get_drvdata(pdev);
>  	struct macb *bp = netdev_priv(netdev);
> +	struct macb_queue *queue = bp->queues;
> +	unsigned long flags;
> +	unsigned int q;
> +
> +	if (!netif_running(netdev))
> +		return 0;
>  
> -	netif_carrier_off(netdev);
> -	netif_device_detach(netdev);
>  
>  	if (bp->wol & MACB_WOL_ENABLED) {
>  		macb_writel(bp, IER, MACB_BIT(WOL));
>  		macb_writel(bp, WOL, MACB_BIT(MAG));
>  		enable_irq_wake(bp->queues[0].irq);
> +		netif_device_detach(netdev);
> +	} else {
> +		netif_device_detach(netdev);
> +		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
> +			napi_disable(&queue->napi);
> +		phy_stop(bp->phy_dev);

Hi Harini,

I applied these patches on net-next/master cloned from [1], updated this
moment, but I don't have a phy_dev member in struct macb. Maybe you wanted
to use netdev->phydev ?

Thank you,
Claudiu Beznea

[1] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

> +		phy_suspend(bp->phy_dev);
> +		spin_lock_irqsave(&bp->lock, flags);
> +		macb_reset_hw(bp);
> +		spin_unlock_irqrestore(&bp->lock, flags);
>  	}
>  
> +	netif_carrier_off(netdev);
> +	if (bp->ptp_info)
> +		bp->ptp_info->ptp_remove(netdev);
>  	pm_runtime_force_suspend(dev);
>  
>  	return 0;
> @@ -4267,6 +4284,11 @@ static int __maybe_unused macb_resume(struct device *dev)
>  	struct platform_device *pdev = to_platform_device(dev);
>  	struct net_device *netdev = platform_get_drvdata(pdev);
>  	struct macb *bp = netdev_priv(netdev);
> +	struct macb_queue *queue = bp->queues;
> +	unsigned int q;
> +
> +	if (!netif_running(netdev))
> +		return 0;
>  
>  	pm_runtime_force_resume(dev);
>  
> @@ -4274,9 +4296,21 @@ static int __maybe_unused macb_resume(struct device *dev)
>  		macb_writel(bp, IDR, MACB_BIT(WOL));
>  		macb_writel(bp, WOL, 0);
>  		disable_irq_wake(bp->queues[0].irq);
> +	} else {
> +		macb_writel(bp, NCR, MACB_BIT(MPE));
> +		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
> +			napi_enable(&queue->napi);
> +		phy_resume(bp->phy_dev);
> +		phy_init_hw(bp->phy_dev);
> +		phy_start(bp->phy_dev);
>  	}
>  
> +	bp->macbgem_ops.mog_init_rings(bp);
> +	macb_init_hw(bp);
> +	macb_set_rx_mode(netdev);
>  	netif_device_attach(netdev);
> +	if (bp->ptp_info)
> +		bp->ptp_info->ptp_init(netdev);
>  
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [PATCH 3/4] net: macb: Add pm runtime support
From: Harini Katakam @ 2018-10-31 15:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Harini Katakam, Nicolas Ferre, David Miller, Claudiu Beznea,
	netdev, linux-kernel, Michal Simek, Shubhrajyoti Datta
In-Reply-To: <20181031154733.GC25521@lunn.ch>

Hi Andrew,
On Wed, Oct 31, 2018 at 9:18 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Yes, I understand. But I thought I'd handle it when we have a separate
> > MDIO bus driver. As of now, with macb, I do not see a way, for ex.,
> > for MAC1 to access MDIO0 when MAC0 is suspended.
> > However, I will work a bit more on this solution.
>
> You could look at the FEC driver. Its runtime suspend works well for
> me, and i've some complex setups, with two FEC interfaces, PHY and
> switches hanging off various MDIO busses etc.

OK sure, thanks.

Regards,
Harini

^ permalink raw reply

* Re: [PATCH net] rtnetlink: invoke 'cb->done' destructor before 'cb->args' reset
From: Alexey Kodanev @ 2018-10-31  6:42 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern, David Miller
In-Reply-To: <1540968178-18894-1-git-send-email-alexey.kodanev@oracle.com>

On 31.10.2018 09:42, Alexey Kodanev wrote:
> cb->args[2] can store the pointer to the struct fib6_walker,
> allocated in inet6_dump_fib(). On the next loop iteration in
> rtnl_dump_all(), 'memset(&cb, 0, sizeof(cb->args))' can reset
> that pointer, leaking the memory [1].
>

On the second thought we could as well save the state of fib6_walker
in inet6_dump_fib() with cb->data. That should fix the leak too.
Is it sounds reasonable?

Thanks,
Alexey

 
> Fix it by calling cb->done, if it is set, before filling 'cb->args'
> with zeros.
> 
> Looks like the recent changes in rtnl_dump_all() contributed to
> the appearance of this kmemleak [1], commit c63586dc9b3e ("net:
> rtnl_dump_all needs to propagate error from dumpit function")
> breaks the loop only on an error now.
> 
> [1]:
> unreferenced object 0xffff88001322a200 (size 96):
>   comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s)
>   hex dump (first 32 bytes):
>     00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
>     18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff  ..A6......A6....
>   backtrace:
>     [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220
>     [<000000007d12709f>] inet6_dump_fib+0x68d/0x940
>     [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0
>     [<00000000d7cd302b>] netlink_dump+0x945/0x11a0
>     [<000000002f43485f>] __netlink_dump_start+0x55d/0x800
>     [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00
>     [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420
>     [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20
>     [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0
>     [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0
>     [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0
>     [<000000008c1b786f>] __sys_sendto+0x1e4/0x330
>     [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0
>     [<00000000071f4d56>] do_syscall_64+0x9f/0x300
>     [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>     [<0000000057587684>] 0xffffffffffffffff
> 
> Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/core/rtnetlink.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index f679c7a..314c683 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3362,6 +3362,8 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
>  			continue;
>  
>  		if (idx > s_idx) {
> +			if (cb->done)
> +				cb->done(cb);
>  			memset(&cb->args[0], 0, sizeof(cb->args));
>  			cb->prev_seq = 0;
>  			cb->seq = 0;
> 

^ permalink raw reply

* [PATCH net] rtnetlink: invoke 'cb->done' destructor before 'cb->args' reset
From: Alexey Kodanev @ 2018-10-31  6:42 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern, David Miller, Alexey Kodanev

cb->args[2] can store the pointer to the struct fib6_walker,
allocated in inet6_dump_fib(). On the next loop iteration in
rtnl_dump_all(), 'memset(&cb, 0, sizeof(cb->args))' can reset
that pointer, leaking the memory [1].

Fix it by calling cb->done, if it is set, before filling 'cb->args'
with zeros.

Looks like the recent changes in rtnl_dump_all() contributed to
the appearance of this kmemleak [1], commit c63586dc9b3e ("net:
rtnl_dump_all needs to propagate error from dumpit function")
breaks the loop only on an error now.

[1]:
unreferenced object 0xffff88001322a200 (size 96):
  comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s)
  hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
    18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff  ..A6......A6....
  backtrace:
    [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220
    [<000000007d12709f>] inet6_dump_fib+0x68d/0x940
    [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0
    [<00000000d7cd302b>] netlink_dump+0x945/0x11a0
    [<000000002f43485f>] __netlink_dump_start+0x55d/0x800
    [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00
    [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420
    [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20
    [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0
    [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0
    [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0
    [<000000008c1b786f>] __sys_sendto+0x1e4/0x330
    [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0
    [<00000000071f4d56>] do_syscall_64+0x9f/0x300
    [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<0000000057587684>] 0xffffffffffffffff

Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/core/rtnetlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f679c7a..314c683 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3362,6 +3362,8 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 			continue;
 
 		if (idx > s_idx) {
+			if (cb->done)
+				cb->done(cb);
 			memset(&cb->args[0], 0, sizeof(cb->args));
 			cb->prev_seq = 0;
 			cb->seq = 0;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v2 2/2] trace: remove kretprobed checks
From: Aleksa Sarai @ 2018-10-31 15:25 UTC (permalink / raw)
  To: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Masami Hiramatsu, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Steven Rostedt, Shuah Khan, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Aleksa Sarai, Aleksa Sarai, Christian Brauner, Brendan Gregg,
	netdev, linux-doc, linux-kernel, linux-kselftest
In-Reply-To: <20181031152543.12138-1-cyphar@cyphar.com>

This is effectively a reversion of commit 76094a2cf46e ("ftrace:
distinguish kretprobe'd functions in trace logs"), as the checking of
kretprobe_trampoline *for tracing* is no longer necessary with the new
kretprobe stack trace changes.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 kernel/trace/trace_output.c | 34 ++++------------------------------
 1 file changed, 4 insertions(+), 30 deletions(-)

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 6e6cc64faa38..951de16bd4fd 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -321,36 +321,14 @@ int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...)
 }
 EXPORT_SYMBOL_GPL(trace_output_call);
 
-#ifdef CONFIG_KRETPROBES
-static inline const char *kretprobed(const char *name)
-{
-	static const char tramp_name[] = "kretprobe_trampoline";
-	int size = sizeof(tramp_name);
-
-	if (strncmp(tramp_name, name, size) == 0)
-		return "[unknown/kretprobe'd]";
-	return name;
-}
-#else
-static inline const char *kretprobed(const char *name)
-{
-	return name;
-}
-#endif /* CONFIG_KRETPROBES */
-
 static void
 seq_print_sym_short(struct trace_seq *s, const char *fmt, unsigned long address)
 {
 	char str[KSYM_SYMBOL_LEN];
 #ifdef CONFIG_KALLSYMS
-	const char *name;
-
 	kallsyms_lookup(address, NULL, NULL, NULL, str);
-
-	name = kretprobed(str);
-
-	if (name && strlen(name)) {
-		trace_seq_printf(s, fmt, name);
+	if (strlen(str)) {
+		trace_seq_printf(s, fmt, str);
 		return;
 	}
 #endif
@@ -364,13 +342,9 @@ seq_print_sym_offset(struct trace_seq *s, const char *fmt,
 {
 	char str[KSYM_SYMBOL_LEN];
 #ifdef CONFIG_KALLSYMS
-	const char *name;
-
 	sprint_symbol(str, address);
-	name = kretprobed(str);
-
-	if (name && strlen(name)) {
-		trace_seq_printf(s, fmt, name);
+	if (strlen(str)) {
+		trace_seq_printf(s, fmt, str);
 		return;
 	}
 #endif
-- 
2.19.1

^ permalink raw reply related

* [PATCH v2 1/2] kretprobe: produce sane stack traces
From: Aleksa Sarai @ 2018-10-31 15:25 UTC (permalink / raw)
  To: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Masami Hiramatsu, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Steven Rostedt, Shuah Khan, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Aleksa Sarai, Brendan Gregg, Christian Brauner, Aleksa Sarai,
	netdev, linux-doc, linux-kernel, linux-kselftest
In-Reply-To: <20181031152543.12138-1-cyphar@cyphar.com>

Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.

With the advent of bpf_trace, users would have been able to do this
association in bpf, but this was less than ideal (because
bpf_get_stackid would still produce rubbish and programs that didn't
know better would get silly results). The main usecase for stack traces
(at least with bpf_trace) is for DTrace-style aggregation on stack
traces (both entry and exit). Therefore we cannot simply correct the
stack trace on exit -- we must stash away the stack trace and return the
entry stack trace when it is requested.

[1]: https://github.com/iovisor/bpftrace/issues/101

Cc: Brendan Gregg <bgregg@netflix.com>
Cc: Christian Brauner <christian@brauner.io>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 Documentation/kprobes.txt                     |   6 +-
 include/linux/kprobes.h                       |  15 +++
 kernel/events/callchain.c                     |   8 +-
 kernel/kprobes.c                              | 101 +++++++++++++++++-
 kernel/trace/trace.c                          |  11 +-
 .../test.d/kprobe/kretprobe_stacktrace.tc     |  25 +++++
 6 files changed, 161 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc

diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 10f4499e677c..1965585848f4 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -597,7 +597,11 @@ address with the trampoline's address, stack backtraces and calls
 to __builtin_return_address() will typically yield the trampoline's
 address instead of the real return address for kretprobed functions.
 (As far as we can tell, __builtin_return_address() is used only
-for instrumentation and error reporting.)
+for instrumentation and error reporting.) However, since return probes
+are used extensively in tracing (where stack backtraces are useful),
+return probes will stash away the stack backtrace during function entry
+so that return probe handlers can use the entry backtrace instead of
+having a trace with just kretprobe_trampoline.
 
 If the number of times a function is called does not match the number
 of times it returns, registering a return probe on that function may
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e909413e4e38..b2d6e8b74731 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -40,6 +40,8 @@
 #include <linux/rcupdate.h>
 #include <linux/mutex.h>
 #include <linux/ftrace.h>
+#include <linux/stacktrace.h>
+#include <linux/perf_event.h>
 #include <asm/kprobes.h>
 
 #ifdef CONFIG_KPROBES
@@ -168,11 +170,18 @@ struct kretprobe {
 	raw_spinlock_t lock;
 };
 
+#define KRETPROBE_TRACE_SIZE 127
+struct kretprobe_trace {
+	int nr_entries;
+	unsigned long entries[KRETPROBE_TRACE_SIZE];
+};
+
 struct kretprobe_instance {
 	struct hlist_node hlist;
 	struct kretprobe *rp;
 	kprobe_opcode_t *ret_addr;
 	struct task_struct *task;
+	struct kretprobe_trace entry;
 	char data[0];
 };
 
@@ -371,6 +380,12 @@ void unregister_kretprobe(struct kretprobe *rp);
 int register_kretprobes(struct kretprobe **rps, int num);
 void unregister_kretprobes(struct kretprobe **rps, int num);
 
+struct kretprobe_instance *current_kretprobe_instance(void);
+void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
+				struct stack_trace *trace);
+void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
+				     struct perf_callchain_entry_ctx *ctx);
+
 void kprobe_flush_task(struct task_struct *tk);
 void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);
 
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 24a77c34e9ad..98edcd8a6987 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -12,6 +12,7 @@
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <linux/sched/task_stack.h>
+#include <linux/kprobes.h>
 
 #include "internal.h"
 
@@ -197,9 +198,14 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 	ctx.contexts_maxed = false;
 
 	if (kernel && !user_mode(regs)) {
+		struct kretprobe_instance *ri = current_kretprobe_instance();
+
 		if (add_mark)
 			perf_callchain_store_context(&ctx, PERF_CONTEXT_KERNEL);
-		perf_callchain_kernel(&ctx, regs);
+		if (ri)
+			kretprobe_perf_callchain_kernel(ri, &ctx);
+		else
+			perf_callchain_kernel(&ctx, regs);
 	}
 
 	if (user) {
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 90e98e233647..fca3964d18cd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1206,6 +1206,16 @@ __releases(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_unlock);
 
+static bool kretprobe_hash_is_locked(struct task_struct *tsk)
+{
+	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
+	raw_spinlock_t *hlist_lock;
+
+	hlist_lock = kretprobe_table_lock_ptr(hash);
+	return raw_spin_is_locked(hlist_lock);
+}
+NOKPROBE_SYMBOL(kretprobe_hash_is_locked);
+
 /*
  * This function is called from finish_task_switch when task tk becomes dead,
  * so that we can recycle any function-return probe instances associated
@@ -1800,6 +1810,13 @@ unsigned long __weak arch_deref_entry_point(void *entry)
 	return (unsigned long)entry;
 }
 
+static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs);
+
+static inline bool kprobe_is_retprobe(struct kprobe *kp)
+{
+	return kp->pre_handler == pre_handler_kretprobe;
+}
+
 #ifdef CONFIG_KRETPROBES
 /*
  * This kprobe pre_handler is registered with every kretprobe. When probe
@@ -1826,6 +1843,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
 	raw_spin_lock_irqsave(&rp->lock, flags);
 	if (!hlist_empty(&rp->free_instances)) {
+		struct stack_trace trace = {};
+
 		ri = hlist_entry(rp->free_instances.first,
 				struct kretprobe_instance, hlist);
 		hlist_del(&ri->hlist);
@@ -1834,6 +1853,11 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		ri->rp = rp;
 		ri->task = current;
 
+		trace.entries = &ri->entry.entries[0];
+		trace.max_entries = KRETPROBE_TRACE_SIZE;
+		save_stack_trace_regs(regs, &trace);
+		ri->entry.nr_entries = trace.nr_entries;
+
 		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
 			raw_spin_lock_irqsave(&rp->lock, flags);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
@@ -1856,6 +1880,65 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(pre_handler_kretprobe);
 
+/*
+ * Return the kretprobe_instance associated with the current_kprobe. Calling
+ * this is only reasonable from within a kretprobe handler context (otherwise
+ * return NULL).
+ *
+ * Must be called within a kretprobe_hash_lock(current, ...) context.
+ */
+struct kretprobe_instance *current_kretprobe_instance(void)
+{
+	struct kprobe *kp;
+	struct kretprobe *rp;
+	struct kretprobe_instance *ri;
+	struct hlist_head *head;
+	unsigned long hash = hash_ptr(current, KPROBE_HASH_BITS);
+
+	kp = kprobe_running();
+	if (!kp || !kprobe_is_retprobe(kp))
+		return NULL;
+	if (WARN_ON(!kretprobe_hash_is_locked(current)))
+		return NULL;
+
+	rp = container_of(kp, struct kretprobe, kp);
+	head = &kretprobe_inst_table[hash];
+
+	hlist_for_each_entry(ri, head, hlist) {
+		if (ri->task == current && ri->rp == rp)
+			return ri;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(current_kretprobe_instance);
+NOKPROBE_SYMBOL(current_kretprobe_instance);
+
+void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
+				struct stack_trace *trace)
+{
+	int i;
+	struct kretprobe_trace *krt = &ri->entry;
+
+	for (i = trace->skip; i < krt->nr_entries; i++) {
+		if (trace->nr_entries >= trace->max_entries)
+			break;
+		trace->entries[trace->nr_entries++] = krt->entries[i];
+	}
+}
+
+void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
+				     struct perf_callchain_entry_ctx *ctx)
+{
+	int i;
+	struct kretprobe_trace *krt = &ri->entry;
+
+	for (i = 0; i < krt->nr_entries; i++) {
+		if (krt->entries[i] == ULONG_MAX)
+			break;
+		perf_callchain_store(ctx, (u64) krt->entries[i]);
+	}
+}
+
 bool __weak arch_kprobe_on_func_entry(unsigned long offset)
 {
 	return !offset;
@@ -2005,6 +2088,22 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(pre_handler_kretprobe);
 
+struct kretprobe_instance *current_kretprobe_instance(void)
+{
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(current_kretprobe_instance);
+NOKPROBE_SYMBOL(current_kretprobe_instance);
+
+void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
+				struct stack_trace *trace)
+{
+}
+
+void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
+				     struct perf_callchain_entry_ctx *ctx)
+{
+}
 #endif /* CONFIG_KRETPROBES */
 
 /* Set the kprobe gone and remove its instruction buffer. */
@@ -2241,7 +2340,7 @@ static void report_probe(struct seq_file *pi, struct kprobe *p,
 	char *kprobe_type;
 	void *addr = p->addr;
 
-	if (p->pre_handler == pre_handler_kretprobe)
+	if (kprobe_is_retprobe(p))
 		kprobe_type = "r";
 	else
 		kprobe_type = "k";
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bf6f1d70484d..2210d38a4dbf 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -42,6 +42,7 @@
 #include <linux/nmi.h>
 #include <linux/fs.h>
 #include <linux/trace.h>
+#include <linux/kprobes.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/rt.h>
 
@@ -2590,6 +2591,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
 	struct ring_buffer_event *event;
 	struct stack_entry *entry;
 	struct stack_trace trace;
+	struct kretprobe_instance *ri = current_kretprobe_instance();
 	int use_stack;
 	int size = FTRACE_STACK_ENTRIES;
 
@@ -2626,7 +2628,9 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
 		trace.entries		= this_cpu_ptr(ftrace_stack.calls);
 		trace.max_entries	= FTRACE_STACK_MAX_ENTRIES;
 
-		if (regs)
+		if (ri)
+			kretprobe_save_stack_trace(ri, &trace);
+		else if (regs)
 			save_stack_trace_regs(regs, &trace);
 		else
 			save_stack_trace(&trace);
@@ -2653,7 +2657,10 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
 	else {
 		trace.max_entries	= FTRACE_STACK_ENTRIES;
 		trace.entries		= entry->caller;
-		if (regs)
+
+		if (ri)
+			kretprobe_save_stack_trace(ri, &trace);
+		else if (regs)
 			save_stack_trace_regs(regs, &trace);
 		else
 			save_stack_trace(&trace);
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
new file mode 100644
index 000000000000..03146c6a1a3c
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
@@ -0,0 +1,25 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0+
+# description: Kretprobe dynamic event with a stacktrace
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+echo 0 > events/enable
+echo 1 > options/stacktrace
+
+echo 'r:teststackprobe sched_fork $retval' > kprobe_events
+grep teststackprobe kprobe_events
+test -d events/kprobes/teststackprobe
+
+clear_trace
+echo 1 > events/kprobes/teststackprobe/enable
+( echo "forked")
+echo 0 > events/kprobes/teststackprobe/enable
+
+# Make sure we don't see kretprobe_trampoline and we see _do_fork.
+! grep 'kretprobe' trace
+grep '_do_fork' trace
+
+echo '-:teststackprobe' >> kprobe_events
+clear_trace
+test -d events/kprobes/teststackprobe && exit_fail || exit_pass
-- 
2.19.1

^ permalink raw reply related

* [PATCH v2 0/2] kretprobe: produce sane stack traces
From: Aleksa Sarai @ 2018-10-31 15:25 UTC (permalink / raw)
  To: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Masami Hiramatsu, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Steven Rostedt, Shuah Khan, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Aleksa Sarai, Aleksa Sarai, Christian Brauner, Brendan Gregg,
	netdev, linux-doc, linux-kernel, linux-kselftest

Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.

This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.

We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.

Patch changelog:
 v2:
   * documentation: mention kretprobe stack-stashing
   * ftrace: add self-test for fixed kretprobe stacktraces
   * ftrace: remove [unknown/kretprobe'd] handling
   * kprobe: remove needless EXPORT statements
   * kprobe: minor corrections to current_kretprobe_instance (switch
     away from hlist_for_each_entry_safe)
   * kprobe: make maximum stack size 127, which is the ftrace default

(I forgot to Cc the BPF folks in v1, I've added them now.)

Aleksa Sarai (2):
  kretprobe: produce sane stack traces
  trace: remove kretprobed checks

 Documentation/kprobes.txt                     |   6 +-
 include/linux/kprobes.h                       |  15 +++
 kernel/events/callchain.c                     |   8 +-
 kernel/kprobes.c                              | 101 +++++++++++++++++-
 kernel/trace/trace.c                          |  11 +-
 kernel/trace/trace_output.c                   |  34 +-----
 .../test.d/kprobe/kretprobe_stacktrace.tc     |  25 +++++
 7 files changed, 165 insertions(+), 35 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc

-- 
2.19.1

^ permalink raw reply

* Re: [PATCH 3/4] net: macb: Add pm runtime support
From: Harini Katakam @ 2018-10-31 15:10 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Harini Katakam, Nicolas Ferre, David Miller, Claudiu Beznea,
	netdev, linux-kernel, Michal Simek, Shubhrajyoti Datta
In-Reply-To: <20181031145400.GH20889@lunn.ch>

Hi Andrew,
On Wed, Oct 31, 2018 at 8:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Wed, Oct 31, 2018 at 09:10:22AM +0530, Harini Katakam wrote:
> > From: Harini Katakam <harinik@xilinx.com>
> >
> > Add runtime pm functions and move clock handling there.
> > If device is suspended and not a wake device, then return from
> > mdio read/write functions without performing any action because
> > the clocks are not active.
> >
> > Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
> > Signed-off-by: Harini Katakam <harinik@xilinx.com>
> > ---
> > Changes from RFC:
> > Updated pm get sync/put sync calls.
> > Removed unecessary clk up in mdio helpers.
>
> This last bit has me worried.
>
> The MDIO bus is a shared bus with a life of its own.  You can have
> multiple PHYs and switches on it. The PHYs can for a different
> Ethernet MAC. Switch drivers will expect to be able to address the
> switch when the interface is down.
>
> The FEC driver did something similar for a while. I had to make MDIO
> read/write runtime PM aware, otherwise i could not access the Ethernet
> switch hanging of its MDIO bus.

Yes, I understand. But I thought I'd handle it when we have a separate
MDIO bus driver. As of now, with macb, I do not see a way, for ex.,
for MAC1 to access MDIO0 when MAC0 is suspended.
However, I will work a bit more on this solution.

With the clk up, I've noticed that there is atleast one PHY status poll
that ends up bringing the clock up after the MAC has been suspended.
Of course phy is supended immediately after that but it just delays
this full power down process.

Regards,
Harini

^ permalink raw reply

* [PATCH] net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
From: Niklas Cassel @ 2018-10-31 15:08 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue, Jose Abreu, David S. Miller,
	Maxime Coquelin
  Cc: vkoul, Niklas Cassel, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel

When building stmmac, it is only possible to select CONFIG_DWMAC_GENERIC,
or any of the glue drivers, when CONFIG_STMMAC_PLATFORM is set.
The only exception is CONFIG_STMMAC_PCI.

When calling of_mdiobus_register(), it will call our ->reset()
callback, which is set to stmmac_mdio_reset().

Most of the code in stmmac_mdio_reset() is protected by a
"#if defined(CONFIG_STMMAC_PLATFORM)", which will evaluate
to false when CONFIG_STMMAC_PLATFORM=m.

Because of this, the phy reset gpio will only be pulled when
stmmac is built as built-in, but not when built as modules.

Fix this by using "#if IS_ENABLED()" instead of "#if defined()".

Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index b72ef171477e..bdd351597b55 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -243,7 +243,7 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
  */
 int stmmac_mdio_reset(struct mii_bus *bus)
 {
-#if defined(CONFIG_STMMAC_PLATFORM)
+#if IS_ENABLED(CONFIG_STMMAC_PLATFORM)
 	struct net_device *ndev = bus->priv;
 	struct stmmac_priv *priv = netdev_priv(ndev);
 	unsigned int mii_address = priv->hw->mii.addr;
-- 
2.17.2

^ permalink raw reply related

* Re: [PATCH 4/4] net: macb: Add support for suspend/resume with full power down
From: Harini Katakam @ 2018-10-31 15:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Harini Katakam, Nicolas Ferre, David Miller, Claudiu Beznea,
	netdev, linux-kernel, Michal Simek, appanad
In-Reply-To: <20181031145800.GI20889@lunn.ch>

Hi Andrew,
On Wed, Oct 31, 2018 at 8:28 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Wed, Oct 31, 2018 at 09:10:23AM +0530, Harini Katakam wrote:
> > When macb device is suspended and system is powered down, the clocks
> > are removed and hence macb should be closed gracefully and restored
> > upon resume. This patch does the same by switching off the net device,
> > suspending phy and performing necessary cleanup of interrupts and BDs.
> > Upon resume, all these are reinitialized again.
> >
> > Reset of macb device is done only when GEM is not a wake device.
> > Even when gem is a wake device, tx queues can be stopped and ptp device
> > can be closed (tsu clock will be disabled in pm_runtime_suspend) as
> > wake event detection has no dependency on this.
> >
> > Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> > Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
> > ---
> > Notes:
> > I was unable to do a full macb_close/open in this patch as suggested
> > because it was freeing and allocating the full RX/TX buffers and
> > this time consuming, also leading to a crash when done continuously
> > in stress tests.
>
> Hi Harini
>
> Did you try stress testing just plain up/down, which will call
> macb_open/macb_close? It could be it is broken already, and
> suspend/resume just makes it more obvious it is broken.

Yes, I did. Plain up/down stress tests do not show any problems.

Regards,
Harini

^ permalink raw reply

* Re: [PATCH 4/4] net: macb: Add support for suspend/resume with full power down
From: Andrew Lunn @ 2018-10-31 14:58 UTC (permalink / raw)
  To: Harini Katakam
  Cc: nicolas.ferre, davem, claudiu.beznea, netdev, linux-kernel,
	michal.simek, harinikatakamlinux, Kedareswara rao Appana
In-Reply-To: <1540957223-30984-5-git-send-email-harini.katakam@xilinx.com>

On Wed, Oct 31, 2018 at 09:10:23AM +0530, Harini Katakam wrote:
> When macb device is suspended and system is powered down, the clocks
> are removed and hence macb should be closed gracefully and restored
> upon resume. This patch does the same by switching off the net device,
> suspending phy and performing necessary cleanup of interrupts and BDs.
> Upon resume, all these are reinitialized again.
> 
> Reset of macb device is done only when GEM is not a wake device.
> Even when gem is a wake device, tx queues can be stopped and ptp device
> can be closed (tsu clock will be disabled in pm_runtime_suspend) as
> wake event detection has no dependency on this.
> 
> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
> ---
> Notes:
> I was unable to do a full macb_close/open in this patch as suggested
> because it was freeing and allocating the full RX/TX buffers and
> this time consuming, also leading to a crash when done continuously
> in stress tests. 

Hi Harini

Did you try stress testing just plain up/down, which will call
macb_open/macb_close? It could be it is broken already, and
suspend/resume just makes it more obvious it is broken.

	       Andrew

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox