Netdev List

* RE: [Intel-wired-lan] [next-queue PATCH v7 09/10] igb: Add the skeletons for tc-flower offloading
From: Brown, Aaron F @ 2018-04-14  2:23 UTC (permalink / raw)
  To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-10-vinicius.gomes@intel.com>

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Vinicius Costa Gomes
> Sent: Tuesday, April 10, 2018 10:50 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
> palencia@intel.com>
> Subject: [Intel-wired-lan] [next-queue PATCH v7 09/10] igb: Add the
> skeletons for tc-flower offloading
> 
> This adds basic functions needed to implement offloading for filters
> created by tc-flower.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 66
> +++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

* RE: [next-queue PATCH v7 10/10] igb: Add support for adding offloaded clsflower filters
From: Brown, Aaron F @ 2018-04-14  2:25 UTC (permalink / raw)
  To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
  Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
	Sanchez-Palencia, Jesus
In-Reply-To: <20180410174959.18757-11-vinicius.gomes@intel.com>

> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> Sent: Tuesday, April 10, 2018 10:50 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>
> Subject: [next-queue PATCH v7 10/10] igb: Add support for adding offloaded
> clsflower filters
> 
> This allows filters added by tc-flower and specifying MAC addresses,
> Ethernet types, and the VLAN priority field, to be offloaded to the
> controller.
> 
> This reuses most of the infrastructure used by ethtool, but clsflower
> filters are kept in a separate list, so they are invisible to
> ethtool.
> 
> To set up clsflower offloading:
> 
> $ tc qdisc replace dev eth0 handle 100: parent root mqprio \
>      	   	   num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> 		   queues 1@0 1@1 2@2 hw 0
> (clsflower offloading depends on the network driver being configured
> with multiple traffic classes; we use mqprio's 'num_tc' parameter to
> set it to 3)
> 
> $ tc qdisc add dev eth0 ingress
> 
> Examples of filters:
> 
> $ tc filter add dev eth0 parent ffff: flower \
>      	    dst_mac aa:aa:aa:aa:aa:aa \
> 	    hw_tc 2 skip_sw
> (a simple filter that matches on the destination MAC address and
> steers that traffic to queue 2)
> 
> $ tc filter add dev enp2s0 parent ffff: proto 0x22f0 flower \
>      	    src_mac cc:cc:cc:cc:cc:cc \
> 	    hw_tc 1 skip_sw
> (as the i210 doesn't support steering traffic based on the source
> address alone, we need to add another steering criterion; in this case
> we use the Ethernet type (0x22f0) to steer traffic to queue 1)
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb.h      |   2 +
>  drivers/net/ethernet/intel/igb/igb_main.c | 188
> +++++++++++++++++++++-
>  2 files changed, 188 insertions(+), 2 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

* RE: [next-queue PATCH] igb: Fix the transmission mode of queue 0 for Qav mode
From: Brown, Aaron F @ 2018-04-14  2:28 UTC (permalink / raw)
  To: Gomes, Vinicius, intel-wired-lan@lists.osuosl.org
  Cc: Gomes, Vinicius, Kirsher, Jeffrey T, netdev@vger.kernel.org,
	Sanchez-Palencia, Jesus, Guedes, Andre
In-Reply-To: <20180331000652.2855-1-vinicius.gomes@intel.com>

> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Vinicius Costa Gomes
> Sent: Friday, March 30, 2018 5:07 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Gomes, Vinicius <vinicius.gomes@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Sanchez-Palencia,
> Jesus <jesus.sanchez-palencia@intel.com>; Guedes, Andre
> <andre.guedes@intel.com>
> Subject: [next-queue PATCH] igb: Fix the transmission mode of queue 0 for
> Qav mode
> 
> When Qav mode is enabled, queue 0 should be kept on Stream Reservation
> mode. From the i210 datasheet, section 8.12.19:
> 
> "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
> Qav." ("QueueMode 1b" represents the Stream Reservation mode)
> 
> The solution is to give queue 0 all the credits it might need, so
> it has priority over queue 1.
> 
> A situation where this can happen is when cbs is "installed" only on
> queue 1, leaving queue 0 alone. For example:
> 
> $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
>      	   map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> 
> $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
>      	   hicredit 30 sendslope -980000 idleslope 20000 offload 1
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>
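
The cbs values in the quoted example are mutually consistent, which can be checked with a short sketch. It assumes a 1 Gbit/s port rate and a 1500-byte maximum frame and interference size (neither is stated in the patch), and uses the relations given in the tc-cbs(8) man page. The struct and function names are hypothetical, not part of the kernel or iproute2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four cbs knobs; not a kernel type. */
struct cbs_params {
	int64_t idleslope;	/* kbit/s */
	int64_t sendslope;	/* kbit/s, negative */
	int64_t hicredit;	/* bytes */
	int64_t locredit;	/* bytes, negative */
};

/*
 * Derive the remaining cbs parameters from the reserved bandwidth,
 * following the relations in tc-cbs(8):
 *   sendslope = idleslope - port_rate
 *   hicredit  = max_interference * idleslope / port_rate
 *   locredit  = max_frame * sendslope / port_rate
 */
static struct cbs_params cbs_derive(int64_t idleslope_kbps,
				    int64_t port_rate_kbps,
				    int64_t max_frame_bytes,
				    int64_t max_interference_bytes)
{
	struct cbs_params p;

	assert(port_rate_kbps > 0 && idleslope_kbps <= port_rate_kbps);

	p.idleslope = idleslope_kbps;
	p.sendslope = idleslope_kbps - port_rate_kbps;
	p.hicredit = max_interference_bytes * idleslope_kbps / port_rate_kbps;
	p.locredit = max_frame_bytes * p.sendslope / port_rate_kbps;
	return p;
}
```

With idleslope 20000 and the assumed port rate and frame size, this gives sendslope -980000, hicredit 30 and locredit -1470, the same numbers passed to tc above.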

* Re: [PATCH] netfilter: CONFIG_NF_REJECT_IPV{4,6} becomes bool toggle
From: kbuild test robot @ 2018-04-14  6:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kbuild-all, Arnd Bergmann, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, netfilter-devel, coreteam, Networking,
	Linux Kernel Mailing List
In-Reply-To: <20180413131558.3jw5dhub5gcyotyt@salvia>

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]

Hi Pablo,

I love your patch! Yet something to improve:

[auto build test ERROR on nf-next/master]
[also build test ERROR on v4.16 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pablo-Neira-Ayuso/netfilter-CONFIG_NF_REJECT_IPV-4-6-becomes-bool-toggle/20180414-101337
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   net/ipv6/netfilter/nf_reject_ipv6.o: In function `nf_reject_ip6_tcphdr_get':
>> nf_reject_ipv6.c:(.text+0x342): undefined reference to `nf_ip6_checksum'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `nf_send_reset6':
>> nf_reject_ipv6.c:(.text+0xcc2): undefined reference to `ip6_route_output_flags'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `nf_send_unreach6':
   nf_reject_ipv6.c:(.text+0x12b2): undefined reference to `nf_ip6_checksum'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 50033 bytes --]

* [PATCH v3] net: davicom: dm9000: Avoid spinlock recursion during  dm9000_timeout routine
From: Liu Xiang @ 2018-04-14  8:50 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, liuxiang_1999, Liu Xiang

On the DM9000B, dm9000_phy_write() is called while the main spinlock
is held, during the dm9000_timeout() routine. Spinlock recursion
occurs because dm9000_phy_write() requests the main spinlock again.
So the spinlock must not be taken in phy operations performed from
the dm9000_timeout() routine.

---
v3:
   When a task enters dm9000_timeout() and takes the main spinlock,
   another task that wants to do an asynchronous phy operation must be
   running on another CPU. Because it runs on a different CPU, this
   asynchronous task will be blocked in dm9000_phy_write() until
   the dm9000_timeout() routine completes.
---

Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
---
 drivers/net/ethernet/davicom/dm9000.c | 39 +++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index 50222b7..56df77d 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -112,7 +112,7 @@ struct board_info {
 	u8		imr_all;
 
 	unsigned int	flags;
-	unsigned int	in_timeout:1;
+	int		timeout_cpu;
 	unsigned int	in_suspend:1;
 	unsigned int	wake_supported:1;
 
@@ -158,6 +158,17 @@ static inline struct board_info *to_dm9000_board(struct net_device *dev)
 	return netdev_priv(dev);
 }
 
+static bool dm9000_current_in_timeout(struct board_info *db)
+{
+	bool ret = false;
+
+	preempt_disable();
+	ret = (db->timeout_cpu == smp_processor_id());
+	preempt_enable();
+
+	return ret;
+}
+
 /* DM9000 network board routine ---------------------------- */
 
 /*
@@ -276,7 +287,7 @@ static void dm9000_dumpblk_32bit(void __iomem *reg, int count)
  */
 static void dm9000_msleep(struct board_info *db, unsigned int ms)
 {
-	if (db->in_suspend || db->in_timeout)
+	if (db->in_suspend || dm9000_current_in_timeout(db))
 		mdelay(ms);
 	else
 		msleep(ms);
@@ -335,12 +346,13 @@ static void dm9000_msleep(struct board_info *db, unsigned int ms)
 	struct board_info *db = netdev_priv(dev);
 	unsigned long flags;
 	unsigned long reg_save;
+	bool in_timeout = dm9000_current_in_timeout(db);
 
 	dm9000_dbg(db, 5, "phy_write[%02x] = %04x\n", reg, value);
-	if (!db->in_timeout)
+	if (!in_timeout) {
 		mutex_lock(&db->addr_lock);
-
-	spin_lock_irqsave(&db->lock, flags);
+		spin_lock_irqsave(&db->lock, flags);
+	}
 
 	/* Save previous register address */
 	reg_save = readb(db->io_addr);
@@ -356,11 +368,13 @@ static void dm9000_msleep(struct board_info *db, unsigned int ms)
 	iow(db, DM9000_EPCR, EPCR_EPOS | EPCR_ERPRW);
 
 	writeb(reg_save, db->io_addr);
-	spin_unlock_irqrestore(&db->lock, flags);
+	if (!in_timeout)
+		spin_unlock_irqrestore(&db->lock, flags);
 
 	dm9000_msleep(db, 1);		/* Wait write complete */
 
-	spin_lock_irqsave(&db->lock, flags);
+	if (!in_timeout)
+		spin_lock_irqsave(&db->lock, flags);
 	reg_save = readb(db->io_addr);
 
 	iow(db, DM9000_EPCR, 0x0);	/* Clear phyxcer write command */
@@ -368,9 +382,10 @@ static void dm9000_msleep(struct board_info *db, unsigned int ms)
 	/* restore the previous address */
 	writeb(reg_save, db->io_addr);
 
-	spin_unlock_irqrestore(&db->lock, flags);
-	if (!db->in_timeout)
+	if (!in_timeout) {
+		spin_unlock_irqrestore(&db->lock, flags);
 		mutex_unlock(&db->addr_lock);
+	}
 }
 
 /* dm9000_set_io
@@ -980,7 +995,7 @@ static void dm9000_timeout(struct net_device *dev)
 
 	/* Save previous register address */
 	spin_lock_irqsave(&db->lock, flags);
-	db->in_timeout = 1;
+	db->timeout_cpu = smp_processor_id();
 	reg_save = readb(db->io_addr);
 
 	netif_stop_queue(dev);
@@ -992,7 +1007,7 @@ static void dm9000_timeout(struct net_device *dev)
 
 	/* Restore previous register address */
 	writeb(reg_save, db->io_addr);
-	db->in_timeout = 0;
+	db->timeout_cpu = -1;
 	spin_unlock_irqrestore(&db->lock, flags);
 }
 
@@ -1670,6 +1685,8 @@ static struct dm9000_plat_data *dm9000_parse_dt(struct device *dev)
 	db->mii.mdio_read    = dm9000_phy_read;
 	db->mii.mdio_write   = dm9000_phy_write;
 
+	db->timeout_cpu = -1;
+
 	mac_src = "eeprom";
 
 	/* try reading the node address from the attached EEPROM */
-- 
1.9.1
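
The owner check introduced by this patch can be illustrated with a small single-threaded model. This is only a sketch, not kernel code: the CPU number is passed in explicitly instead of coming from smp_processor_id(), the spinlock is modelled as a depth counter, and all names below (board_model, model_*) are hypothetical.

```c
#include <stdbool.h>

#define NO_TIMEOUT_CPU (-1)

/* Hypothetical stand-in for the relevant part of struct board_info. */
struct board_model {
	int timeout_cpu;	/* CPU running the timeout handler, or -1 */
	int lock_depth;		/* how many times the modelled lock is held */
};

/* True only for the CPU that is currently inside the timeout handler. */
static bool model_current_in_timeout(const struct board_model *db, int cpu)
{
	return db->timeout_cpu == cpu;
}

/* Timeout handler entry: take the lock, then record the owning CPU. */
static void model_timeout_begin(struct board_model *db, int cpu)
{
	db->lock_depth++;
	db->timeout_cpu = cpu;
}

/* Timeout handler exit: clear the owner, then drop the lock. */
static void model_timeout_end(struct board_model *db)
{
	db->timeout_cpu = NO_TIMEOUT_CPU;
	db->lock_depth--;
}

/*
 * PHY write: take the lock only when the caller is not the timeout
 * owner. Returns the lock depth observed while "writing"; a value of
 * 2 would correspond to the recursion the patch is meant to avoid.
 */
static int model_phy_write(struct board_model *db, int cpu)
{
	bool in_timeout = model_current_in_timeout(db, cpu);
	int depth;

	if (!in_timeout)
		db->lock_depth++;
	depth = db->lock_depth;
	if (!in_timeout)
		db->lock_depth--;
	return depth;
}
```

In this model a PHY write issued from inside the timeout handler sees a depth of 1, the same as a write issued with no timeout in progress, so the lock is never taken twice by the same CPU.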

* Re: [PATCH] netfilter: CONFIG_NF_REJECT_IPV{4,6} becomes bool toggle
From: kbuild test robot @ 2018-04-14  9:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kbuild-all, Arnd Bergmann, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, netfilter-devel, coreteam, Networking,
	Linux Kernel Mailing List
In-Reply-To: <20180413131558.3jw5dhub5gcyotyt@salvia>

[-- Attachment #1: Type: text/plain, Size: 1559 bytes --]

Hi Pablo,

I love your patch! Yet something to improve:

[auto build test ERROR on nf-next/master]
[also build test ERROR on v4.16 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pablo-Neira-Ayuso/netfilter-CONFIG_NF_REJECT_IPV-4-6-becomes-bool-toggle/20180414-101337
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git master
config: powerpc64-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc64 

All error/warnings (new ones prefixed by >>):

   powerpc64-linux-gnu-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'.
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_reject_ip6_tcphdr_get':
>> (.text+0x1f0): undefined reference to `.nf_ip6_checksum'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_send_reset6':
>> (.text+0x794): undefined reference to `.ip6_route_output_flags'
   net/ipv6/netfilter/nf_reject_ipv6.o: In function `.nf_send_unreach6':
   (.text+0xab8): undefined reference to `.nf_ip6_checksum'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56409 bytes --]

* [PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust
From: Yafang Shao @ 2018-04-14  9:48 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, songliubraving; +Cc: netdev, linux-kernel, Yafang Shao

tcp_rcv_space_adjust() is called every time data is copied to user space,
so a tracepoint in it can show us when a packet is copied to the user.
This can help us figure out whether there is latency in the user process.

When a TCP packet arrives, tcp_rcv_established() is called, and the
existing tracepoint tcp_probe gives us the time at which the packet
arrives.
The packet is then copied to the user, tcp_rcv_space_adjust() is called,
and the newly introduced tracepoint gives us the time at which the packet
is copied to the user.

	arrival time : user process time    => latency caused by user
	tcp_probe      tcp_rcv_space_adjust

Hence in the printk message, sk is printed as a key to connect these two
tracepoints.

Maybe we could export sockfd in this new tracepoint as well, then we
could connect this new tracepoint with epoll/read/recv* tracepoint, and
finally that could show us the whole lifespan of this packet. But we
could also implement that with pid as these functions are executed in
process context.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/trace/events/tcp.h | 21 +++++++++++++++------
 net/ipv4/tcp_input.c       |  2 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 878b2be..65a6d22 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -146,10 +146,11 @@
 			       sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
 	),
 
-	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
+	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c sock=0x%p",
 		  __entry->sport, __entry->dport,
 		  __entry->saddr, __entry->daddr,
-		  __entry->saddr_v6, __entry->daddr_v6)
+		  __entry->saddr_v6, __entry->daddr_v6,
+		  __entry->skaddr)
 );
 
 DEFINE_EVENT(tcp_event_sk, tcp_receive_reset,
@@ -166,6 +167,13 @@
 	TP_ARGS(sk)
 );
 
+DEFINE_EVENT(tcp_event_sk, tcp_rcv_space_adjust,
+
+	TP_PROTO(const struct sock *sk),
+
+	TP_ARGS(sk)
+);
+
 TRACE_EVENT(tcp_set_state,
 
 	TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
@@ -265,6 +273,7 @@
 	TP_ARGS(sk, skb),
 
 	TP_STRUCT__entry(
+		__field(const void *, skaddr)
 		/* sockaddr_in6 is always bigger than sockaddr_in */
 		__array(__u8, saddr, sizeof(struct sockaddr_in6))
 		__array(__u8, daddr, sizeof(struct sockaddr_in6))
@@ -285,6 +294,8 @@
 		const struct tcp_sock *tp = tcp_sk(sk);
 		const struct inet_sock *inet = inet_sk(sk);
 
+		__entry->skaddr = sk;
+
 		memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
 		memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
 
@@ -305,13 +316,11 @@
 		__entry->srtt = tp->srtt_us >> 3;
 	),
 
-	TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x "
-		  "snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u "
-		  "rcv_wnd=%u",
+	TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u rcv_wnd=%u sock=0x%p",
 		  __entry->saddr, __entry->daddr, __entry->mark,
 		  __entry->length, __entry->snd_nxt, __entry->snd_una,
 		  __entry->snd_cwnd, __entry->ssthresh, __entry->snd_wnd,
-		  __entry->srtt, __entry->rcv_wnd)
+		  __entry->srtt, __entry->rcv_wnd, __entry->skaddr)
 );
 
 #endif /* _TRACE_TCP_H */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 367def6..4b4d6b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -582,6 +582,8 @@ void tcp_rcv_space_adjust(struct sock *sk)
 	u32 copied;
 	int time;
 
+	trace_tcp_rcv_space_adjust(sk);
+
 	tcp_mstamp_refresh(tp);
 	time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
 	if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0)
-- 
1.8.3.1
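
As a rough illustration of how the sock key could be used, the sketch below pairs a tcp_probe line with a tcp_rcv_space_adjust line and reports the time between them. It assumes the timestamps have already been extracted from the trace output and that each line carries a "sock=0x..." field as added by this patch. The struct and helper names are hypothetical, and the sample line layout used with it is made up for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, already-split trace record: timestamp plus message text. */
struct trace_event {
	long long ts_us;	/* event timestamp in microseconds */
	const char *text;	/* tracepoint message, containing sock=0x... */
};

/* Extract the value printed after "sock=0x"; false if the field is missing. */
static bool trace_sock_key(const char *line, unsigned long long *key)
{
	const char *p = strstr(line, "sock=0x");
	char *end;

	if (!p)
		return false;
	p += strlen("sock=0x");
	*key = strtoull(p, &end, 16);
	return end != p;
}

/*
 * Time between packet arrival (tcp_probe) and the copy to user space
 * (tcp_rcv_space_adjust) for one socket. Returns -1 when the two
 * events cannot be matched on the sock key.
 */
static long long user_latency_us(const struct trace_event *probe,
				 const struct trace_event *adjust)
{
	unsigned long long a, b;

	if (!trace_sock_key(probe->text, &a) ||
	    !trace_sock_key(adjust->text, &b) || a != b)
		return -1;
	return adjust->ts_us - probe->ts_us;
}
```

Matching on the printed sock value only works within a single trace session, since the value is just an opaque key for the socket.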

* Re: [PATCH] x86/cpufeature: guard asm_volatile_goto usage with CC_HAVE_ASM_GOTO
From: Peter Zijlstra @ 2018-04-14 10:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, mingo, daniel, linux-kernel, x86, kernel-team,
	Thomas Gleixner, netdev, Jesper Dangaard Brouer
In-Reply-To: <a65f6542-8754-ea84-d1bf-076349b6b288@fb.com>

On Fri, Apr 13, 2018 at 01:42:14PM -0700, Alexei Starovoitov wrote:
> On 4/13/18 11:19 AM, Peter Zijlstra wrote:
> > On Tue, Apr 10, 2018 at 02:28:04PM -0700, Alexei Starovoitov wrote:
> > > Instead of
> > > #ifdef CC_HAVE_ASM_GOTO
> > > we can replace it with
> > > #ifndef __BPF__
> > > or some other name,
> > 
> > I would prefer the BPF specific hack; otherwise we might be encouraging
> > people to build the kernel proper without asm-goto.
> > 
> 
> I don't understand this concern.

The thing is; this will be a (temporary) BPF specific hack. Hiding it
behind something that looks 'normal' (CC_HAVE_ASM_GOTO) is just not
right.

* Re: [RFC v2] virtio: support packed ring
From: Tiwei Bie @ 2018-04-14 11:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, wexu, virtualization, linux-kernel, netdev, jfreimann
In-Reply-To: <20180413181808-mutt-send-email-mst@kernel.org>

On Fri, Apr 13, 2018 at 06:22:45PM +0300, Michael S. Tsirkin wrote:
> On Sun, Apr 01, 2018 at 10:12:16PM +0800, Tiwei Bie wrote:
> > +static inline bool more_used(const struct vring_virtqueue *vq)
> > +{
> > +	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> > +}
> > +
> > +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> > +				  void **ctx)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	void *ret;
> > +	unsigned int i;
> > +	u16 last_used;
> > +
> > +	START_USE(vq);
> > +
> > +	if (unlikely(vq->broken)) {
> > +		END_USE(vq);
> > +		return NULL;
> > +	}
> > +
> > +	if (!more_used(vq)) {
> > +		pr_debug("No more buffers in queue\n");
> > +		END_USE(vq);
> > +		return NULL;
> > +	}
> 
> So virtqueue_get_buf_ctx_split should only call more_used_split.

Yeah, you're right! Will fix this in the next version.

> 
> to avoid such issues I think we should lay out the code like this:
> 
> XXX_split
> 
> XXX_packed
> 
> XXX wrappers

I'll do it. Thanks for the suggestion!

> 
> > +/* The standard layout
> 
> I'd drop standard here.

Got it. I'll drop the word "standard".

> 
> > for the packed ring is a continuous chunk of memory
> > + * which looks like this.
> > + *
> > + * struct vring_packed
> > + * {
> 
> Can the opening bracket go on the prev line pls?

Sure.

> 
> > + *	// The actual descriptors (16 bytes each)
> > + *	struct vring_packed_desc desc[num];
> > + *
> > + *	// Padding to the next align boundary.
> > + *	char pad[];
> > + *
> > + *	// Driver Event Suppression
> > + *	struct vring_packed_desc_event driver;
> > + *
> > + *	// Device Event Suppression
> > + *	struct vring_packed_desc_event device;
> 
> Maybe that's how our driver does it but it's not based on spec
> so I don't think this belongs in the header.

I will move it to the place where vring_packed_init()
is defined.

> 
> > + * };
> > + */
> > +
> > +static inline unsigned vring_packed_size(unsigned int num, unsigned long align)
> > +{
> > +	return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > +		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > +}
> > +
> 
> > Can't say this API makes sense to me.

Hmm, do you have any suggestion? Also move it out of this header?

Thanks for the review! :)

Best regards,
Tiwei Bie

> 
> 
> >  #endif /* _UAPI_LINUX_VIRTIO_RING_H */
> > -- 
> > 2.11.0
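
For reference, the arithmetic in the quoted vring_packed_size() can be worked through in a standalone sketch. The 16-byte descriptor size comes from the quoted comment; the 4-byte size of each event suppression structure is an assumption (two 16-bit fields), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PACKED_DESC_SIZE	16u	/* "16 bytes each" per the quoted comment */
#define PACKED_EVENT_SIZE	4u	/* assumption: two 16-bit fields */

/*
 * Same layout rule as the quoted helper: the descriptor array is
 * rounded up to the next 'align' boundary, then the driver and device
 * event suppression structures follow.
 */
static size_t packed_ring_size(unsigned int num, size_t align)
{
	size_t desc_bytes = (size_t)PACKED_DESC_SIZE * num;

	/* The mask trick below only works for power-of-two alignments. */
	assert(align != 0 && (align & (align - 1)) == 0);

	return ((desc_bytes + align - 1) & ~(align - 1)) +
	       2 * (size_t)PACKED_EVENT_SIZE;
}
```

Under these assumptions, 256 descriptors with 4096-byte alignment occupy 4096 + 8 = 4104 bytes, and a single descriptor still occupies 4104 bytes because of the padding.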

* Re: tg3 crashes under high load, when using 100Mbits
From: Kai-Heng Feng @ 2018-04-14 15:47 UTC (permalink / raw)
  To: Satish Baddipadige
  Cc: Siva Reddy Kallam, Prashant Sreedharan, Michael Chan,
	Linux Netdev List, Linux Kernel Mailing List, Stanley Hsiao,
	Tim Chen
In-Reply-To: <48279E27-4FCB-4A3F-8F4A-E26581020D2A@canonical.com>

Hi Satish,

> On 2018Mar21, at 00:57, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> 
> Satish Baddipadige <satish.baddipadige@broadcom.com> wrote:
> 
>> On Thu, Feb 15, 2018 at 7:37 PM, Siva Reddy Kallam
>> <siva.kallam@broadcom.com> wrote:
>>> On Mon, Feb 12, 2018 at 10:59 AM, Siva Reddy Kallam
>>> <siva.kallam@broadcom.com> wrote:
>>>> On Fri, Feb 9, 2018 at 10:41 AM, Kai Heng Feng
>>>> <kai.heng.feng@canonical.com> wrote:
>>>>> Hi Broadcom folks,
>>>>> 
>>>>> We are now enabling a new platform with tg3 nic, unfortunately we observed
>>>>> the bug [1] that dated back to 2015.
>>>>> I tried commit 4419bb1cedcd ("tg3: Add workaround to restrict 5762 MRRS to
>>>>> 2048") but it doesn't work.
>>>>> 
>>>>> Do you have any idea how to solve the issue?
>>>>> 
>>>>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664
>>>>> 
>>>>> Kai-Heng
>>>> Thank you for reporting. We will check and update you.
>>> With link aware mode, the clock speed could be slow and boot code does not
>>> complete within the expected time with lower link speeds. We need to
>>> override the clock in the driver. We are checking the feasibility of adding
>>> this in driver or firmware.
>> 
>> Hi Kai-Heng,
>> 
>> Can you please test the attached patch?
> 
> I built a kernel and asked affected users to try.

Users reported that the crash still happens with the patch.

Kai-Heng

> 
> Thanks for your work.
> 
> Kai-Heng
> 
>> 
>> Thanks,
>> Satish
>> <tg3_5762_clock_override.patch>

* Re: [PATCH iproute2-next 3/3] treewide: Use addattr_nest()/addattr_nest_end() to handle nested attributes
From: Stephen Hemminger @ 2018-04-14 16:25 UTC (permalink / raw)
  To: Vinicius Costa Gomes; +Cc: Serhey Popovych, netdev
In-Reply-To: <87vacuu332.fsf@intel.com>

On Fri, 13 Apr 2018 15:57:37 -0700
Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote:

> Hi,
> 
> Serhey Popovych <serhe.popovych@gmail.com> writes:
> 
> [...]
> 
> > diff --git a/tc/q_mqprio.c b/tc/q_mqprio.c
> > index 89b4600..207d644 100644
> > --- a/tc/q_mqprio.c
> > +++ b/tc/q_mqprio.c
> > @@ -173,8 +173,7 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc,
> >  		argc--; argv++;
> >  	}
> >  
> > -	tail = NLMSG_TAIL(n);
> > -	addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
> > +	tail = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
> >  
> >  	if (flags & TC_MQPRIO_F_MODE)
> >  		addattr_l(n, 1024, TCA_MQPRIO_MODE,
> > @@ -209,7 +208,7 @@ static int mqprio_parse_opt(struct qdisc_util *qu, int argc,
> >  		addattr_nest_end(n, start);
> >  	}
> >  
> > -	tail->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail;
> > +	addattr_nest_compat_end(n, tail);
> >  
> >  	return 0;
> >  }  
> 
> Sorry if I am too late, but this breaks mqprio, i.e. something like
> this:
> 
> $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio \
>                    num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>                    queues 1@0 1@1 2@2 hw 0
> 
> that used to work, now doesn't.
> 
> This patch looks right, so I thought that it could be possible that mqprio
> (in the kernel side) was making some wrong assumptions about the format
> of the messages.
> 
> And after some investigation, what seems to be happening is something
> like this (not too familiar with netlink protocol internals, I may be
> missing something).
> 
> In the "wire", after this patch, the mqprio part of message may be
> represented as:
> 
> /* The message format is [ len | type | payload ] */
> 
> [ S | 2 | <S bytes> ]
> [ 0 | 2 | ]
> 
> Some notes:
>  - S is the aligned value of sizeof(opt);
>  - The value of TCA_OPTIONS is 2;
> 
> Before this patch, I think it was something like:
> 
> [ S | 2 | <S bytes> ]
> 
> The problem is that mqprio defines an internal type with the same value
> as TCA_OPTIONS (2), and the finalizing (empty) attribute is interpreted
> as that "internal" field instead of indicating the end of TCA_OPTIONS,
> which causes a size mismatch with 'mqprio_policy' and makes the command
> to create a mqprio qdisc fail.
> 
> In short, I think that replacing the "open coded" version with
> addattr_nest_compat() is not a functionally equivalent change.
> 
> 
> Cheers,
> --
> Vinicius

There are also a couple of legacy cases where kernel expects or sends
nested netlink messages without the NLA_NESTED flag. I ran into this several
years ago, forgot where.
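
The difference between the two layouts described above can be modelled in a few lines. This is a simplified sketch, not iproute2 or kernel code: the attribute header mimics the [ len | type | payload ] format with a 4-byte header and 4-byte alignment, the caller must supply a large enough buffer, and every name below is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN	4u
#define ATTR_ALIGN(len)	(((len) + 3u) & ~3u)
#define TYPE_OPTIONS	2u	/* TCA_OPTIONS has the value 2 */

struct attr_hdr {
	uint16_t len;	/* header plus payload, unaligned */
	uint16_t type;
};

/* Append one attribute at 'off'; returns the aligned offset just past it. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
		       const void *data, uint16_t len)
{
	struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + len), type };

	memcpy(buf + off, &h, sizeof(h));
	if (len)
		memcpy(buf + off + ATTR_HDRLEN, data, len);
	return off + ATTR_ALIGN(ATTR_HDRLEN + len);
}

/* Old open-coded layout: a single attribute carrying only the struct. */
static size_t build_legacy(uint8_t *buf, const void *opt, uint16_t opt_len)
{
	return put_attr(buf, 0, TYPE_OPTIONS, opt, opt_len);
}

/*
 * "Compat" layout: the struct, followed by a nested attribute header of
 * the same type; the outer length is then extended to cover both.
 */
static size_t build_compat(uint8_t *buf, const void *opt, uint16_t opt_len)
{
	size_t nest = put_attr(buf, 0, TYPE_OPTIONS, opt, opt_len);
	size_t end = put_attr(buf, nest, TYPE_OPTIONS, NULL, 0);
	struct attr_hdr outer;

	memcpy(&outer, buf, sizeof(outer));
	outer.len = (uint16_t)end;
	memcpy(buf, &outer, sizeof(outer));
	return end;
}

/* Bytes a receiver would parse as inner attributes after the struct. */
static long trailing_bytes(const uint8_t *buf, uint16_t opt_len)
{
	struct attr_hdr outer;

	memcpy(&outer, buf, sizeof(outer));
	return (long)outer.len - (long)ATTR_HDRLEN - (long)ATTR_ALIGN(opt_len);
}

/* Header of the first attribute found after the struct. */
static struct attr_hdr first_inner(const uint8_t *buf, uint16_t opt_len)
{
	struct attr_hdr h;

	memcpy(&h, buf + ATTR_HDRLEN + ATTR_ALIGN(opt_len), sizeof(h));
	return h;
}
```

In this model the legacy layout leaves nothing to parse after the struct, while the compat layout leaves one attribute of type 2 with an empty payload, which is the extra element described in the quoted analysis.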

* Regression with 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
From: Laura Abbott @ 2018-04-14 17:56 UTC (permalink / raw)
  To: Dan Carpenter, David S. Miller; +Cc: Linux Kernel Mailing List, netdev

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

Hi,

Fedora got a bug report of a regression when trying to remove the
macsec module (https://bugzilla.redhat.com/show_bug.cgi?id=1566410).
I did a bisect and found

commit 5dcd8400884cc4a043a6d4617e042489e5d566a9
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Wed Mar 21 11:09:01 2018 +0300

     macsec: missing dev_put() on error in macsec_newlink()

     We moved the dev_hold(real_dev); call earlier in the function but forgot
     to update the error paths.

     Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link")
     Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>

The script I used for testing based on the reporter is attached. It
looks like modprobe is stuck in the D state. Any idea?

Thanks,
Laura

[-- Attachment #2: mac-sec-setup.sh --]
[-- Type: application/x-shellscript, Size: 2123 bytes --]

* Re: XDP performance regression due to CONFIG_RETPOLINE Spectre V2
From: David Woodhouse @ 2018-04-14 19:29 UTC (permalink / raw)
  To: Christoph Hellwig, Tushar Dave
  Cc: Jesper Dangaard Brouer, xdp-newbies@vger.kernel.org,
	netdev@vger.kernel.org, William Tu, Björn Töpel,
	Karlsson, Magnus, Alexander Duyck, Arnaldo Carvalho de Melo
In-Reply-To: <20180413172611.GA23634@lst.de>

[-- Attachment #1: Type: text/plain, Size: 1001 bytes --]



On Fri, 2018-04-13 at 19:26 +0200, Christoph Hellwig wrote:
> On Fri, Apr 13, 2018 at 10:12:41AM -0700, Tushar Dave wrote:
> > I guess there is nothing we need to do!
> >
> > On x86, in case of no intel iommu or iommu is disabled, you end up in
> > swiotlb for DMA API calls when system has 4G memory.
> > However, AFAICT, for 64bit DMA capable devices swiotlb DMA APIs do not
> > use bounce buffer until and unless you have swiotlb=force specified in
> > kernel commandline.
> 
> Sure.  But that means every sync_*_to_device and sync_*_to_cpu now
> involves an indirect call to do exactly nothing, which in the workload
> Jesper is looking at is causing a huge performance degradation due to
> retpolines.

We should look at using the

 if (dma_ops == swiotlb_dma_ops)
    swiotlb_map_page()
 else
    dma_ops->map_page()

trick for this. Perhaps with alternatives so that when an Intel or AMD
IOMMU is detected, it's *that* which is checked for as the special
case.
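
A minimal userspace sketch of the trick, assuming a simplified ops table with a single callback: when the ops pointer is the known common implementation, the function is called directly, so no indirect branch (and hence no retpoline) is involved; anything else falls back to the indirect call. The types and functions below are hypothetical stand-ins, not the kernel's DMA API.

```c
#include <stdint.h>

/* Hypothetical, much reduced ops table. */
struct dma_ops_model {
	uint64_t (*map_page)(uint64_t phys);
};

/* Stand-in for the common case: identity mapping, no bounce buffer. */
static uint64_t direct_map_page(uint64_t phys)
{
	return phys;
}

/* Stand-in for an IOMMU-backed implementation that remaps the address. */
static uint64_t iommu_map_page(uint64_t phys)
{
	return phys | (1ull << 40);
}

static const struct dma_ops_model direct_ops = { direct_map_page };
static const struct dma_ops_model iommu_ops = { iommu_map_page };

static uint64_t model_map_page(const struct dma_ops_model *ops, uint64_t phys)
{
	/* Fast path: pointer compare, then a direct call. */
	if (ops == &direct_ops)
		return direct_map_page(phys);

	/* Slow path: indirect call through the ops table. */
	return ops->map_page(phys);
}
```

Both paths return the same result as calling through the table; only the kind of branch taken in the common case differs.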

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

* Re: [PATCH v2 net 0/3] sfc: ARFS fixes
From: David Miller @ 2018-04-14 19:40 UTC (permalink / raw)
  To: ecree; +Cc: linux-net-drivers, netdev
In-Reply-To: <878265b2-a42a-d49e-0e68-0bbcabbabeaa@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 13 Apr 2018 19:16:20 +0100

> Three issues introduced by my recent asynchronous filter handling changes:
> 1. The old filter_rfs_insert would replace a matching filter of equal
>    priority; we need to pass the appropriate argument to filter_insert to
>    make it do the same.
> 2. We're lying to the kernel with our return value from ndo_rx_flow_steer,
>    so we need to lie consistently when calling rps_may_expire_flow.  This
>    is only a partial fix, as the lie still prevents us from steering
>    multiple flows with the same ID to different queues; a proper fix that
>    stops us lying at all will hopefully follow later.
> 3. It's possible to cause the kernel to hammer ndo_rx_flow_steer very
>    hard, so make sure we don't build up too huge a backlog of workitems.
> 
> Possibly it would be better to fix #3 on the kernel side; I have a patch
>  which I think does that but it's not a regression in 4.17 so isn't 'net'
>  material.
> There's also the issue that we come up in the bad configuration that
>  triggers #3 by default, but that too is a problem for another time.

Series applied, thanks Edward.

* Re: [PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust
From: David Miller @ 2018-04-14 19:47 UTC (permalink / raw)
  To: laoar.shao; +Cc: kuznet, yoshfuji, songliubraving, netdev, linux-kernel
In-Reply-To: <1523699303-15699-1-git-send-email-laoar.shao@gmail.com>


The net-next tree is closed, please resubmit this when the merge window
ends and the net-next tree opens back up.

Thank you.

* Re: [PATCH] x86/cpufeature: guard asm_volatile_goto usage with CC_HAVE_ASM_GOTO
From: Yonghong Song @ 2018-04-14 20:30 UTC (permalink / raw)
  To: Peter Zijlstra, Alexei Starovoitov
  Cc: mingo, daniel, linux-kernel, x86, kernel-team, Thomas Gleixner,
	netdev, Jesper Dangaard Brouer
In-Reply-To: <20180414101112.GX4064@hirez.programming.kicks-ass.net>



On 4/14/18 3:11 AM, Peter Zijlstra wrote:
> On Fri, Apr 13, 2018 at 01:42:14PM -0700, Alexei Starovoitov wrote:
>> On 4/13/18 11:19 AM, Peter Zijlstra wrote:
>>> On Tue, Apr 10, 2018 at 02:28:04PM -0700, Alexei Starovoitov wrote:
>>>> Instead of
>>>> #ifdef CC_HAVE_ASM_GOTO
>>>> we can replace it with
>>>> #ifndef __BPF__
>>>> or some other name,
>>>
>>> I would prefer the BPF specific hack; otherwise we might be encouraging
>>> people to build the kernel proper without asm-goto.
>>>
>>
>> I don't understand this concern.
> 
> The thing is; this will be a (temporary) BPF specific hack. Hiding it
> behind something that looks 'normal' (CC_HAVE_ASM_GOTO) is just not
> right.

This is a fair concern. I will use a different macro and send v2 soon.
Thanks.

^ permalink raw reply

* [PATCH 0/3] Receive Side Coalescing for macb driver
From: Rafal Ozieblo @ 2018-04-14 20:53 UTC (permalink / raw)
  To: Nicolas Ferre, netdev, linux-kernel; +Cc: Rafal Ozieblo

This patch series adds support for Receive Side Coalescing (RSC)
to the Cadence GEM driver. RSC is a mechanism that reduces CPU
overhead by coalescing received TCP segments into a single
large message. When the message is complete, the CPU only has
to process one header and act upon one data payload.
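
As a rough illustration of the coalescing idea described above, here is
a minimal user-space sketch in C. It is a toy model only: the types and
functions (toy_segment, toy_coalesced, toy_rsc_start, toy_rsc_merge) are
hypothetical names invented for this example and are not part of the
macb driver or of the hardware interface. It assumes segments carry
only payload and are merged when they arrive exactly in sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types, not taken from the macb driver. */
struct toy_segment {
	uint32_t seq;		/* sequence number of the first payload byte */
	const uint8_t *data;	/* payload only, headers already stripped */
	size_t len;		/* payload length in bytes */
};

struct toy_coalesced {
	uint32_t next_seq;	/* sequence number expected next */
	size_t len;		/* bytes accumulated so far */
	uint8_t buf[16320];	/* assumed upper bound for one merged message */
};

/* Start a new coalesced message at the given sequence number. */
static void toy_rsc_start(struct toy_coalesced *c, uint32_t first_seq)
{
	assert(c);
	c->next_seq = first_seq;
	c->len = 0;
}

/* Append an in-order segment. Returns 1 if the segment was merged,
 * 0 if it is out of order or does not fit and has to be handled
 * separately.
 */
static int toy_rsc_merge(struct toy_coalesced *c, const struct toy_segment *s)
{
	assert(c && s);
	if (s->seq != c->next_seq)
		return 0;
	if (s->len > sizeof(c->buf) - c->len)
		return 0;
	memcpy(c->buf + c->len, s->data, s->len);
	c->len += s->len;
	c->next_seq += (uint32_t)s->len;
	return 1;
}
```

With this model, two back-to-back segments end up in one buffer and are
processed once, which is where the CPU saving would come from.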

Rafal Ozieblo (3):
  net: macb: Add support for rsc capable hardware
  net: macb: Add support for header data spliting
  net: macb: Receive Side Coalescing (RSC) feature added.

 drivers/net/ethernet/cadence/macb.h      |  21 +++
 drivers/net/ethernet/cadence/macb_main.c | 227 ++++++++++++++++++++++++++-----
 2 files changed, 212 insertions(+), 36 deletions(-)

-- 
2.4.5

^ permalink raw reply

* [PATCH 1/3] net: macb: Add support for rsc capable hardware
From: Rafal Ozieblo @ 2018-04-14 20:53 UTC (permalink / raw)
  To: Nicolas Ferre, netdev, linux-kernel; +Cc: Rafal Ozieblo
In-Reply-To: <1523739187-20077-1-git-send-email-rafalo@cadence.com>

When pbuf_rsc has been enabled in hardware,
the receive buffer offset for incoming packets
cannot be changed in the network configuration register
(even when RSC is not used at all).
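
The effect on the receive buffer size can be sketched in plain C as
follows. This is a stand-alone illustration, not driver code: the
function toy_rx_bufsz and the TOY_* constants are hypothetical, and
the value 2 for the IP alignment padding is an assumption (the kernel's
NET_IP_ALIGN is architecture dependent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ETH_HLEN		14	/* Ethernet header length */
#define TOY_ETH_FCS_LEN		4	/* frame check sequence length */
#define TOY_NET_IP_ALIGN	2	/* assumed alignment padding */

/* Compute the RX buffer size for a given MTU. The alignment padding
 * is only added when the hardware has no RSC support, because with
 * RSC the receive buffer offset cannot be programmed.
 */
static size_t toy_rx_bufsz(size_t mtu, bool hw_has_rsc)
{
	size_t bufsz = mtu + TOY_ETH_HLEN + TOY_ETH_FCS_LEN;

	assert(mtu > 0);
	if (!hw_has_rsc)
		bufsz += TOY_NET_IP_ALIGN;
	return bufsz;
}
```

For an MTU of 1500 this gives 1520 bytes without RSC and 1518 bytes
with RSC under the assumptions above.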

Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
---
 drivers/net/ethernet/cadence/macb.h      |  2 ++
 drivers/net/ethernet/cadence/macb_main.c | 22 ++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 8665982..33c9a48 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -477,6 +477,8 @@
 /* Bitfields in DCFG6. */
 #define GEM_PBUF_LSO_OFFSET			27
 #define GEM_PBUF_LSO_SIZE			1
+#define GEM_PBUF_RSC_OFFSET			26
+#define GEM_PBUF_RSC_SIZE			1
 #define GEM_DAW64_OFFSET			23
 #define GEM_DAW64_SIZE				1
 
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index b4c9268..43201a8 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -930,8 +930,9 @@ static void gem_rx_refill(struct macb_queue *queue)
 			macb_set_addr(bp, desc, paddr);
 			desc->ctrl = 0;
 
-			/* properly align Ethernet header */
-			skb_reserve(skb, NET_IP_ALIGN);
+			if (!(bp->dev->hw_features & NETIF_F_LRO))
+				/* properly align Ethernet header */
+				skb_reserve(skb, NET_IP_ALIGN);
 		} else {
 			desc->addr &= ~MACB_BIT(RX_USED);
 			desc->ctrl = 0;
@@ -2110,7 +2111,13 @@ static void macb_init_hw(struct macb *bp)
 	config = macb_mdc_clk_div(bp);
 	if (bp->phy_interface == PHY_INTERFACE_MODE_SGMII)
 		config |= GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL);
-	config |= MACB_BF(RBOF, NET_IP_ALIGN);	/* Make eth data aligned */
+	/* When the pbuf_rsc has been enabled in hardware the receive buffer
+	 * offset cannot be changed in the network configuration register.
+	 */
+	if (!(bp->dev->hw_features &  NETIF_F_LRO))
+		/* Make eth data aligned */
+		config |= MACB_BF(RBOF, NET_IP_ALIGN);
+
 	config |= MACB_BIT(PAE);		/* PAuse Enable */
 	config |= MACB_BIT(DRFCS);		/* Discard Rx FCS */
 	if (bp->caps & MACB_CAPS_JUMBO)
@@ -2281,7 +2288,7 @@ static void macb_set_rx_mode(struct net_device *dev)
 static int macb_open(struct net_device *dev)
 {
 	struct macb *bp = netdev_priv(dev);
-	size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN + NET_IP_ALIGN;
+	size_t bufsz = dev->mtu + ETH_HLEN + ETH_FCS_LEN;
 	struct macb_queue *queue;
 	unsigned int q;
 	int err;
@@ -2295,6 +2302,9 @@ static int macb_open(struct net_device *dev)
 	if (!dev->phydev)
 		return -EAGAIN;
 
+	if (!(bp->dev->hw_features & NETIF_F_LRO))
+		bufsz += NET_IP_ALIGN;
+
 	/* RX buffers initialization */
 	macb_init_rx_buffer_size(bp, bufsz);
 
@@ -3365,6 +3375,10 @@ static int macb_init(struct platform_device *pdev)
 	if (GEM_BFEXT(PBUF_LSO, gem_readl(bp, DCFG6)))
 		dev->hw_features |= MACB_NETIF_LSO;
 
+	/* Check RSC capability */
+	if (GEM_BFEXT(PBUF_RSC, gem_readl(bp, DCFG6)))
+		dev->hw_features |= NETIF_F_LRO;
+
 	/* Checksum offload is only available on gem with packet buffer */
 	if (macb_is_gem(bp) && !(bp->caps & MACB_CAPS_FIFO_MODE))
 		dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
-- 
2.4.5

^ permalink raw reply related

* [PATCH 2/3] net: macb: Add support for header data spliting
From: Rafal Ozieblo @ 2018-04-14 20:54 UTC (permalink / raw)
  To: Nicolas Ferre, netdev, linux-kernel; +Cc: Rafal Ozieblo
In-Reply-To: <1523739187-20077-1-git-send-email-rafalo@cadence.com>

This patch adds support for frames split across
multiple rx buffers. This allows header data splitting
to be used, as well as buffers shorter than the maximum
frame length. The only limitation is that the frame
header cannot be split across buffers.
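
The per-buffer length accounting for a frame spread over several
buffers can be sketched as below. This is a simplified user-space
model, not the driver code itself: toy_rx_state and toy_frag_len are
hypothetical names, and the model assumes the descriptor reports a
length of 0 for a completely filled intermediate buffer and a
cumulative length on the buffer that ends the frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state: bytes already received for this frame. */
struct toy_rx_state {
	unsigned int frag_len;
};

/* Return the number of bytes held in the current buffer.
 * desc_len == 0 means the buffer is completely full; otherwise
 * desc_len is cumulative, so the bytes received in earlier buffers
 * are subtracted. The state is reset when the frame ends.
 */
static unsigned int toy_frag_len(struct toy_rx_state *st,
				 unsigned int desc_len,
				 unsigned int buf_size, bool eof)
{
	unsigned int len;

	assert(st && buf_size > 0);
	if (!desc_len)
		len = buf_size;
	else
		len = desc_len - st->frag_len;

	if (eof)
		st->frag_len = 0;
	else
		st->frag_len += len;
	return len;
}
```

For example, a 2500 byte frame received into 1024 byte buffers yields
fragment lengths of 1024, 1024 and 452 in this model.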

Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
---
 drivers/net/ethernet/cadence/macb.h      |  13 +++
 drivers/net/ethernet/cadence/macb_main.c | 137 +++++++++++++++++++++++--------
 2 files changed, 118 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 33c9a48..a2cb805 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -295,6 +295,8 @@
 /* Bitfields in DMACFG. */
 #define GEM_FBLDO_OFFSET	0 /* fixed burst length for DMA */
 #define GEM_FBLDO_SIZE		5
+#define GEM_HDRS_OFFSET		5 /* Header Data Splitting */
+#define GEM_HDRS_SIZE		1
 #define GEM_ENDIA_DESC_OFFSET	6 /* endian swap mode for management descriptor access */
 #define GEM_ENDIA_DESC_SIZE	1
 #define GEM_ENDIA_PKT_OFFSET	7 /* endian swap mode for packet data access */
@@ -755,8 +757,12 @@ struct gem_tx_ts {
 #define MACB_RX_SOF_SIZE			1
 #define MACB_RX_EOF_OFFSET			15
 #define MACB_RX_EOF_SIZE			1
+#define MACB_RX_HDR_OFFSET			16
+#define MACB_RX_HDR_SIZE			1
 #define MACB_RX_CFI_OFFSET			16
 #define MACB_RX_CFI_SIZE			1
+#define MACB_RX_EOH_OFFSET			17
+#define MACB_RX_EOH_SIZE			1
 #define MACB_RX_VLAN_PRI_OFFSET			17
 #define MACB_RX_VLAN_PRI_SIZE			3
 #define MACB_RX_PRI_TAG_OFFSET			20
@@ -1086,6 +1092,11 @@ struct tsu_incr {
 	u32 ns;
 };
 
+struct rx_frag_list {
+	struct sk_buff		*skb_head;
+	struct sk_buff		*skb_tail;
+};
+
 struct macb_queue {
 	struct macb		*bp;
 	int			irq;
@@ -1121,6 +1132,8 @@ struct macb_queue {
 	unsigned int		tx_ts_head, tx_ts_tail;
 	struct gem_tx_ts	tx_timestamps[PTP_TS_BUFFER_SIZE];
 #endif
+	struct rx_frag_list	rx_frag;
+	u32			rx_frag_len;
 };
 
 struct ethtool_rx_fs_item {
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 43201a8..27c406c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -967,6 +967,13 @@ static void discard_partial_frame(struct macb_queue *queue, unsigned int begin,
 	 */
 }
 
+void gem_reset_rx_state(struct macb_queue *queue)
+{
+	queue->rx_frag.skb_head = NULL;
+	queue->rx_frag.skb_tail = NULL;
+	queue->rx_frag_len = 0;
+}
+
 static int gem_rx(struct macb_queue *queue, int budget)
 {
 	struct macb *bp = queue->bp;
@@ -977,6 +984,9 @@ static int gem_rx(struct macb_queue *queue, int budget)
 	int			count = 0;
 
 	while (count < budget) {
+		struct sk_buff *skb_head, *skb_tail;
+		bool eoh = false, header = false;
+		bool sof, eof;
 		u32 ctrl;
 		dma_addr_t addr;
 		bool rxused;
@@ -995,57 +1005,118 @@ static int gem_rx(struct macb_queue *queue, int budget)
 			break;
 
 		queue->rx_tail++;
-		count++;
-
-		if (!(ctrl & MACB_BIT(RX_SOF) && ctrl & MACB_BIT(RX_EOF))) {
+		skb = queue->rx_skbuff[entry];
+		if (unlikely(!skb)) {
 			netdev_err(bp->dev,
-				   "not whole frame pointed by descriptor\n");
+				   "inconsistent Rx descriptor chain\n");
 			bp->dev->stats.rx_dropped++;
 			queue->stats.rx_dropped++;
 			break;
 		}
-		skb = queue->rx_skbuff[entry];
-		if (unlikely(!skb)) {
+		skb_head = queue->rx_frag.skb_head;
+		skb_tail = queue->rx_frag.skb_tail;
+		sof = !!(ctrl & MACB_BIT(RX_SOF));
+		eof = !!(ctrl & MACB_BIT(RX_EOF));
+		if (GEM_BFEXT(HDRS, gem_readl(bp, DMACFG))) {
+			eoh = !!(ctrl & MACB_BIT(RX_EOH));
+			if (!eof)
+				header = !!(ctrl & MACB_BIT(RX_HDR));
+		}
+
+		queue->rx_skbuff[entry] = NULL;
+		/* Discard if out-of-sequence or header split across buffers */
+		if ((!skb_head /* first frame buffer */
+		&& (!sof /* without start of frame */
+		|| (header && !eoh))) /* or without whole header */
+		|| (skb_head && sof)) { /* or new start before EOF */
+			struct sk_buff *tmp_skb;
+
 			netdev_err(bp->dev,
-				   "inconsistent Rx descriptor chain\n");
+				   "Incomplete frame received! (skb_head=%p sof=%u hdr=%u eoh=%u)\n",
+				   skb_head, (u32)sof, (u32)header, (u32)eoh);
+			dev_kfree_skb(skb);
+			if (skb_head) {
+				skb = skb_shinfo(skb_head)->frag_list;
+				dev_kfree_skb(skb_head);
+				while (skb) {
+					tmp_skb = skb;
+					skb = skb->next;
+					dev_kfree_skb(tmp_skb);
+				}
+			}
 			bp->dev->stats.rx_dropped++;
 			queue->stats.rx_dropped++;
+			gem_reset_rx_state(queue);
 			break;
 		}
+
 		/* now everything is ready for receiving packet */
-		queue->rx_skbuff[entry] = NULL;
 		len = ctrl & bp->rx_frm_len_mask;
 
+		/* Buffer lengths in the descriptor:
+		 * eoh: len = header size,
+		 * eof: len = frame size (including header),
+		 * else: len = 0, length equals bp->rx_buffer_size
+		 */
+		if (!len)
+			len = bp->rx_buffer_size;
+		else
+			/* If EOF or EOH reduce the size of the packet
+			 * by already received bytes
+			 */
+			len -= queue->rx_frag_len;
+
 		netdev_vdbg(bp->dev, "gem_rx %u (len %u)\n", entry, len);
 
+		gem_ptp_do_rxstamp(bp, skb, desc);
+
 		skb_put(skb, len);
 		dma_unmap_single(&bp->pdev->dev, addr,
 				 bp->rx_buffer_size, DMA_FROM_DEVICE);
 
-		skb->protocol = eth_type_trans(skb, bp->dev);
-		skb_checksum_none_assert(skb);
-		if (bp->dev->features & NETIF_F_RXCSUM &&
-		    !(bp->dev->flags & IFF_PROMISC) &&
-		    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
-			skb->ip_summed = CHECKSUM_UNNECESSARY;
-
-		bp->dev->stats.rx_packets++;
-		queue->stats.rx_packets++;
-		bp->dev->stats.rx_bytes += skb->len;
-		queue->stats.rx_bytes += skb->len;
-
-		gem_ptp_do_rxstamp(bp, skb, desc);
-
-#if defined(DEBUG) && defined(VERBOSE_DEBUG)
-		netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
-			    skb->len, skb->csum);
-		print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
-			       skb_mac_header(skb), 16, true);
-		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
-			       skb->data, 32, true);
-#endif
-
-		netif_receive_skb(skb);
+		if (!skb_head) {
+			/* first buffer in frame */
+			skb->protocol = eth_type_trans(skb, bp->dev);
+			skb_checksum_none_assert(skb);
+			if (bp->dev->features & NETIF_F_RXCSUM &&
+			    !(bp->dev->flags & IFF_PROMISC) &&
+			    GEM_BFEXT(RX_CSUM, ctrl) & GEM_RX_CSUM_CHECKED_MASK)
+				skb->ip_summed = CHECKSUM_UNNECESSARY;
+			queue->rx_frag.skb_head = skb;
+			queue->rx_frag.skb_tail = skb;
+			skb_head = skb;
+		} else {
+			/* not first buffer in frame */
+			if (!skb_shinfo(skb_head)->frag_list)
+				skb_shinfo(skb_head)->frag_list = skb;
+			else
+				skb_tail->next = skb;
+			queue->rx_frag.skb_tail = skb;
+			skb_head->len += len;
+			skb_head->data_len += len;
+			skb_head->truesize += len;
+		}
+		if (eof) {
+			bp->dev->stats.rx_packets++;
+			queue->stats.rx_packets++;
+			bp->dev->stats.rx_bytes += skb->len;
+			queue->stats.rx_bytes += skb->len;
+
+	#if defined(DEBUG) && defined(VERBOSE_DEBUG)
+			netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
+				    skb->len, skb->csum);
+			print_hex_dump(KERN_DEBUG, " mac: ", DUMP_PREFIX_ADDRESS, 16, 1,
+				       skb_mac_header(skb), 16, true);
+			print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
+				       skb->data, 32, true);
+	#endif
+
+			netif_receive_skb(skb_head);
+			gem_reset_rx_state(queue);
+			count++;
+		} else {
+			queue->rx_frag_len += len;
+		}
 	}
 
 	gem_rx_refill(queue);
@@ -1905,6 +1976,8 @@ static int macb_alloc_consistent(struct macb *bp)
 		netdev_dbg(bp->dev,
 			   "Allocated RX ring of %d bytes at %08lx (mapped %p)\n",
 			   size, (unsigned long)queue->rx_ring_dma, queue->rx_ring);
+
+		gem_reset_rx_state(queue);
 	}
 	if (bp->macbgem_ops.mog_alloc_rx_buffers(bp))
 		goto out_err;
-- 
2.4.5

^ permalink raw reply related

* [PATCH 3/3] net: macb: Receive Side Coalescing (RSC) feature added.
From: Rafal Ozieblo @ 2018-04-14 20:55 UTC (permalink / raw)
  To: Nicolas Ferre, netdev, linux-kernel; +Cc: Rafal Ozieblo
In-Reply-To: <1523739187-20077-1-git-send-email-rafalo@cadence.com>

RSC is essentially the same as Large Receive Offload (LRO)
in the Linux networking stack.

Signed-off-by: Rafal Ozieblo <rafalo@cadence.com>
---
 drivers/net/ethernet/cadence/macb.h      |  6 +++
 drivers/net/ethernet/cadence/macb_main.c | 70 +++++++++++++++++++++++++++++++-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index a2cb805..9ebdde7 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -83,6 +83,7 @@
 #define GEM_USRIO		0x000c /* User IO */
 #define GEM_DMACFG		0x0010 /* DMA Configuration */
 #define GEM_JML			0x0048 /* Jumbo Max Length */
+#define GEM_RSC			0x0058 /* RSC Control */
 #define GEM_HRB			0x0080 /* Hash Bottom */
 #define GEM_HRT			0x0084 /* Hash Top */
 #define GEM_SA1B		0x0088 /* Specific1 Bottom */
@@ -318,6 +319,11 @@
 #define GEM_ADDR64_OFFSET	30 /* Address bus width - 64b or 32b */
 #define GEM_ADDR64_SIZE		1
 
+/* Bitfields in RSC control */
+#define GEM_RSCCTRL_OFFSET	1 /* RSC control */
+#define GEM_RSCCTRL_SIZE	15
+#define GEM_CLRMSK_OFFSET	16 /* RSC clear mask */
+#define GEM_CLRMSK_SIZE		1
 
 /* Bitfields in NSR */
 #define MACB_NSR_LINK_OFFSET	0 /* pcs_link_state */
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 27c406c..92bdcf1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2377,6 +2377,8 @@ static int macb_open(struct net_device *dev)
 
 	if (!(bp->dev->hw_features & NETIF_F_LRO))
 		bufsz += NET_IP_ALIGN;
+	else
+		bufsz = 0xFF * 64; // For RSC Buffer Sizes must be set to 16K.
 
 	/* RX buffers initialization */
 	macb_init_rx_buffer_size(bp, bufsz);
@@ -2801,6 +2803,62 @@ static int macb_get_ts_info(struct net_device *netdev,
 	return ethtool_op_get_ts_info(netdev, info);
 }
 
+static void gem_enable_hdr_data_split(struct macb *bp, bool enable)
+{
+	u32 dmacfg;
+
+	dmacfg = gem_readl(bp, DMACFG);
+	if (enable)
+		dmacfg |= GEM_BIT(HDRS);
+	else
+		dmacfg &= ~GEM_BIT(HDRS);
+	gem_writel(bp, DMACFG, dmacfg);
+}
+
+static void gem_update_rsc_state(struct macb *bp, netdev_features_t feature)
+{
+	u32 rsc_control, rsc_control_new, queue, rsc;
+	bool enable, jumbo, any_enabled = false;
+	struct ethtool_rx_fs_item *item;
+	unsigned long flags;
+	u32 ncfgr;
+
+	enable = (!!(feature & NETIF_F_NTUPLE) && !!(feature & NETIF_F_LRO));
+	rsc = gem_readl(bp, RSC);
+	rsc_control = GEM_BFEXT(RSCCTRL, rsc);
+	rsc_control_new = 0;
+	if (enable) {
+		list_for_each_entry(item, &bp->rx_fs_list.list, list) {
+			queue = item->fs.ring_cookie;
+			rsc_control_new |= (1 << (queue - 1));
+			any_enabled = true;
+			netdev_dbg(bp->dev, "RSC %sabled for queue %u\n",
+				   enable ? "en" : "dis", queue);
+		}
+	}
+	if (rsc_control_new != rsc_control) {
+		rsc = GEM_BFINS(RSCCTRL, rsc_control_new, rsc);
+		gem_writel(bp, RSC, rsc);
+	}
+	if (bp->caps & MACB_CAPS_JUMBO) {
+		/* Don't enable jumbo mode for RSC:
+		 * disable unless not RSC and large MTU
+		 */
+		ncfgr = gem_readl(bp, NCFGR);
+		enable = !any_enabled;
+		jumbo = !!MACB_BFEXT(JFRAME, ncfgr);
+		/* and don't touch if already in the state we want */
+		if ((jumbo && !enable) || (!jumbo && enable)) {
+			ncfgr = MACB_BFINS(JFRAME, enable, ncfgr);
+			spin_lock_irqsave(&bp->lock, flags);
+			gem_writel(bp, NCFGR, ncfgr);
+			spin_unlock_irqrestore(&bp->lock, flags);
+		}
+	}
+	/* Need to enable header-data splitting also */
+	gem_enable_hdr_data_split(bp, any_enabled);
+}
+
 static void gem_enable_flow_filters(struct macb *bp, bool enable)
 {
 	struct ethtool_rx_fs_item *item;
@@ -2969,6 +3027,8 @@ static int gem_add_flow_filter(struct net_device *netdev,
 	if (netdev->features & NETIF_F_NTUPLE)
 		gem_enable_flow_filters(bp, 1);
 
+	/* enable RSC if LRO & NTUPLE on */
+	gem_update_rsc_state(bp, netdev->features);
 	spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
 	return 0;
 
@@ -3009,6 +3069,7 @@ static int gem_del_flow_filter(struct net_device *netdev,
 			return 0;
 		}
 	}
+	gem_update_rsc_state(bp, netdev->features);
 
 	spin_unlock_irqrestore(&bp->rx_fs_lock, flags);
 	return -EINVAL;
@@ -3191,7 +3252,12 @@ static int macb_set_features(struct net_device *netdev,
 		bool turn_on = features & NETIF_F_NTUPLE;
 
 		gem_enable_flow_filters(bp, turn_on);
+		gem_update_rsc_state(bp, features);
 	}
+
+	/* LRO (Large Receive Offload) aka RSC (Receive Side Coalescing) */
+	if ((changed & NETIF_F_LRO) && macb_is_gem(bp))
+		gem_update_rsc_state(bp, features);
 	return 0;
 }
 
@@ -3449,8 +3515,10 @@ static int macb_init(struct platform_device *pdev)
 		dev->hw_features |= MACB_NETIF_LSO;
 
 	/* Check RSC capability */
-	if (GEM_BFEXT(PBUF_RSC, gem_readl(bp, DCFG6)))
+	if (GEM_BFEXT(PBUF_RSC, gem_readl(bp, DCFG6))) {
 		dev->hw_features |= NETIF_F_LRO;
+		gem_writel(bp, RSC, GEM_BIT(CLRMSK));
+	}
 
 	/* Checksum offload is only available on gem with packet buffer */
 	if (macb_is_gem(bp) && !(bp->caps & MACB_CAPS_FIFO_MODE))
-- 
2.4.5

^ permalink raw reply related

* Re: v6/sit tunnels and VRFs
From: Jeff Barnhill @ 2018-04-14 22:07 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <e19f2fb3-319c-e8ea-5fc3-5072ddb69c5b@gmail.com>

I didn't see an easy way to achieve this behavior without affecting
the non-VRF routing lookups (such as deleting non-VRF rules).  We have
some automated tests that were looking for specific responses, but, of
course, those can be changed.  Among a few of my colleagues, this
became a discussion about maintaining consistent behavior between VRF
and non-VRF, such that a ping or some other tool wouldn't respond
differently.  That's the main reason I asked the question here - to
see how important this was in general use. It sounds like in your
experience, the specific error message/code hasn't been an issue.

Thanks,
Jeff

On Fri, Apr 13, 2018 at 4:31 PM, David Ahern <dsahern@gmail.com> wrote:
> On 4/13/18 2:23 PM, Jeff Barnhill wrote:
>> It seems that the ENETUNREACH response is still desirable in the VRF
>> case since the only difference (when using VRF vs. not) is that the
>> lookup should be restrained to a specific VRF.
>
> VRF is just policy routing to a table. If the table wants the lookup to
> stop, then it needs a default route. What you are referring to is the
> lookup goes through all tables and does not find an answer so it fails
> with -ENETUNREACH. I do not know of any way to make that happen with the
> existing default route options and in the past 2+ years we have not hit
> any s/w that discriminates -ENETUNREACH from -EHOSTUNREACH.
>
> I take it this is code from your internal code base. Why does it care
> between those two failures?

^ permalink raw reply

* Re: Regression with 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
From: Sabrina Dubroca @ 2018-04-14 22:31 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Dan Carpenter, David S. Miller, Linux Kernel Mailing List, netdev
In-Reply-To: <9a3a84ff-1fd1-c063-0c50-a297d29a692b@redhat.com>

Hello Laura,

2018-04-14, 10:56:55 -0700, Laura Abbott wrote:
> Hi,
> 
> Fedora got a bug report of a regression when trying to remove the
> macsec module (https://bugzilla.redhat.com/show_bug.cgi?id=1566410).
> I did a bisect and found
> 
> commit 5dcd8400884cc4a043a6d4617e042489e5d566a9
> Author: Dan Carpenter <dan.carpenter@oracle.com>
> Date:   Wed Mar 21 11:09:01 2018 +0300
> 
>     macsec: missing dev_put() on error in macsec_newlink()
>     We moved the dev_hold(real_dev); call earlier in the function but forgot
>     to update the error paths.
>     Fixes: 0759e552bce7 ("macsec: fix negative refcnt on parent link")
>     Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> The script I used for testing based on the reporter is attached. It
> looks like modprobe is stuck in the D state. Any idea?

I don't think that reference was actually leaked. It gets released in
macsec_free_netdev() when the device is deleted.

modprobe getting stuck is just a side-effect of the refcount going
negative on the parent device, since removing the module needs to take
the lock that is held by device deletion.
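
To make the refcount imbalance concrete, here is a toy model in C. It
is only a sketch under assumptions: toy_dev, toy_dev_hold, toy_dev_put
and toy_newlink_fail are hypothetical stand-ins, and the call sequence
is a simplification of the error path described above, not the actual
macsec code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parent device and its refcount. */
struct toy_dev {
	int refcnt;
};

static void toy_dev_hold(struct toy_dev *d)
{
	d->refcnt++;
}

static void toy_dev_put(struct toy_dev *d)
{
	d->refcnt--;
}

/* Model a failing newlink: the reference is taken early and is
 * released once when the device is freed. If the error path also
 * drops it, the parent's count ends one below where it started.
 * Returns the resulting refcount.
 */
static int toy_newlink_fail(struct toy_dev *real_dev, bool put_on_error)
{
	assert(real_dev);
	toy_dev_hold(real_dev);		/* taken early in newlink */
	if (put_on_error)
		toy_dev_put(real_dev);	/* extra put in the error path */
	toy_dev_put(real_dev);		/* put done when the device is freed */
	return real_dev->refcnt;
}
```

Starting from a count of 0, the model ends at 0 without the extra put
and at -1 with it.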

I'll send a revert tomorrow.

Thanks for the report,

-- 
Sabrina

^ permalink raw reply

* Re: Cavium Octeon III network driver.
From: Florian Fainelli @ 2018-04-15  0:08 UTC (permalink / raw)
  To: Steven J. Hill, netdev
In-Reply-To: <c269ed89-75ac-895a-984f-badc0b4d9a05@cavium.com>

Hi Steven,

On 04/13/2018 03:43 PM, Steven J. Hill wrote:
> Patches for Cavium's Octeon III network driver were submitted by
> David Daney back on 20180222. David has since left the company and
> I am now responsible for the upstreaming effort. When looking at
> <pachwork.ozlabs.org> they are marked as "Not Applicable". What
> steps do I take next? Thanks.

net-next tree is currently closed, but once it opens back up, you would
likely want to resubmit those patches. Last I remember they were ready
to go.
-- 
Florian

^ permalink raw reply
