Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2] sh_eth: remove unchecked interrupts
From: Geert Uytterhoeven @ 2016-12-02 12:18 UTC (permalink / raw)
  To: Chris Brandt
  Cc: Sergei Shtylyov, David Miller, Simon Horman, Geert Uytterhoeven,
	netdev@vger.kernel.org, Linux-Renesas
In-Reply-To: <SG2PR06MB1165C7D7B7DFBAC4B5BAB09A8A8F0@SG2PR06MB1165.apcprd06.prod.outlook.com>

Hi Chris,

On Thu, Dec 1, 2016 at 7:53 PM, Chris Brandt <Chris.Brandt@renesas.com> wrote:
> On 12/1/2016, Sergei Shtylyov wrote:
>>
>> On 12/01/2016 05:42 PM, Geert Uytterhoeven wrote:
>>
>> >> --- a/drivers/net/ethernet/renesas/sh_eth.c
>> >> +++ b/drivers/net/ethernet/renesas/sh_eth.c
>> >> @@ -518,7 +518,7 @@ static struct sh_eth_cpu_data r7s72100_data = {
>> >>
>> >>         .ecsr_value     = ECSR_ICD,
>> >>         .ecsipr_value   = ECSIPR_ICDIP,
>> >> -       .eesipr_value   = 0xff7f009f,
>> >> +       .eesipr_value   = 0xe77f009f,
>> >
>> > Comment not directly related to the merits of this patch: the EESIPR
>> > bit definitions seem to be identical to those for EESR, so we can get
>> > rid of the hardcoded values here?
>>
>>     Do you mean using the @define's? We have EESIPR bits also declared,
>> see enum DMAC_IM_BIT,

Yes, that's what I meant.
Unfortunately the DMAC_IM_BIT enum doesn't cover all bits.

> Is your idea to get rid of .eesipr_value for devices that have values
> that are the same as .eesr_err_check?
>
>
> For example in sh_eth_dev_init():
>
>         sh_eth_modify(ndev, EESR, 0, 0);
>         mdp->irq_enabled = true;
> -       sh_eth_write(ndev, mdp->cd->eesipr_value, EESIPR);
> +       if (mdp->cd->eesipr_value)
> +               sh_eth_write(ndev, mdp->cd->eesipr_value, EESIPR);
> +       else
> +               sh_eth_write(ndev, mdp->cd->eesr_err_check, EESIPR);

No, my intention was to just get rid of the hardcoded values when
initializing .eesipr_value.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-02 12:32 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: alexandre.torgue, David Miller, netdev, linux-kernel
In-Reply-To: <3192a4b6-1e97-048f-a0dd-bfc0f3d96ed8@st.com>

[-- Attachment #1: Type: text/plain, Size: 3728 bytes --]

Hi!

> >Well, if you have a workload that sends and receive packets, it tends
> >to work ok, as you do tx_clean() in stmmac_poll(). My workload is not
> >like that -- it is "sending packets at 3MB/sec, receiving none". So
> >the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled,
> >and then we run out of transmit descriptors, and then 40msec passes,
> >and then we clean them. Bad.
> >
> >And that's why low-res timers do not cut it.
> 
> in that case, I expect that the tuning of the driver could help you.
> I mean, by using ethtool, it could be enough to set the IC bit on all
> the descriptors. You should touch the tx_coal_frames.
> 
> Then you can use ethtool -S to monitor the status.

Yes, I did something similar. Unfortnunately that meant crash within
minutes, at least with 4.4 kernel. (If you know what was fixed between
4.4 and 4.9, that would be helpful).

> We had experimented this tuning on STB IP where just datagrams
> had to send externally. To be honest, although we had seen
> better results w/o any timer, we kept this approach enabled
> because the timer was fast enough to cover our tests on SH4 boxes.

Please reply to David, and explain how it is supposed to
work... because right now it does not. 40 msec delays are not
acceptable in default configuration.

> >>In the ring, some descriptors can raise the irq (according to a
> >>threshold) and set the IC bit. In this path, the NAPI  poll will be
> >>scheduled.
> >
> >Not NAPI poll but stmmac_tx_timer(), right?
> 
> in the xmit according the the threshold the timer is started or the
> interrupt is set inside the descriptor.
> Then stmmac_tx_clean will be always called and, if you see the flow,
> no irqlock protection is needed!

Agreed that no irqlock protection is needed if we rely on napi and timers.

> >>Concerning the lock protection, we had reviewed long time ago and
> >>IIRC, no raise condition should be present. Open to review it,
> >>again!
...
> >There's nothing that protect stmmac_poll() from running concurently
> >with stmmac_dma_interrupt(), right?
> 
> This is not necessary.

dma_interrupt accesses shared priv->xstats; variables are of type
unsigned long (not atomic_t), yet they are accesssed from interrupt
context and from stmmac_ethtool without any locking. That can result
in broken statistics AFAICT.

Please take another look. As far as I can tell, you can have two cpus
at #1 and #2 in the code, at the same time. It looks like napi_... has
some atomic opertions inside so that looks safe at the first look. But
I'm not sure if they also include enough memory barriers to make it
safe...?


static void stmmac_dma_interrupt(struct stmmac_priv *priv)
{
...
        status = priv->hw->dma->dma_interrupt(priv->ioaddr, &priv->xstats);
        if (likely((status & handle_rx)) || (status & handle_tx)) {
                if (likely(napi_schedule_prep(&priv->napi))) {
#1
                        stmmac_disable_dma_irq(priv);
                        __napi_schedule(&priv->napi);
                }
        }


static int stmmac_poll(struct napi_struct *napi, int budget)
{
        struct stmmac_priv *priv = container_of(napi, struct stmmac_priv, napi);
        int work_done = 0;

        priv->xstats.napi_poll++;
        stmmac_tx_clean(priv);

        work_done = stmmac_rx(priv, budget);
	if (work_done < budget) {
                napi_complete(napi);
#2
	        stmmac_enable_dma_irq(priv);
        }
        return work_done;
}


Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* net/can: warning in raw_setsockopt/__alloc_pages_slowpath
From: Andrey Konovalov @ 2016-12-02 12:43 UTC (permalink / raw)
  To: Oliver Hartkopp, Marc Kleine-Budde, David S. Miller, linux-can,
	netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, syzkaller

[-- Attachment #1: Type: text/plain, Size: 1851 bytes --]

Hi!

I've got the following error report while running the syzkaller fuzzer.

A reproducer is attached.

On commit d8e435f3ab6fea2ea324dce72b51dd7761747523 (Nov 26).

------------[ cut here ]------------
WARNING: CPU: 0 PID: 4009 at mm/page_alloc.c:3511
__alloc_pages_slowpath+0x3d4/0x1bf0
Modules linked in:
CPU: 0 PID: 4009 Comm: a.out Not tainted 4.9.0-rc6+ #54
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88006832f8a8 ffffffff81c73b14 0000000000000000 0000000000000000
 ffffffff842c6320 0000000000000000 ffff88006832f8f0 ffffffff8123dc57
 ffff880067d86000 ffffffff00000db7 ffffffff842c6320 0000000000000db7
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff81c73b14>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [<ffffffff8123dc57>] __warn+0x1a7/0x1f0 kernel/panic.c:550
 [<ffffffff8123de6c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
 [<ffffffff81559b04>] __alloc_pages_slowpath+0x3d4/0x1bf0 mm/page_alloc.c:3511
 [<ffffffff8155b8e2>] __alloc_pages_nodemask+0x5c2/0x710 mm/page_alloc.c:3781
 [<ffffffff816236a4>] alloc_pages_current+0xf4/0x400 mm/mempolicy.c:2072
 [<     inline     >] alloc_pages ./include/linux/gfp.h:469
 [<ffffffff815a834f>] kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
 [<ffffffff815a83bf>] kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
 [<     inline     >] kmalloc_large ./include/linux/slab.h:422
 [<ffffffff81635e67>] __kmalloc_track_caller+0x227/0x2a0 mm/slub.c:4233
 [<ffffffff8159932c>] memdup_user+0x2c/0xa0 mm/util.c:137
 [<ffffffff8369e0de>] raw_setsockopt+0x1be/0x9f0 net/can/raw.c:506
 [<     inline     >] SYSC_setsockopt net/socket.c:1757
 [<ffffffff82ca10c4>] SyS_setsockopt+0x154/0x240 net/socket.c:1736
 [<ffffffff840f2901>] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
---[ end trace bc80556cca970089 ]---

[-- Attachment #2: setsockopt-kmalloc-warning-poc.c --]
[-- Type: text/x-csrc, Size: 4559 bytes --]

// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_setsockopt
#define __NR_setsockopt 54
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif

#define _GNU_SOURCE

#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <linux/capability.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <linux/sched.h>
#include <net/if_arp.h>

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <grp.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const int kFailStatus = 67;
const int kErrorStatus = 68;
const int kRetryStatus = 69;

__attribute__((noreturn)) void fail(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  exit(kFailStatus);
}

__attribute__((noreturn)) void exitf(const char* msg, ...)
{
  int e = errno;
  fflush(stdout);
  va_list args;
  va_start(args, msg);
  vfprintf(stderr, msg, args);
  va_end(args);
  fprintf(stderr, " (errno %d)\n", e);
  exit(kRetryStatus);
}

static int flag_debug;

void debug(const char* msg, ...)
{
  if (!flag_debug)
    return;
  va_list args;
  va_start(args, msg);
  vfprintf(stdout, msg, args);
  va_end(args);
  fflush(stdout);
}

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  exit(sig);
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                \
  {                                                                    \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
    if (_setjmp(segv_env) == 0) {                                      \
      __VA_ARGS__;                                                     \
    }                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);               \
  }

static uintptr_t execute_syscall(int nr, uintptr_t a0, uintptr_t a1,
                                 uintptr_t a2, uintptr_t a3,
                                 uintptr_t a4, uintptr_t a5,
                                 uintptr_t a6, uintptr_t a7,
                                 uintptr_t a8)
{
  switch (nr) {
  default:
    return syscall(nr, a0, a1, a2, a3, a4, a5);
  }
}

static void setup_main_process(uint64_t pid)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
  syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
  install_segv_handler();

  char tmpdir_template[] = "./syzkaller.XXXXXX";
  char* tmpdir = mkdtemp(tmpdir_template);
  if (!tmpdir)
    fail("failed to mkdtemp");
  if (chmod(tmpdir, 0777))
    fail("failed to chmod");
  if (chdir(tmpdir))
    fail("failed to chdir");
}

static void loop();

static void sandbox_common()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  setsid();

  struct rlimit rlim;
  rlim.rlim_cur = rlim.rlim_max = 128 << 20;
  setrlimit(RLIMIT_AS, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_FSIZE, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 1 << 20;
  setrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = rlim.rlim_max = 0;
  setrlimit(RLIMIT_CORE, &rlim);

  unshare(CLONE_NEWNS);
  unshare(CLONE_NEWIPC);
  unshare(CLONE_IO);
}

static int do_sandbox_none()
{
  int pid = fork();
  if (pid)
    return pid;
  sandbox_common();
  loop();
  exit(1);
}

long r[2];
void loop()
{
  memset(r, -1, sizeof(r));
  r[0] = execute_syscall(__NR_socket, 0x1dul, 0x3ul, 0x1ul, 0, 0, 0, 0,
                         0, 0);
  r[1] = execute_syscall(__NR_setsockopt, r[0], 0x65ul, 0x1ul,
                         0x20000000ul, 0x18000000ul, 0, 0, 0, 0);
}
int main()
{
  setup_main_process(0);
  int pid = do_sandbox_none();
  int status = 0;
  while (waitpid(pid, &status, __WALL) != pid) {
  }
  return 0;
}



^ permalink raw reply

* arp_filter and IPv6 ND
From: Saku Ytti @ 2016-12-02 12:51 UTC (permalink / raw)
  To: netdev

Hey,

net.ipv4.conf.all.arp_filter appears not to have IPv6 counter part.
Or am I missing something? That is Linux does answer to ND queries for
unrelated interfaces by default, and I can't seem to find way to turn
that off.

Is it proper maintainership to accept changes to single protocol,
without mandating the support for other protocol having same
behavioural characteristics?

It is good that some parts for ARP and ND have common code in linux
(neighbour.c) unlike in BSD where everything seems to be
self-contained.
I'd wish that even more of ARP/ND would common, because there are
still lot of common behavioural code in ARP/ND code itself, which
requires double maintenance and are implemented by different people at
different times, so leads to different set of bugs and behaviour for
same intended behaviour.

For example this feature should be protocol agnostic, developer should
only need to develop it once for the higher level behavioural code,
without minding which IP AFI it is for. Obviously that does not
exclude ability to sysctl configure it on/off per AFI.

Thanks!
-- 
  ++ytti

^ permalink raw reply

* Re: [PATCH net v2] igb: re-assign hw address pointer on reset after PCI error
From: Guilherme G. Piccoli @ 2016-12-02 12:55 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher, alexander.duyck; +Cc: netdev, todd.fujinaka
In-Reply-To: <1478803603-30306-1-git-send-email-gpiccoli@linux.vnet.ibm.com>

On 11/10/2016 04:46 PM, Guilherme G. Piccoli wrote:
> Whenever the igb driver detects the result of a read operation returns
> a value composed only by F's (like 0xFFFFFFFF), it will detach the
> net_device, clear the hw_addr pointer and warn to the user that adapter's
> link is lost - those steps happen on igb_rd32().
> 
> In case a PCI error happens on Power architecture, there's a recovery
> mechanism called EEH, that will reset the PCI slot and call driver's
> handlers to reset the adapter and network functionality as well.
> 
> We observed that once hw_addr is NULL after the error is detected on
> igb_rd32(), it's never assigned back, so in the process of resetting
> the network functionality we got a NULL pointer dereference in both
> igb_configure_tx_ring() and igb_configure_rx_ring(). In order to avoid
> such bug, this patch re-assigns the hw_addr value in the slot_reset
> handler.
> 
> Reported-by: Anthony H. Thai <ahthai@us.ibm.com>
> Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index edc9a6a..136ee9e 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7878,6 +7878,11 @@ static pci_ers_result_t igb_io_slot_reset(struct pci_dev *pdev)
>  		pci_enable_wake(pdev, PCI_D3hot, 0);
>  		pci_enable_wake(pdev, PCI_D3cold, 0);
> 
> +		/* In case of PCI error, adapter lose its HW address
> +		 * so we should re-assign it here.
> +		 */
> +		hw->hw_addr = adapter->io_addr;
> +
>  		igb_reset(adapter);
>  		wr32(E1000_WUS, ~0);
>  		result = PCI_ERS_RESULT_RECOVERED;
> 

Ping?

Sorry to annoy, any news on this?
Thanks in advance!

Cheers,


Guilherme

^ permalink raw reply

* Re: Initial thoughts on TXDP
From: Jesper Dangaard Brouer @ 2016-12-02 13:01 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: brouer, Tom Herbert, Florian Westphal,
	Linux Kernel Network Developers, Alexander Duyck, John Fastabend,
	linux-mm
In-Reply-To: <859a0c99-f427-1db8-d260-1297777792fb@stressinduktion.org>

On Thu, 1 Dec 2016 23:47:44 +0100
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> Side note:
> 
> On 01.12.2016 20:51, Tom Herbert wrote:
> >> > E.g. "mini-skb": Even if we assume that this provides a speedup
> >> > (where does that come from? should make no difference if a 32 or
> >> >  320 byte buffer gets allocated).

Yes, the size of the allocation from the SLUB allocator does not change
base performance/cost much (at least for small objects, if < 1024).

Do notice the base SLUB alloc+free cost is fairly high (compared to a
201 cycles budget). Especially for networking as the free-side is very
likely to hit a slow path.  SLUB fast-path 53 cycles, and slow-path
around 100 cycles (data from [1]).  I've tried to address this with the
kmem_cache bulk APIs.  Which reduce the cost to approx 30 cycles.
(Something we have not fully reaped the benefit from yet!)

[1] https://git.kernel.org/torvalds/c/ca257195511

> >> >  
> > It's the zero'ing of three cache lines. I believe we talked about that
> > as netdev.

Actually 4 cache-lines, but with some cleanup I believe we can get down
to clearing 192 bytes 3 cache-lines.

> 
> Jesper and me played with that again very recently:
> 
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c#L590
> 
> In micro-benchmarks we saw a pretty good speed up not using the rep
> stosb generated by gcc builtin but plain movq's. Probably the cost model
> for __builtin_memset in gcc is wrong?

Yes, I believe so.

> When Jesper is free we wanted to benchmark this and maybe come up with a
> arch specific way of cleaning if it turns out to really improve throughput.
> 
> SIMD instructions seem even faster but the kernel_fpu_begin/end() kill
> all the benefits.

One strange thing was, that on my skylake CPU (i7-6700K @4.00GHz),
Hannes's hand-optimized MOVQ ASM-code didn't go past 8 bytes per cycle,
or 32 cycles for 256 bytes.

Talking to Alex and John during netdev, and reading on the Intel arch,
I though that this CPU should be-able-to perform 16 bytes per cycle.
The CPU can do it as the rep-stos show this once the size gets large
enough.

On this CPU the memset rep stos starts to win around 512 bytes:

 192/35 =  5.5 bytes/cycle
 256/36 =  7.1 bytes/cycle
 512/40 = 12.8 bytes/cycle
 768/46 = 16.7 bytes/cycle
1024/52 = 19.7 bytes/cycle
2048/84 = 24.4 bytes/cycle
4096/148= 27.7 bytes/cycle

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: net/can: warning in raw_setsockopt/__alloc_pages_slowpath
From: Marc Kleine-Budde @ 2016-12-02 13:24 UTC (permalink / raw)
  To: Andrey Konovalov, Oliver Hartkopp, David S. Miller, linux-can,
	netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CAAeHK+x3vqjzzXPtp-xCshWnvTcVrZqGmJR7rbooRFxBJanv8g@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 3166 bytes --]

On 12/02/2016 01:43 PM, Andrey Konovalov wrote:
> Hi!
> 
> I've got the following error report while running the syzkaller fuzzer.
> 
> A reproducer is attached.
> 
> On commit d8e435f3ab6fea2ea324dce72b51dd7761747523 (Nov 26).
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 4009 at mm/page_alloc.c:3511
> __alloc_pages_slowpath+0x3d4/0x1bf0
> Modules linked in:
> CPU: 0 PID: 4009 Comm: a.out Not tainted 4.9.0-rc6+ #54
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  ffff88006832f8a8 ffffffff81c73b14 0000000000000000 0000000000000000
>  ffffffff842c6320 0000000000000000 ffff88006832f8f0 ffffffff8123dc57
>  ffff880067d86000 ffffffff00000db7 ffffffff842c6320 0000000000000db7
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff81c73b14>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>  [<ffffffff8123dc57>] __warn+0x1a7/0x1f0 kernel/panic.c:550
>  [<ffffffff8123de6c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
>  [<ffffffff81559b04>] __alloc_pages_slowpath+0x3d4/0x1bf0 mm/page_alloc.c:3511
>  [<ffffffff8155b8e2>] __alloc_pages_nodemask+0x5c2/0x710 mm/page_alloc.c:3781
>  [<ffffffff816236a4>] alloc_pages_current+0xf4/0x400 mm/mempolicy.c:2072
>  [<     inline     >] alloc_pages ./include/linux/gfp.h:469
>  [<ffffffff815a834f>] kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
>  [<ffffffff815a83bf>] kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
>  [<     inline     >] kmalloc_large ./include/linux/slab.h:422
>  [<ffffffff81635e67>] __kmalloc_track_caller+0x227/0x2a0 mm/slub.c:4233
>  [<ffffffff8159932c>] memdup_user+0x2c/0xa0 mm/util.c:137
>  [<ffffffff8369e0de>] raw_setsockopt+0x1be/0x9f0 net/can/raw.c:506

We should add a check for a sensible optlen....

> static int raw_setsockopt(struct socket *sock, int level, int optname,
> 			  char __user *optval, unsigned int optlen)
> {
> 	struct sock *sk = sock->sk;
> 	struct raw_sock *ro = raw_sk(sk);
> 	struct can_filter *filter = NULL;  /* dyn. alloc'ed filters */
> 	struct can_filter sfilter;         /* single filter */
> 	struct net_device *dev = NULL;
> 	can_err_mask_t err_mask = 0;
> 	int count = 0;
> 	int err = 0;
> 
> 	if (level != SOL_CAN_RAW)
> 		return -EINVAL;
> 
> 	switch (optname) {
> 
> 	case CAN_RAW_FILTER:
> 		if (optlen % sizeof(struct can_filter) != 0)
> 			return -EINVAL;

here...

		if (optlen > 64 * sizeof(struct can_filter))
			return -EINVAL;

> 
> 		count = optlen / sizeof(struct can_filter);
> 
> 		if (count > 1) {
> 			/* filter does not fit into dfilter => alloc space */
> 			filter = memdup_user(optval, optlen);
> 			if (IS_ERR(filter))
> 				return PTR_ERR(filter);
> 		} else if (count == 1) {
> 			if (copy_from_user(&sfilter, optval, sizeof(sfilter)))
> 				return -EFAULT;
> 		}
> 
> 		lock_sock(sk);

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Giuseppe CAVALLARO @ 2016-12-02 13:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: alexandre.torgue, David Miller, netdev, linux-kernel,
	Alexandre TORGUE
In-Reply-To: <20161202123241.GA5869@amd>

On 12/2/2016 1:32 PM, Pavel Machek wrote:
> Hi!
>
>>> Well, if you have a workload that sends and receive packets, it tends
>>> to work ok, as you do tx_clean() in stmmac_poll(). My workload is not
>>> like that -- it is "sending packets at 3MB/sec, receiving none". So
>>> the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled,
>>> and then we run out of transmit descriptors, and then 40msec passes,
>>> and then we clean them. Bad.
>>>
>>> And that's why low-res timers do not cut it.
>>
>> in that case, I expect that the tuning of the driver could help you.
>> I mean, by using ethtool, it could be enough to set the IC bit on all
>> the descriptors. You should touch the tx_coal_frames.
>>
>> Then you can use ethtool -S to monitor the status.
>
> Yes, I did something similar. Unfortnunately that meant crash within
> minutes, at least with 4.4 kernel. (If you know what was fixed between
> 4.4 and 4.9, that would be helpful).

4.4 has no GMAC4 support.
Alex, do you remember any patches to fix that?

>> We had experimented this tuning on STB IP where just datagrams
>> had to send externally. To be honest, although we had seen
>> better results w/o any timer, we kept this approach enabled
>> because the timer was fast enough to cover our tests on SH4 boxes.
>
> Please reply to David, and explain how it is supposed to
> work... because right now it does not. 40 msec delays are not
> acceptable in default configuration.

I mean, that on UP and SMP system this schema helped
to improve the performance saving CPU on my side and this has been
tested since a long time (~4 years).
I tested something similar to yours where unidirectional traffic
with limited throughput was needed and I can confirm you that
tuning/removing coalesce parameters this helped. The tuning I decided
to keep in the driver was suitable in several user cases and if now
you have problems or you want to review it I can just confirm that
there are no problems on my side. If you want to simply the logic
around the tx process and remove timer on official driver I can accept
that. I will just ask you uto double check if the throughput and
CPU usage when request max throughput (better if on GiGa setup) has
no regressions.
Otherwise we could start thinking about adaptive schema if feasible.

>>>> In the ring, some descriptors can raise the irq (according to a
>>>> threshold) and set the IC bit. In this path, the NAPI  poll will be
>>>> scheduled.
>>>
>>> Not NAPI poll but stmmac_tx_timer(), right?
>>
>> in the xmit according the the threshold the timer is started or the
>> interrupt is set inside the descriptor.
>> Then stmmac_tx_clean will be always called and, if you see the flow,
>> no irqlock protection is needed!
>
> Agreed that no irqlock protection is needed if we rely on napi and timers.

ok

>
>>>> Concerning the lock protection, we had reviewed long time ago and
>>>> IIRC, no raise condition should be present. Open to review it,
>>>> again!
> ...
>>> There's nothing that protect stmmac_poll() from running concurently
>>> with stmmac_dma_interrupt(), right?
>>
>> This is not necessary.
>
> dma_interrupt accesses shared priv->xstats; variables are of type
> unsigned long (not atomic_t), yet they are accesssed from interrupt
> context and from stmmac_ethtool without any locking. That can result
> in broken statistics AFAICT.

ok we can check this and welcome patches and I'd prefer to
remove xstats from critical part of the code like ISR (that
comes from old story of the driver).

>
> Please take another look. As far as I can tell, you can have two cpus
> at #1 and #2 in the code, at the same time. It looks like napi_... has
> some atomic opertions inside so that looks safe at the first look. But
> I'm not sure if they also include enough memory barriers to make it
> safe...?

Although I have never reproduced related issues on SMP platforms
due to reordering of memory operations but, as said above, welcome
review on this especially if you are seeing problems when manage NAPI.

FYI, the only memory barrier you will see in the driver are about the
OWN_BIT setting till now.

> static void stmmac_dma_interrupt(struct stmmac_priv *priv)
> {
> ...
>         status = priv->hw->dma->dma_interrupt(priv->ioaddr, &priv->xstats);
>         if (likely((status & handle_rx)) || (status & handle_tx)) {
>                 if (likely(napi_schedule_prep(&priv->napi))) {
> #1
>                         stmmac_disable_dma_irq(priv);
>                         __napi_schedule(&priv->napi);
>                 }
>         }
>
>
> static int stmmac_poll(struct napi_struct *napi, int budget)
> {
>         struct stmmac_priv *priv = container_of(napi, struct stmmac_priv, napi);
>         int work_done = 0;
>
>         priv->xstats.napi_poll++;
>         stmmac_tx_clean(priv);
>
>         work_done = stmmac_rx(priv, budget);
> 	if (work_done < budget) {
>                 napi_complete(napi);
> #2
> 	        stmmac_enable_dma_irq(priv);
>         }

hmm, I have to check (and refresh my memory) but the driver
uses the napi_schedule_prep.

Regards

Peppe

>         return work_done;
> }
>
>
> Best regards,
> 									Pavel
>

^ permalink raw reply

* Re: [PATCH net] fib_trie: Avoid expensive update of suffix length when not required
From: Robert Shearman @ 2016-12-02 13:54 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: David Miller, Netdev, Alexander Duyck
In-Reply-To: <CAKgT0Uep40bYFQtToVBdmez5sPoyzapXNTUU10m9x6pyTj=Etg@mail.gmail.com>

On 01/12/16 18:29, Alexander Duyck wrote:
> On Wed, Nov 30, 2016 at 4:27 PM, Robert Shearman <rshearma@brocade.com> wrote:
>> On 29/11/16 23:14, Alexander Duyck wrote:
>>> On Tue, Nov 29, 2016 at 9:46 AM, Robert Shearman <rshearma@brocade.com>
>>>>
>>>> Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length")
>>>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>>>
>>>
>>> So I am not entirely convinced this is a regression, I was wondering
>>> what your numbers looked like before the patch you are saying this
>>> fixes?  I recall I fixed a number of issues leading up to this so I am
>>> just curious.
>>
>>
>> At commit 21d1f11db0e2f20a549ad8322879507850544670 ("fib_trie: Remove checks
>> for index >= tnode_child_length from tnode_get_child") which is the parent
>> of 5405afd1a306:
>>
>> $ time sudo ip route restore < ~/allroutes
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>>
>> real    0m3.451s
>> user    0m0.184s
>> sys     0m2.004s
>>
>>
>> At commit 5405afd1a30620de8601436bae739c67c0bc7324 ("fib_trie: Add tracking
>> value for suffix length"):
>>
>> $ time sudo ip route restore < ~/allroutes
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>> RTNETLINK answers: File exists
>>
>> real    0m48.624s
>> user    0m0.360s
>> sys     0m46.960s
>>
>> So does that warrant a fixes tag for this patch?
>
> Yes, please include it in the patch description next time.
>
> I've been giving it thought and the fact is the patch series has
> merit.  I just think it was being a bit too aggressive in terms of
> some of the changes.  So placing any changes in put_child seem to be
> the one piece in this that make this a bit ugly.

Does that imply the current updating of the parent's slen value should 
be removed from put_child then?

I started out without doing the changes in put_child, but that meant the 
changes to push the slen up through the trie were spread all over the 
place. This seemed like the cleaner solution.

>>>> +       }
>>>> +}
>>>> +
>>>>  /* Add a child at position i overwriting the old value.
>>>>   * Update the value of full_children and empty_children.
>>>> + *
>>>> + * The suffix length of the parent node and the rest of the tree is
>>>> + * updated (if required) when adding/replacing a node, but is caller's
>>>> + * responsibility on removal.
>>>>   */
>>>>  static void put_child(struct key_vector *tn, unsigned long i,
>>>>                       struct key_vector *n)
>>>> @@ -447,8 +461,8 @@ static void put_child(struct key_vector *tn, unsigned
>>>> long i,
>>>>         else if (!wasfull && isfull)
>>>>                 tn_info(tn)->full_children++;
>>>>
>>>> -       if (n && (tn->slen < n->slen))
>>>> -               tn->slen = n->slen;
>>>> +       if (n)
>>>> +               node_push_suffix(tn, n);
>>>
>>>
>>> This is just creating redundant work if we have to perform a resize.
>>
>>
>> The only real redundant work is the assignment of slen in tn, since the
>> propagation up the trie has to happen regardless and if a subsequent resize
>> results in changes to the trie then the propagation will stop at tn's parent
>> since the suffix length of the tn's parent will not be less than tn's suffix
>> length.
>
>
> This isn't going to work.  We want to avoid trying to push the suffix
> when we place a child.  There are scenarios where we are placing
> children in nodes that don't have parents yet because we are doing
> rebalances and such.  I suspect this could be pretty expensive on a
> resize call.

It's not expensive because the new parents that are being built up are 
orphaned from the rest of the trie, so the push up won't proceed into 
the rest of the trie until the new node is inserted into the trie.

> We want to pull the suffix pushing out of the resize completely.  The
> problem is this is going to cause us to start pushing suffixes for
> every node moved as a part of resize.

This will mean that nodes will have to be visited multiple times, which 
could well be more expensive than doing it during the resize.

>
>>>
>>>>         rcu_assign_pointer(tn->tnode[i], n);
>>>>  }
>>
>> ...
>>
>>>> -static void leaf_pull_suffix(struct key_vector *tp, struct key_vector
>>>> *l)
>>>> +static void node_set_suffix(struct key_vector *tp, unsigned char slen)
>>>>  {
>>>> -       while ((tp->slen > tp->pos) && (tp->slen > l->slen)) {
>>>> -               if (update_suffix(tp) > l->slen)
>>>> +       if (slen > tp->slen)
>>>> +               tp->slen = slen;
>>>> +}
>>>> +
>>>> +static void node_pull_suffix(struct key_vector *tn)
>>>> +{
>>>> +       struct key_vector *tp;
>>>> +       unsigned char slen;
>>>> +
>>>> +       slen = update_suffix(tn);
>>>> +       tp = node_parent(tn);
>>>> +       while ((tp->slen > tp->pos) && (tp->slen > slen)) {
>>>> +               if (update_suffix(tp) > slen)
>>>>                         break;
>>>>                 tp = node_parent(tp);
>>>>         }
>>>>  }
>>>>
>>>
>
> Actually I realized that there is a bug here.  The check for
> update_suffix(tp) > slen can cause us to stop prematurely if I am not
> mistaken.  What we should probably be doing is pulling out tp->slen
> and if the result of update_suffix is the same value then we can break
> the loop.  Otherwise we can stop early if a "second runner up" shows
> up that is supposed to be pushed up the trie.

Excellent point. This also a problem in the existing code when removing 
a fib alias with other aliases still on the leaf. I'll send a separate 
patch for this and base my changes on top of it.

> I actually started around with patches to do much the same as what you
> are doing and I will probably submit them as an RFC so if you need you
> can pick pieces out as needed, or I could submit them if they work for
> you.

I'd be happy to test out any patches you send me.

>>> I really hate all the renaming.  Can't you just leave these as
>>> leaf_pull_suffix and leaf_push _suffix.  I'm fine with us dropping the
>>> leaf of the suffix but the renaming just makes adds a bunch of noise.
>>> Really it might work better if you broke the replacement of the
>>> key_vector as a leaf with the suffix length into a separate patch,
>>> then you could do the rename as a part of that.
>>
>>
>> Ok, I'll leave the naming of leaf_push_suffix alone. Note that
>> leaf_pull_suffix hasn't been renamed, the below in the diff is just an
>> artifact of how diff decided to present the changes along with the moving of
>> leaf_push_suffix.
>
> So I have been looking this over for the last couple days and actually
> I have started to be okay with the renaming.
>
> However I would ask to be consistent.  If you are going to have
> node_pull_suffix and node_push_suffix then just pass a node and a
> suffix length.  You can drop the leaf key vector from both functions
> instead of just one.

Ok, I can do that.

>
>>>
>>>> -static void leaf_push_suffix(struct key_vector *tn, struct key_vector
>>>> *l)
>>>> +static void leaf_pull_suffix(struct key_vector *tp, struct key_vector
>>>> *l)
>>>>  {
>>>> -       /* if this is a new leaf then tn will be NULL and we can sort
>>>> -        * out parent suffix lengths as a part of trie_rebalance
>>>> -        */
>>>> -       while (tn->slen < l->slen) {
>>>> -               tn->slen = l->slen;
>>>> -               tn = node_parent(tn);
>>>> +       while ((tp->slen > tp->pos) && (tp->slen > l->slen)) {
>>>> +               if (update_suffix(tp) > l->slen)
>>>> +                       break;
>>>> +               tp = node_parent(tp);
>>>>         }
>>>>  }
>>>
>>>
>>> If possible it would work better if you could avoid moving functions
>>> around as a result of your changes.
>>
>>
>> Ok, I can add a forward declaration instead.
>
> It shouldn't be necessary.  I think you are doing way too much with
> this patch.  We just need this to be a fix, you are doing a bit too
> much like adding this new node_set_suffix function which isn't really
> needed.

As per your comment below the node_set_suffix function isn't necessary. 
However, I'm not sure exactly what you're suggesting with this patch 
doing too much. Are you saying that you'd like to see a series of 
patches starting with some of the restructuring/renaming of the 
push/pull functions, or are you saying that the sum total is too much?

>>>>
>>>> @@ -1783,6 +1801,16 @@ void fib_table_flush_external(struct fib_table
>>>> *tb)
>>>>                         if (IS_TRIE(pn))
>>>>                                 break;
>>>>
>>>> +                       /* push the suffix length to the parent node,
>>>> +                        * since a previous leaf removal may have
>>>> +                        * caused it to change
>>>> +                        */
>>>> +                       if (pn->slen > pn->pos) {
>>>> +                               unsigned char slen = update_suffix(pn);
>>>> +
>>>> +                               node_set_suffix(node_parent(pn), slen);
>>>> +                       }
>>>> +
>>>
>>>
>>> Why bother with the local variable?  You could just pass
>>> update_suffix(pn) as the parameter to your node_set_suffix function.
>>
>>
>> To avoid a long line on the node_set_suffix call or splitting it across
>> lines, but I'll remove the local variable as you suggest and the same below.
>
> Actually I think there is a bug here.  You shouldn't be setting the
> suffix for the parent based on the child.  You can just call
> update_suffix(pn) and that should be enough.

Yes, you're right. BTW, this logic is transferred from the existing 
resize function and I think what saves us both before and after my 
changes is that the logic is repeated for the parent node which fixes 
the value and so on all the way up the trie.

Thanks,
Rob

^ permalink raw reply

* Aw: Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-02 14:05 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
	linux-kernel
In-Reply-To: <20161202084511.GA32294@amd>

Hi,

> 
> There's nothing that protect stmmac_poll() from running concurently
> with stmmac_dma_interrupt(), right?
> 

could it be that there is also another issue concerned locking?:
The tx completion handler takes the xmit_lock in case that the
netif_queue is stopped. This is AFAICS unnecessary, since both
xmit and completion handler are already synchronized by the private
tx lock. But it is IMHO also dangerous:

In the xmit handler we have the locking order
1. xmit_lock
2. private tx lock

while in the completion handler its the reverse:

1. private tx lock
2. xmit lock.

Do I miss something?

Regards,
Lino

^ permalink raw reply

* Re: [PATCH] stmmac: reduce code duplication getting basic descriptors
From: Alexandre Torgue @ 2016-12-02 14:09 UTC (permalink / raw)
  To: Pavel Machek, David Miller, Andrew Morton
  Cc: peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161128121736.GD15034@amd>

Hi Pavel,

On 11/28/2016 01:17 PM, Pavel Machek wrote:
>
> Remove code duplication getting basic descriptors.

I agree with your patch, it will make code easier to understand.
After fix kbuild issue you can add my Acked-by;

Regards
Alex

>
> Signed-off-by: Pavel Machek <pavel@denx.de>
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index f7133d0..ed20668 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -929,6 +929,17 @@ static int stmmac_set_bfsize(int mtu, int bufsize)
>  	return ret;
>  }
>
> +static inline struct dma_desc *stmmac_tx_desc(struct stmmac_priv *priv, int i)
> +{
> +	struct dma_desc *p;
> +	if (priv->extend_desc)
> +		p = &((priv->dma_etx + i)->basic);
> +	else
> +		p = priv->dma_tx + i;
> +	return p;
> +}
> +
> +
>  /**
>   * stmmac_clear_descriptors - clear descriptors
>   * @priv: driver private structure
> @@ -1078,11 +1089,7 @@ static int init_dma_desc_rings(struct net_device *dev, gfp_t flags)
>
>  	/* TX INITIALIZATION */
>  	for (i = 0; i < DMA_TX_SIZE; i++) {
> -		struct dma_desc *p;
> -		if (priv->extend_desc)
> -			p = &((priv->dma_etx + i)->basic);
> -		else
> -			p = priv->dma_tx + i;
> +		struct dma_desc *p = stmmac_tx_desc(priv, i);
>
>  		if (priv->synopsys_id >= DWMAC_CORE_4_00) {
>  			p->des0 = 0;
> @@ -1129,12 +1136,7 @@ static void dma_free_tx_skbufs(struct stmmac_priv *priv)
>  	int i;
>
>  	for (i = 0; i < DMA_TX_SIZE; i++) {
> -		struct dma_desc *p;
> -
> -		if (priv->extend_desc)
> -			p = &((priv->dma_etx + i)->basic);
> -		else
> -			p = priv->dma_tx + i;
> +		struct dma_desc *p = stmmac_tx_desc(priv, i);
>
>  		if (priv->tx_skbuff_dma[i].buf) {
>  			if (priv->tx_skbuff_dma[i].map_as_page)
> @@ -1314,14 +1316,9 @@ static void __stmmac_tx_clean(struct stmmac_priv *priv)
>
>  	while (entry != priv->cur_tx) {
>  		struct sk_buff *skb = priv->tx_skbuff[entry];
> -		struct dma_desc *p;
> +		struct dma_desc *p = stmmac_tx_desc(priv, entry);
>  		int status;
>
> -		if (priv->extend_desc)
> -			p = (struct dma_desc *)(priv->dma_etx + entry);
> -		else
> -			p = priv->dma_tx + entry;
> -
>  		status = priv->hw->desc->tx_status(&priv->dev->stats,
>  						      &priv->xstats, p,
>  						      priv->ioaddr);
> @@ -2227,11 +2224,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>
>  	csum_insertion = (skb->ip_summed == CHECKSUM_PARTIAL);
>
> -	if (likely(priv->extend_desc))
> -		desc = (struct dma_desc *)(priv->dma_etx + entry);
> -	else
> -		desc = priv->dma_tx + entry;
> -
> +	desc = stmmac_tx_desc(priv, entry);
>  	first = desc;
>
>  	priv->tx_skbuff[first_entry] = skb;
> @@ -2254,10 +2247,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>
>  		entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
>
> -		if (likely(priv->extend_desc))
> -			desc = (struct dma_desc *)(priv->dma_etx + entry);
> -		else
> -			desc = priv->dma_tx + entry;
> +		desc = stmmac_tx_desc(priv, entry);
>
>  		des = skb_frag_dma_map(priv->device, frag, 0, len,
>  				       DMA_TO_DEVICE);
>

^ permalink raw reply

* Re: arp_filter and IPv6 ND
From: Hannes Frederic Sowa @ 2016-12-02 14:08 UTC (permalink / raw)
  To: Saku Ytti, netdev
In-Reply-To: <CAAeewD82Lfo7c=vLHR7p3oxJ8d0OgWrijKmNn8EPFKq7fN7AMg@mail.gmail.com>

On 02.12.2016 13:51, Saku Ytti wrote:
> net.ipv4.conf.all.arp_filter appears not to have IPv6 counter part.
> Or am I missing something? That is Linux does answer to ND queries for
> unrelated interfaces by default, and I can't seem to find way to turn
> that off.

May I ask why you want to turn it off?

In IPv6 this depends on the scope. In IPv4 this concept doesn't really
exist.

Please notice that in IPv4 arp_filter does not necessarily mean that the
system is operating in strong end system mode but you end up in an
hybrid clone where arp is acting strong but routing not and thus you
also have to add fib rules to simulate that.

> Is it proper maintainership to accept changes to single protocol,
> without mandating the support for other protocol having same
> behavioural characteristics?
> 
> It is good that some parts for ARP and ND have common code in linux
> (neighbour.c) unlike in BSD where everything seems to be
> self-contained.
>
> I'd wish that even more of ARP/ND would common, because there are
> still lot of common behavioural code in ARP/ND code itself, which
> requires double maintenance and are implemented by different people at
> different times, so leads to different set of bugs and behaviour for
> same intended behaviour.
> 
> For example this feature should be protocol agnostic, developer should
> only need to develop it once for the higher level behavioural code,
> without minding which IP AFI it is for. Obviously that does not
> exclude ability to sysctl configure it on/off per AFI.

^ permalink raw reply

* [PATCH 1/6] net: stmmac: return error if no DMA configuration is found
From: Niklas Cassel @ 2016-12-02 14:11 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue; +Cc: Niklas Cassel, netdev, linux-kernel

From: Niklas Cassel <niklas.cassel@axis.com>

All drivers except pci glue layer calls stmmac_probe_config_dt.
stmmac_probe_config_dt does a kzalloc dma_cfg.

pci glue layer does kzalloc dma_cfg explicitly, so all current
drivers does a kzalloc dma_cfg.

Return an error if no DMA configuration is found, that way
we can assume that the DMA configuration always exists.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1f9ec02fa7f8..04f88df7da49 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1577,16 +1577,12 @@ static void stmmac_check_ether_addr(struct stmmac_priv *priv)
  */
 static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 {
-	int pbl = DEFAULT_DMA_PBL, fixed_burst = 0, aal = 0;
-	int mixed_burst = 0;
 	int atds = 0;
 	int ret = 0;
 
-	if (priv->plat->dma_cfg) {
-		pbl = priv->plat->dma_cfg->pbl;
-		fixed_burst = priv->plat->dma_cfg->fixed_burst;
-		mixed_burst = priv->plat->dma_cfg->mixed_burst;
-		aal = priv->plat->dma_cfg->aal;
+	if (!priv->plat->dma_cfg) {
+		dev_err(priv->device, "DMA configuration not found\n");
+		return -EINVAL;
 	}
 
 	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
@@ -1598,8 +1594,12 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		return ret;
 	}
 
-	priv->hw->dma->init(priv->ioaddr, pbl, fixed_burst, mixed_burst,
-			    aal, priv->dma_tx_phy, priv->dma_rx_phy, atds);
+	priv->hw->dma->init(priv->ioaddr,
+			    priv->plat->dma_cfg->pbl,
+			    priv->plat->dma_cfg->fixed_burst,
+			    priv->plat->dma_cfg->mixed_burst,
+			    priv->plat->dma_cfg->aal,
+			    priv->dma_tx_phy, priv->dma_rx_phy, atds);
 
 	if (priv->synopsys_id >= DWMAC_CORE_4_00) {
 		priv->rx_tail_addr = priv->dma_rx_phy +
-- 
2.1.4

^ permalink raw reply related

* [PATCH 2/6] net: stmmac: simplify the common DMA init API
From: Niklas Cassel @ 2016-12-02 14:11 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue; +Cc: Niklas Cassel, netdev, linux-kernel
In-Reply-To: <1480687910-9690-1-git-send-email-niklass@axis.com>

From: Niklas Cassel <niklas.cassel@axis.com>

Use struct stmmac_dma_cfg *dma_cfg as an argument rather
than using all the struct members as individual arguments.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h        |  4 ++--
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c | 13 +++++++------
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c  |  7 ++++---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c    | 14 ++++++++------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c   |  6 +-----
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 6d2de4e01f6d..3023ec7ae83e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -411,8 +411,8 @@ extern const struct stmmac_desc_ops ndesc_ops;
 struct stmmac_dma_ops {
 	/* DMA core initialization */
 	int (*reset)(void __iomem *ioaddr);
-	void (*init)(void __iomem *ioaddr, int pbl, int fb, int mb,
-		     int aal, u32 dma_tx, u32 dma_rx, int atds);
+	void (*init)(void __iomem *ioaddr, struct stmmac_dma_cfg *dma_cfg,
+		     u32 dma_tx, u32 dma_rx, int atds);
 	/* Configure the AXI Bus Mode Register */
 	void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
 	/* Dump DMA registers */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 990746955216..01d0d0f315e5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -82,8 +82,9 @@ static void dwmac1000_dma_axi(void __iomem *ioaddr, struct stmmac_axi *axi)
 	writel(value, ioaddr + DMA_AXI_BUS_MODE);
 }
 
-static void dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
-			       int aal, u32 dma_tx, u32 dma_rx, int atds)
+static void dwmac1000_dma_init(void __iomem *ioaddr,
+			       struct stmmac_dma_cfg *dma_cfg,
+			       u32 dma_tx, u32 dma_rx, int atds)
 {
 	u32 value = readl(ioaddr + DMA_BUS_MODE);
 
@@ -99,20 +100,20 @@ static void dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
 	 */
 	value |= DMA_BUS_MODE_MAXPBL;
 	value &= ~DMA_BUS_MODE_PBL_MASK;
-	value |= (pbl << DMA_BUS_MODE_PBL_SHIFT);
+	value |= (dma_cfg->pbl << DMA_BUS_MODE_PBL_SHIFT);
 
 	/* Set the Fixed burst mode */
-	if (fb)
+	if (dma_cfg->fixed_burst)
 		value |= DMA_BUS_MODE_FB;
 
 	/* Mixed Burst has no effect when fb is set */
-	if (mb)
+	if (dma_cfg->mixed_burst)
 		value |= DMA_BUS_MODE_MB;
 
 	if (atds)
 		value |= DMA_BUS_MODE_ATDS;
 
-	if (aal)
+	if (dma_cfg->aal)
 		value |= DMA_BUS_MODE_AAL;
 
 	writel(value, ioaddr + DMA_BUS_MODE);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
index 61f54c99a7de..e5664da382f3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
@@ -32,11 +32,12 @@
 #include "dwmac100.h"
 #include "dwmac_dma.h"
 
-static void dwmac100_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
-			      int aal, u32 dma_tx, u32 dma_rx, int atds)
+static void dwmac100_dma_init(void __iomem *ioaddr,
+			      struct stmmac_dma_cfg *dma_cfg,
+			      u32 dma_tx, u32 dma_rx, int atds)
 {
 	/* Enable Application Access by writing to DMA CSR0 */
-	writel(DMA_BUS_MODE_DEFAULT | (pbl << DMA_BUS_MODE_PBL_SHIFT),
+	writel(DMA_BUS_MODE_DEFAULT | (dma_cfg->pbl << DMA_BUS_MODE_PBL_SHIFT),
 	       ioaddr + DMA_BUS_MODE);
 
 	/* Mask interrupts by writing to CSR7 */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 577316de6ba8..0946546d6dcd 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -97,27 +97,29 @@ static void dwmac4_dma_init_channel(void __iomem *ioaddr, int pbl,
 	writel(dma_rx_phy, ioaddr + DMA_CHAN_RX_BASE_ADDR(channel));
 }
 
-static void dwmac4_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
-			    int aal, u32 dma_tx, u32 dma_rx, int atds)
+static void dwmac4_dma_init(void __iomem *ioaddr,
+			    struct stmmac_dma_cfg *dma_cfg,
+			    u32 dma_tx, u32 dma_rx, int atds)
 {
 	u32 value = readl(ioaddr + DMA_SYS_BUS_MODE);
 	int i;
 
 	/* Set the Fixed burst mode */
-	if (fb)
+	if (dma_cfg->fixed_burst)
 		value |= DMA_SYS_BUS_FB;
 
 	/* Mixed Burst has no effect when fb is set */
-	if (mb)
+	if (dma_cfg->mixed_burst)
 		value |= DMA_SYS_BUS_MB;
 
-	if (aal)
+	if (dma_cfg->aal)
 		value |= DMA_SYS_BUS_AAL;
 
 	writel(value, ioaddr + DMA_SYS_BUS_MODE);
 
 	for (i = 0; i < DMA_CHANNEL_NB_MAX; i++)
-		dwmac4_dma_init_channel(ioaddr, pbl, dma_tx, dma_rx, i);
+		dwmac4_dma_init_channel(ioaddr, dma_cfg->pbl,
+					dma_tx, dma_rx, i);
 }
 
 static void _dwmac4_dump_dma_regs(void __iomem *ioaddr, u32 channel)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 04f88df7da49..27a03f7ee095 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1594,11 +1594,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		return ret;
 	}
 
-	priv->hw->dma->init(priv->ioaddr,
-			    priv->plat->dma_cfg->pbl,
-			    priv->plat->dma_cfg->fixed_burst,
-			    priv->plat->dma_cfg->mixed_burst,
-			    priv->plat->dma_cfg->aal,
+	priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
 			    priv->dma_tx_phy, priv->dma_rx_phy, atds);
 
 	if (priv->synopsys_id >= DWMAC_CORE_4_00) {
-- 
2.1.4

^ permalink raw reply related

* [PATCH 4/6] net: stmmac: dwmac1000: fix define DMA_BUS_MODE_RPBL_MASK
From: Niklas Cassel @ 2016-12-02 14:11 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue; +Cc: Niklas Cassel, netdev, linux-kernel
In-Reply-To: <1480687910-9690-1-git-send-email-niklass@axis.com>

From: Niklas Cassel <niklas.cassel@axis.com>

DMA_BUS_MODE_RPBL_MASK is really 6 bits,
just like DMA_BUS_MODE_PBL_MASK.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac1000.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index ff3e5ab39bd0..52b9407a8a39 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -225,7 +225,7 @@ enum rx_tx_priority_ratio {
 
 #define DMA_BUS_MODE_FB		0x00010000	/* Fixed burst */
 #define DMA_BUS_MODE_MB		0x04000000	/* Mixed burst */
-#define DMA_BUS_MODE_RPBL_MASK	0x003e0000	/* Rx-Programmable Burst Len */
+#define DMA_BUS_MODE_RPBL_MASK	0x007e0000	/* Rx-Programmable Burst Len */
 #define DMA_BUS_MODE_RPBL_SHIFT	17
 #define DMA_BUS_MODE_USP	0x00800000
 #define DMA_BUS_MODE_MAXPBL	0x01000000
-- 
2.1.4

^ permalink raw reply related

* [PATCH 3/6] net: stmmac: stmmac_platform: fix parsing of DT binding
From: Niklas Cassel @ 2016-12-02 14:11 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue; +Cc: Niklas Cassel, netdev, linux-kernel
In-Reply-To: <1480687910-9690-1-git-send-email-niklass@axis.com>

From: Niklas Cassel <niklas.cassel@axis.com>

commit 64c3b252e9fc ("net: stmmac: fixed the pbl setting with DT")
changed the parsing of the DT binding.

Before 64c3b252e9fc, snps,fixed-burst and snps,mixed-burst were parsed
regardless if the property snps,pbl existed or not.
After the commit, fixed burst and mixed burst are only parsed if
snps,pbl exists. Now when snps,aal has been added, it too is only
parsed if snps,pbl exists.

Since the DT binding does not specify that fixed burst, mixed burst
or aal depend on snps,pbl being specified, undo changes introduced
by 64c3b252e9fc.

The issue commit 64c3b252e9fc ("net: stmmac: fixed the pbl setting with
DT") tries to address is solved in another way:
The databook specifies that all values other than
1, 2, 4, 8, 16, or 32 results in undefined behavior,
so snps,pbl = <0> is invalid.

If pbl is 0 after parsing, set pbl to DEFAULT_DMA_PBL.
This handles the case where the property is omitted, and also handles
the case where the property is specified without any data.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  3 +++
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 27 +++++++++++-----------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 27a03f7ee095..9ba2eb4c22fe 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1585,6 +1585,9 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		return -EINVAL;
 	}
 
+	if (!priv->plat->dma_cfg->pbl)
+		priv->plat->dma_cfg->pbl = DEFAULT_DMA_PBL;
+
 	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
 		atds = 1;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index b46c892c7079..05b8c33effd5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -303,21 +303,20 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 		plat->force_sf_dma_mode = 1;
 	}
 
-	if (of_find_property(np, "snps,pbl", NULL)) {
-		dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg),
-				       GFP_KERNEL);
-		if (!dma_cfg) {
-			of_node_put(plat->phy_node);
-			return ERR_PTR(-ENOMEM);
-		}
-		plat->dma_cfg = dma_cfg;
-		of_property_read_u32(np, "snps,pbl", &dma_cfg->pbl);
-		dma_cfg->aal = of_property_read_bool(np, "snps,aal");
-		dma_cfg->fixed_burst =
-			of_property_read_bool(np, "snps,fixed-burst");
-		dma_cfg->mixed_burst =
-			of_property_read_bool(np, "snps,mixed-burst");
+	dma_cfg = devm_kzalloc(&pdev->dev, sizeof(*dma_cfg),
+			       GFP_KERNEL);
+	if (!dma_cfg) {
+		of_node_put(plat->phy_node);
+		return ERR_PTR(-ENOMEM);
 	}
+	plat->dma_cfg = dma_cfg;
+
+	of_property_read_u32(np, "snps,pbl", &dma_cfg->pbl);
+
+	dma_cfg->aal = of_property_read_bool(np, "snps,aal");
+	dma_cfg->fixed_burst = of_property_read_bool(np, "snps,fixed-burst");
+	dma_cfg->mixed_burst = of_property_read_bool(np, "snps,mixed-burst");
+
 	plat->force_thresh_dma_mode = of_property_read_bool(np, "snps,force_thresh_dma_mode");
 	if (plat->force_thresh_dma_mode) {
 		plat->force_sf_dma_mode = 0;
-- 
2.1.4

^ permalink raw reply related

* [PATCH 5/6] net: stmmac: add support for independent DMA pbl for tx/rx
From: Niklas Cassel @ 2016-12-02 14:11 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Jonathan Corbet, Giuseppe Cavallaro,
	Alexandre Torgue, David S. Miller, Phil Reid, Eric Engestrom,
	Joachim Eastwood, Gabriel Fernandez, Vincent Palatin
  Cc: Niklas Cassel, netdev, devicetree, linux-kernel, linux-doc
In-Reply-To: <1480687910-9690-1-git-send-email-niklass@axis.com>

From: Niklas Cassel <niklas.cassel@axis.com>

GMAC and newer supports independent programmable burst lengths for
DMA tx/rx. Add new optional devicetree properties representing this.

To be backwards compatible, snps,pbl will still be valid, but
snps,txpbl/snps,rxpbl will override the value in snps,pbl if set.

If the IP is synthesized to use the AXI interface, there is a register
and a matching DT property inside the optional stmmac-axi-config DT node
for controlling burst lengths, named snps,blen.
However, using this register, it is not possible to control tx and rx
independently. Also, this register is not available if the IP was
synthesized with, e.g., the AHB interface.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 Documentation/devicetree/bindings/net/stmmac.txt      |  6 +++++-
 Documentation/networking/stmmac.txt                   | 19 +++++++++++++------
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c   | 12 ++++++------
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c      | 12 +++++++-----
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c |  2 ++
 include/linux/stmmac.h                                |  2 ++
 6 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index b95ff998ba73..8080038ff1b2 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -34,7 +34,11 @@ Optional properties:
   platforms.
 - tx-fifo-depth: See ethernet.txt file in the same directory
 - rx-fifo-depth: See ethernet.txt file in the same directory
-- snps,pbl		Programmable Burst Length
+- snps,pbl		Programmable Burst Length (tx and rx)
+- snps,txpbl		Tx Programmable Burst Length. Only for GMAC and newer.
+			If set, DMA tx will use this value rather than snps,pbl.
+- snps,rxpbl		Rx Programmable Burst Length. Only for GMAC and newer.
+			If set, DMA rx will use this value rather than snps,pbl.
 - snps,aal		Address-Aligned Beats
 - snps,fixed-burst	Program the DMA to use the fixed burst mode
 - snps,mixed-burst	Program the DMA to use the mixed burst mode
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index e226f8925c9e..82c8e496b4bb 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -154,7 +154,8 @@ Where:
    o pbl: the Programmable Burst Length is maximum number of beats to
        be transferred in one DMA transaction.
        GMAC also enables the 4xPBL by default.
-   o fixed_burst/mixed_burst/burst_len
+   o txpbl/rxpbl: GMAC and newer supports independent DMA pbl for tx/rx.
+   o fixed_burst/mixed_burst/aal
  o clk_csr: fixed CSR Clock range selection.
  o has_gmac: uses the GMAC core.
  o enh_desc: if sets the MAC will use the enhanced descriptor structure.
@@ -206,16 +207,22 @@ tuned according to the HW capabilities.
 
 struct stmmac_dma_cfg {
 	int pbl;
+	int txpbl;
+	int rxpbl;
 	int fixed_burst;
-	int burst_len_supported;
+	int mixed_burst;
+	bool aal;
 };
 
 Where:
- o pbl: Programmable Burst Length
+ o pbl: Programmable Burst Length (tx and rx)
+ o txpbl: Transmit Programmable Burst Length. Only for GMAC and newer.
+	 If set, DMA tx will use this value rather than pbl.
+ o rxpbl: Receive Programmable Burst Length. Only for GMAC and newer.
+	 If set, DMA rx will use this value rather than pbl.
  o fixed_burst: program the DMA to use the fixed burst mode
- o burst_len: this is the value we put in the register
-	      supported values are provided as macros in
-	      linux/stmmac.h header file.
+ o mixed_burst: program the DMA to use the mixed burst mode
+ o aal: Address-Aligned Beats
 
 ---
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 01d0d0f315e5..1dd34fb4c1a9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -87,20 +87,20 @@ static void dwmac1000_dma_init(void __iomem *ioaddr,
 			       u32 dma_tx, u32 dma_rx, int atds)
 {
 	u32 value = readl(ioaddr + DMA_BUS_MODE);
+	int txpbl = dma_cfg->txpbl ?: dma_cfg->pbl;
+	int rxpbl = dma_cfg->rxpbl ?: dma_cfg->pbl;
 
 	/*
 	 * Set the DMA PBL (Programmable Burst Length) mode.
 	 *
 	 * Note: before stmmac core 3.50 this mode bit was 4xPBL, and
 	 * post 3.5 mode bit acts as 8*PBL.
-	 *
-	 * This configuration doesn't take care about the Separate PBL
-	 * so only the bits: 13-8 are programmed with the PBL passed from the
-	 * platform.
 	 */
 	value |= DMA_BUS_MODE_MAXPBL;
-	value &= ~DMA_BUS_MODE_PBL_MASK;
-	value |= (dma_cfg->pbl << DMA_BUS_MODE_PBL_SHIFT);
+	value |= DMA_BUS_MODE_USP;
+	value &= ~(DMA_BUS_MODE_PBL_MASK | DMA_BUS_MODE_RPBL_MASK);
+	value |= (txpbl << DMA_BUS_MODE_PBL_SHIFT);
+	value |= (rxpbl << DMA_BUS_MODE_RPBL_SHIFT);
 
 	/* Set the Fixed burst mode */
 	if (dma_cfg->fixed_burst)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 0946546d6dcd..0bf47825bfeb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -69,11 +69,14 @@ static void dwmac4_dma_axi(void __iomem *ioaddr, struct stmmac_axi *axi)
 	writel(value, ioaddr + DMA_SYS_BUS_MODE);
 }
 
-static void dwmac4_dma_init_channel(void __iomem *ioaddr, int pbl,
+static void dwmac4_dma_init_channel(void __iomem *ioaddr,
+				    struct stmmac_dma_cfg *dma_cfg,
 				    u32 dma_tx_phy, u32 dma_rx_phy,
 				    u32 channel)
 {
 	u32 value;
+	int txpbl = dma_cfg->txpbl ?: dma_cfg->pbl;
+	int rxpbl = dma_cfg->rxpbl ?: dma_cfg->pbl;
 
 	/* set PBL for each channels. Currently we affect same configuration
 	 * on each channel
@@ -83,11 +86,11 @@ static void dwmac4_dma_init_channel(void __iomem *ioaddr, int pbl,
 	writel(value, ioaddr + DMA_CHAN_CONTROL(channel));
 
 	value = readl(ioaddr + DMA_CHAN_TX_CONTROL(channel));
-	value = value | (pbl << DMA_BUS_MODE_PBL_SHIFT);
+	value = value | (txpbl << DMA_BUS_MODE_PBL_SHIFT);
 	writel(value, ioaddr + DMA_CHAN_TX_CONTROL(channel));
 
 	value = readl(ioaddr + DMA_CHAN_RX_CONTROL(channel));
-	value = value | (pbl << DMA_BUS_MODE_RPBL_SHIFT);
+	value = value | (rxpbl << DMA_BUS_MODE_RPBL_SHIFT);
 	writel(value, ioaddr + DMA_CHAN_RX_CONTROL(channel));
 
 	/* Mask interrupts by writing to CSR7 */
@@ -118,8 +121,7 @@ static void dwmac4_dma_init(void __iomem *ioaddr,
 	writel(value, ioaddr + DMA_SYS_BUS_MODE);
 
 	for (i = 0; i < DMA_CHANNEL_NB_MAX; i++)
-		dwmac4_dma_init_channel(ioaddr, dma_cfg->pbl,
-					dma_tx, dma_rx, i);
+		dwmac4_dma_init_channel(ioaddr, dma_cfg, dma_tx, dma_rx, i);
 }
 
 static void _dwmac4_dump_dma_regs(void __iomem *ioaddr, u32 channel)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 05b8c33effd5..29faff42a5ea 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -312,6 +312,8 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	plat->dma_cfg = dma_cfg;
 
 	of_property_read_u32(np, "snps,pbl", &dma_cfg->pbl);
+	of_property_read_u32(np, "snps,txpbl", &dma_cfg->txpbl);
+	of_property_read_u32(np, "snps,rxpbl", &dma_cfg->rxpbl);
 
 	dma_cfg->aal = of_property_read_bool(np, "snps,aal");
 	dma_cfg->fixed_burst = of_property_read_bool(np, "snps,fixed-burst");
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 705840e0438f..c8a7c6b278da 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -88,6 +88,8 @@ struct stmmac_mdio_bus_data {
 
 struct stmmac_dma_cfg {
 	int pbl;
+	int txpbl;
+	int rxpbl;
 	int fixed_burst;
 	int mixed_burst;
 	bool aal;
-- 
2.1.4

^ permalink raw reply related

* [PATCH 6/6] net: smmac: allow configuring lower pbl values
From: Niklas Cassel @ 2016-12-02 14:13 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Jonathan Corbet, Giuseppe Cavallaro,
	Alexandre Torgue, David S. Miller, Phil Reid, Eric Engestrom,
	Vincent Palatin, Andreas Färber, Gabriel Fernandez,
	Joachim Eastwood
  Cc: Niklas Cassel, netdev, devicetree, linux-kernel, linux-doc

From: Niklas Cassel <niklas.cassel@axis.com>

The driver currently always sets the PBLx8/PBLx4 bit, which means that
the pbl values configured via the pbl/txpbl/rxpbl DT properties are
always multiplied by 8/4 in the hardware.

In order to allow the DT to configure lower pbl values, while at the
same time not changing behavior of any existing device trees using the
pbl/txpbl/rxpbl settings, add a property to disable the multiplication
of the pbl by 8/4 in the hardware.

Suggested-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 Documentation/devicetree/bindings/net/stmmac.txt      | 2 ++
 Documentation/networking/stmmac.txt                   | 5 ++++-
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c   | 3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c      | 3 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c      | 2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 +
 include/linux/stmmac.h                                | 1 +
 7 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
index 8080038ff1b2..128da752fec9 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -39,6 +39,8 @@ Optional properties:
 			If set, DMA tx will use this value rather than snps,pbl.
 - snps,rxpbl		Rx Programmable Burst Length. Only for GMAC and newer.
 			If set, DMA rx will use this value rather than snps,pbl.
+- snps,no-pbl-x8	Don't multiply the pbl/txpbl/rxpbl values by 8.
+			For core rev < 3.50, don't multiply the values by 4.
 - snps,aal		Address-Aligned Beats
 - snps,fixed-burst	Program the DMA to use the fixed burst mode
 - snps,mixed-burst	Program the DMA to use the mixed burst mode
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 82c8e496b4bb..d3376c5fbcf0 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -153,8 +153,9 @@ Where:
  o dma_cfg: internal DMA parameters
    o pbl: the Programmable Burst Length is maximum number of beats to
        be transferred in one DMA transaction.
-       GMAC also enables the 4xPBL by default.
+       GMAC also enables the 4xPBL by default. (8xPBL for GMAC 3.50 and newer)
    o txpbl/rxpbl: GMAC and newer supports independent DMA pbl for tx/rx.
+   o pblx8: Enable 8xPBL (4xPBL for core rev < 3.50). Enabled by default.
    o fixed_burst/mixed_burst/aal
  o clk_csr: fixed CSR Clock range selection.
  o has_gmac: uses the GMAC core.
@@ -209,6 +210,7 @@ struct stmmac_dma_cfg {
 	int pbl;
 	int txpbl;
 	int rxpbl;
+	bool pblx8;
 	int fixed_burst;
 	int mixed_burst;
 	bool aal;
@@ -220,6 +222,7 @@ Where:
 	 If set, DMA tx will use this value rather than pbl.
  o rxpbl: Receive Programmable Burst Length. Only for GMAC and newer.
 	 If set, DMA rx will use this value rather than pbl.
+ o pblx8: Enable 8xPBL (4xPBL for core rev < 3.50). Enabled by default.
  o fixed_burst: program the DMA to use the fixed burst mode
  o mixed_burst: program the DMA to use the mixed burst mode
  o aal: Address-Aligned Beats
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 1dd34fb4c1a9..1d313af647b4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -96,7 +96,8 @@ static void dwmac1000_dma_init(void __iomem *ioaddr,
 	 * Note: before stmmac core 3.50 this mode bit was 4xPBL, and
 	 * post 3.5 mode bit acts as 8*PBL.
 	 */
-	value |= DMA_BUS_MODE_MAXPBL;
+	if (dma_cfg->pblx8)
+		value |= DMA_BUS_MODE_MAXPBL;
 	value |= DMA_BUS_MODE_USP;
 	value &= ~(DMA_BUS_MODE_PBL_MASK | DMA_BUS_MODE_RPBL_MASK);
 	value |= (txpbl << DMA_BUS_MODE_PBL_SHIFT);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
index 0bf47825bfeb..0f7110d19a4a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c
@@ -82,7 +82,8 @@ static void dwmac4_dma_init_channel(void __iomem *ioaddr,
 	 * on each channel
 	 */
 	value = readl(ioaddr + DMA_CHAN_CONTROL(channel));
-	value = value | DMA_BUS_MODE_PBL;
+	if (dma_cfg->pblx8)
+		value = value | DMA_BUS_MODE_PBL;
 	writel(value, ioaddr + DMA_CHAN_CONTROL(channel));
 
 	value = readl(ioaddr + DMA_CHAN_TX_CONTROL(channel));
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index 56c8a2342c14..a2831773431a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -81,6 +81,7 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
 	plat->mdio_bus_data->phy_mask = 0;
 
 	plat->dma_cfg->pbl = 32;
+	plat->dma_cfg->pblx8 = true;
 	/* TODO: AXI */
 
 	/* Set default value for multicast hash bins */
@@ -115,6 +116,7 @@ static int quark_default_data(struct plat_stmmacenet_data *plat,
 	plat->mdio_bus_data->phy_mask = 0;
 
 	plat->dma_cfg->pbl = 16;
+	plat->dma_cfg->pblx8 = true;
 	plat->dma_cfg->fixed_burst = 1;
 	/* AXI (TODO) */
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 29faff42a5ea..3a713862f0a3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -314,6 +314,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
 	of_property_read_u32(np, "snps,pbl", &dma_cfg->pbl);
 	of_property_read_u32(np, "snps,txpbl", &dma_cfg->txpbl);
 	of_property_read_u32(np, "snps,rxpbl", &dma_cfg->rxpbl);
+	dma_cfg->pblx8 = !of_property_read_bool(np, "snps,no-pbl-x8");
 
 	dma_cfg->aal = of_property_read_bool(np, "snps,aal");
 	dma_cfg->fixed_burst = of_property_read_bool(np, "snps,fixed-burst");
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index c8a7c6b278da..05a8243017b0 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -90,6 +90,7 @@ struct stmmac_dma_cfg {
 	int pbl;
 	int txpbl;
 	int rxpbl;
+	bool pblx8;
 	int fixed_burst;
 	int mixed_burst;
 	bool aal;
-- 
2.1.4

^ permalink raw reply related

* [PATCH] net: phy: dp83848: Support ethernet pause frames
From: Jesper Nilsson @ 2016-12-02 14:22 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Roger Quadros, Andrew F. Davis, David S. Miller, Dan Murphy,
	netdev, linux-kernel

According to the documentation, the PHYs supported by this driver
can also support pause frames. Announce this to be so.
Tested with a TI83822I.

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
---
 drivers/net/phy/dp83848.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
index 800b39f..6e4117f 100644
--- a/drivers/net/phy/dp83848.c
+++ b/drivers/net/phy/dp83848.c
@@ -88,7 +88,8 @@ MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
 		.phy_id		= _id,				\
 		.phy_id_mask	= 0xfffffff0,			\
 		.name		= _name,			\
-		.features	= PHY_BASIC_FEATURES,		\
+		.features	= (PHY_BASIC_FEATURES |		\
+			SUPPORTED_Pause | SUPPORTED_Asym_Pause),\
 		.flags		= PHY_HAS_INTERRUPT,		\
 								\
 		.soft_reset	= genphy_soft_reset,		\
-- 
2.1.4


/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

^ permalink raw reply related

* Re: [WIP] net+mlx4: auto doorbell
From: Eric Dumazet @ 2016-12-02 14:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Saeed Mahameed, Rick Jones, Linux Netdev List, Saeed Mahameed,
	Tariq Toukan
In-Reply-To: <1480611857.18162.319.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 2016-12-01 at 09:04 -0800, Eric Dumazet wrote:
> On Thu, 2016-12-01 at 17:04 +0100, Jesper Dangaard Brouer wrote:
> 
> > I think you misunderstood my concept[1].  I don't want to stop the
> > queue. The new __QUEUE_STATE_FLUSH_NEEDED does not stop the queue, is
> > it just indicating that someone need to flush/ring-doorbell.  Maybe it
> > need another name, because it also indicate that the driver can see
> > that its TX queue is so busy that we don't need to call it immediately.
> > The qdisc layer can then choose to enqueue instead if doing direct xmit.
> 
> But driver ndo_start_xmit() does not have a pointer to qdisc.
> 
> Also the concept of 'queue busy' just because we queued one packet is a
> bit flaky.
> 
> > 
> > When qdisc layer or trafgen/af_packet see this indication it knows it
> > should/must flush the queue when it don't have more work left.  Perhaps
> > through net_tx_action(), by registering itself and e.g. if qdisc_run()
> > is called and queue is empty then check if queue needs a flush. I would
> > also allow driver to flush and clear this bit.
> 
> net_tx_action() is not normally called, unless BQL limit is hit and/or
> some qdiscs with throttling (HTB, TBF, FQ, ...)
> 
> > 
> > I just see it as an extension of your solution, as we still need the
> > driver to figure out then the doorbell/flush can be delayed.
> > p.s. don't be discouraged by this feedback, I'm just very excited and
> > happy that your are working on a solution in this area. As this is a
> > problem area that I've not been able to solve myself for the last
> > approx 2 years. Keep up the good work!
> 
> Do not worry, I appreciate the feedbacks ;)
> 
> BTW, if you are doing tests on mlx4 40Gbit, would you check the
> following quick/dirty hack, using lots of low-rate flows ?
> 
> mlx4 has really hard time to transmit small TSO packets (2 or 3 MSS)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 12ea3405f442..96940666abd3 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2631,6 +2631,11 @@ static void mlx4_en_del_vxlan_port(struct  net_device *dev,
>         queue_work(priv->mdev->workqueue, &priv->vxlan_del_task);
>  }
>  
> +static int mlx4_gso_segs_min = 4; /* TSO packets with less than 4 segments are segmented */
> +module_param_named(mlx4_gso_segs_min, mlx4_gso_segs_min, uint, 0644);
> +MODULE_PARM_DESC(mlx4_gso_segs_min, "threshold for software segmentation of small TSO packets");
> +
> +
>  static netdev_features_t mlx4_en_features_check(struct sk_buff *skb,
>                                                 struct net_device *dev,
>                                                 netdev_features_t features)
> @@ -2651,6 +2656,8 @@ static netdev_features_t mlx4_en_features_check(struct sk_buff *skb,
>                     (udp_hdr(skb)->dest != priv->vxlan_port))
>                         features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>         }
> +       if (skb_is_gso(skb) && skb_shinfo(skb)->gso_segs < mlx4_gso_segs_min)
> +               features &= NETIF_F_GSO_MASK;


Sorry, stupid typo.
This should be "features &= ~NETIF_F_GSO_MASK;" of course

>  
>         return features;
>  }
> 
> 
> 
> 

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Alexandre Torgue @ 2016-12-02 14:26 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Pavel Machek; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <c4748c8f-ae4b-9bbc-7d20-9c7b7ce433c8@st.com>

Hi Pavel and Peppe,

On 12/02/2016 02:51 PM, Giuseppe CAVALLARO wrote:
> On 12/2/2016 1:32 PM, Pavel Machek wrote:
>> Hi!
>>
>>>> Well, if you have a workload that sends and receive packets, it tends
>>>> to work ok, as you do tx_clean() in stmmac_poll(). My workload is not
>>>> like that -- it is "sending packets at 3MB/sec, receiving none". So
>>>> the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled,
>>>> and then we run out of transmit descriptors, and then 40msec passes,
>>>> and then we clean them. Bad.
>>>>
>>>> And that's why low-res timers do not cut it.
>>>
>>> in that case, I expect that the tuning of the driver could help you.
>>> I mean, by using ethtool, it could be enough to set the IC bit on all
>>> the descriptors. You should touch the tx_coal_frames.
>>>
>>> Then you can use ethtool -S to monitor the status.
>>
>> Yes, I did something similar. Unfortnunately that meant crash within
>> minutes, at least with 4.4 kernel. (If you know what was fixed between
>> 4.4 and 4.9, that would be helpful).
>
> 4.4 has no GMAC4 support.
> Alex, do you remember any patches to fix that?

No sorry Peppe.

Pavel,

Sorry but I'm a little bit confused. I'm dropped in some mails without 
historic. I see cleanup, coalescence issue and TSO question.
What is your main issue? Are you working on gmac4 or 3.x ?
Can you refresh a little bit the story please ?

Regards
Alex
>
>>> We had experimented this tuning on STB IP where just datagrams
>>> had to send externally. To be honest, although we had seen
>>> better results w/o any timer, we kept this approach enabled
>>> because the timer was fast enough to cover our tests on SH4 boxes.
>>
>> Please reply to David, and explain how it is supposed to
>> work... because right now it does not. 40 msec delays are not
>> acceptable in default configuration.
>
> I mean, that on UP and SMP system this schema helped
> to improve the performance saving CPU on my side and this has been
> tested since a long time (~4 years).
> I tested something similar to yours where unidirectional traffic
> with limited throughput was needed and I can confirm you that
> tuning/removing coalesce parameters this helped. The tuning I decided
> to keep in the driver was suitable in several user cases and if now
> you have problems or you want to review it I can just confirm that
> there are no problems on my side. If you want to simply the logic
> around the tx process and remove timer on official driver I can accept
> that. I will just ask you uto double check if the throughput and
> CPU usage when request max throughput (better if on GiGa setup) has
> no regressions.
> Otherwise we could start thinking about adaptive schema if feasible.
>
>>>>> In the ring, some descriptors can raise the irq (according to a
>>>>> threshold) and set the IC bit. In this path, the NAPI  poll will be
>>>>> scheduled.
>>>>
>>>> Not NAPI poll but stmmac_tx_timer(), right?
>>>
>>> in the xmit according the the threshold the timer is started or the
>>> interrupt is set inside the descriptor.
>>> Then stmmac_tx_clean will be always called and, if you see the flow,
>>> no irqlock protection is needed!
>>
>> Agreed that no irqlock protection is needed if we rely on napi and
>> timers.
>
> ok
>
>>
>>>>> Concerning the lock protection, we had reviewed long time ago and
>>>>> IIRC, no raise condition should be present. Open to review it,
>>>>> again!
>> ...
>>>> There's nothing that protect stmmac_poll() from running concurently
>>>> with stmmac_dma_interrupt(), right?
>>>
>>> This is not necessary.
>>
>> dma_interrupt accesses shared priv->xstats; variables are of type
>> unsigned long (not atomic_t), yet they are accesssed from interrupt
>> context and from stmmac_ethtool without any locking. That can result
>> in broken statistics AFAICT.
>
> ok we can check this and welcome patches and I'd prefer to
> remove xstats from critical part of the code like ISR (that
> comes from old story of the driver).
>
>>
>> Please take another look. As far as I can tell, you can have two cpus
>> at #1 and #2 in the code, at the same time. It looks like napi_... has
>> some atomic opertions inside so that looks safe at the first look. But
>> I'm not sure if they also include enough memory barriers to make it
>> safe...?
>
> Although I have never reproduced related issues on SMP platforms
> due to reordering of memory operations but, as said above, welcome
> review on this especially if you are seeing problems when manage NAPI.
>
> FYI, the only memory barrier you will see in the driver are about the
> OWN_BIT setting till now.
>
>> static void stmmac_dma_interrupt(struct stmmac_priv *priv)
>> {
>> ...
>>         status = priv->hw->dma->dma_interrupt(priv->ioaddr,
>> &priv->xstats);
>>         if (likely((status & handle_rx)) || (status & handle_tx)) {
>>                 if (likely(napi_schedule_prep(&priv->napi))) {
>> #1
>>                         stmmac_disable_dma_irq(priv);
>>                         __napi_schedule(&priv->napi);
>>                 }
>>         }
>>
>>
>> static int stmmac_poll(struct napi_struct *napi, int budget)
>> {
>>         struct stmmac_priv *priv = container_of(napi, struct
>> stmmac_priv, napi);
>>         int work_done = 0;
>>
>>         priv->xstats.napi_poll++;
>>         stmmac_tx_clean(priv);
>>
>>         work_done = stmmac_rx(priv, budget);
>>     if (work_done < budget) {
>>                 napi_complete(napi);
>> #2
>>             stmmac_enable_dma_irq(priv);
>>         }
>
> hmm, I have to check (and refresh my memory) but the driver
> uses the napi_schedule_prep.
>
> Regards
>
> Peppe
>
>>         return work_done;
>> }
>>
>>
>> Best regards,
>>                                     Pavel
>>
>

^ permalink raw reply

* Re: [PATCH] net: phy: dp83848: Support ethernet pause frames
From: Andrew F. Davis @ 2016-12-02 14:35 UTC (permalink / raw)
  To: Jesper Nilsson, Florian Fainelli
  Cc: Roger Quadros, David S. Miller, Dan Murphy, netdev, linux-kernel
In-Reply-To: <20161202142203.GY19016@axis.com>

On 12/02/2016 08:22 AM, Jesper Nilsson wrote:
> According to the documentation, the PHYs supported by this driver
> can also support pause frames. Announce this to be so.
> Tested with a TI83822I.
> 

Looks like all PHYs supported by this driver do, so:

Acked-by: Andrew F. Davis <afd@ti.com>

> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
> ---
>  drivers/net/phy/dp83848.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
> index 800b39f..6e4117f 100644
> --- a/drivers/net/phy/dp83848.c
> +++ b/drivers/net/phy/dp83848.c
> @@ -88,7 +88,8 @@ MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
>  		.phy_id		= _id,				\
>  		.phy_id_mask	= 0xfffffff0,			\
>  		.name		= _name,			\
> -		.features	= PHY_BASIC_FEATURES,		\
> +		.features	= (PHY_BASIC_FEATURES |		\
> +			SUPPORTED_Pause | SUPPORTED_Asym_Pause),\

Aligning these may look nicer though.

>  		.flags		= PHY_HAS_INTERRUPT,		\
>  								\
>  		.soft_reset	= genphy_soft_reset,		\
> 

^ permalink raw reply

* Re: Initial thoughts on TXDP
From: Edward Cree @ 2016-12-02 14:36 UTC (permalink / raw)
  To: Tom Herbert, Hannes Frederic Sowa
  Cc: Florian Westphal, Linux Kernel Network Developers,
	Jesper Dangaard Brouer
In-Reply-To: <CALx6S350eCYS63dGiR+X+nVvqF_uGJop9Z_m7SmZf9QXr-rrfg@mail.gmail.com>

On 01/12/16 23:46, Tom Herbert wrote:
> The only time we
> _really_ to allocate an skbuf is when we need to put the packet onto a
> queue. All the other use cases are really just to pass a structure
> containing a packet from function to function. For that purpose we
> should be able to just pass a much smaller structure in a stack
> argument and only allocate an skbuff when we need to enqueue. In cases
> where we don't ever queue a packet we might never need to allocate any
> skbuff
Now this intrigues me, because one of the objections to bundling (vs GRO)
was the memory usage of all those SKBs.  IIRC we already do a 'GRO-like'
coalescing when packets reach a TCP socket anyway (or at least in some
cases, not sure if all the different ways we can enqueue a TCP packet for
RX do it), but if we could carry the packets from NIC to socket without
SKBs, doing so in lists rather than one-at-a-time wouldn't cost any extra
memory (the packet-pages are all already allocated on the NIC RX ring).
Possibly combine the two, so that rather than having potentially four
versions of each function (skb, skbundle, void*, void* bundle) you just
have the two 'ends'.

-Ed

^ permalink raw reply

* Re: [PATCH net v2] tcp: warn on bogus MSS and try to amend it
From: Eric Dumazet @ 2016-12-02 14:45 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, Jon Maxwell, Alex Sidorenko, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, tlfalcon,
	Brian King, davem
In-Reply-To: <ef61092947105261f8321a67f726a7fdf9b3c14e.1480675556.git.marcelo.leitner@gmail.com>

On Fri, 2016-12-02 at 08:55 -0200, Marcelo Ricardo Leitner wrote:
> There have been some reports lately about TCP connection stalls caused
> by NIC drivers that aren't setting gso_size on aggregated packets on rx
> path. This causes TCP to assume that the MSS is actually the size of the
> aggregated packet, which is invalid.
> 
> Although the proper fix is to be done at each driver, it's often hard
> and cumbersome for one to debug, come to such root cause and report/fix
> it.
> 
> This patch amends this situation in two ways. First, it adds a warning
> on when this situation occurs, so it gives a hint to those trying to
> debug this. It also limit the maximum probed MSS to the adverised MSS,
> as it should never be any higher than that.
> 
> The result is that the connection may not have the best performance ever
> but it shouldn't stall, and the admin will have a hint on what to look
> for.
> 
> Tested with virtio by forcing gso_size to 0.
> 
> Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
> v2: Updated msg as suggested by David.
> 
>  net/ipv4/tcp_input.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..fd619eb93749b6de56a41669248b337c051d9fe2 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -144,7 +144,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
>  	 */
>  	len = skb_shinfo(skb)->gso_size ? : skb->len;
>  	if (len >= icsk->icsk_ack.rcv_mss) {
> -		icsk->icsk_ack.rcv_mss = len;
> +		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> +					       tcp_sk(sk)->advmss);
> +		if (icsk->icsk_ack.rcv_mss != len)
> +			pr_warn_once("Driver has suspect GRO implementation, TCP performance may be compromised.\n");
>  	} else {
>  		/* Otherwise, we make more careful check taking into account,
>  		 * that SACKs block is variable.


skb->dev is indeed NULL, but it might be worth getting back the device
using skb->skb_iif maybe ?

^ permalink raw reply

* Re: [PATCH] net: phy: dp83848: Support ethernet pause frames
From: Jesper Nilsson @ 2016-12-02 14:52 UTC (permalink / raw)
  To: Andrew F. Davis
  Cc: Jesper Nilsson, Florian Fainelli, Roger Quadros, David S. Miller,
	Dan Murphy, netdev, linux-kernel
In-Reply-To: <80ee8058-7d6b-ea88-7ec6-20daf50a1fce@ti.com>

On Fri, Dec 02, 2016 at 08:35:23AM -0600, Andrew F. Davis wrote:
> On 12/02/2016 08:22 AM, Jesper Nilsson wrote:
> > According to the documentation, the PHYs supported by this driver
> > can also support pause frames. Announce this to be so.
> > Tested with a TI83822I.
> > 
> 
> Looks like all PHYs supported by this driver do, so:
> 
> Acked-by: Andrew F. Davis <afd@ti.com>
> 
> > Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
> > ---
> >  drivers/net/phy/dp83848.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/phy/dp83848.c b/drivers/net/phy/dp83848.c
> > index 800b39f..6e4117f 100644
> > --- a/drivers/net/phy/dp83848.c
> > +++ b/drivers/net/phy/dp83848.c
> > @@ -88,7 +88,8 @@ MODULE_DEVICE_TABLE(mdio, dp83848_tbl);
> >  		.phy_id		= _id,				\
> >  		.phy_id_mask	= 0xfffffff0,			\
> >  		.name		= _name,			\
> > -		.features	= PHY_BASIC_FEATURES,		\
> > +		.features	= (PHY_BASIC_FEATURES |		\
> > +			SUPPORTED_Pause | SUPPORTED_Asym_Pause),\
> 
> Aligning these may look nicer though.

Agreed, will send a v2.

> >  		.flags		= PHY_HAS_INTERRUPT,		\
> >  								\
> >  		.soft_reset	= genphy_soft_reset,		\
> > 

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox