Netdev List
 help / color / mirror / Atom feed
* [GIT] Networking
From: David Miller @ 2014-12-12 21:02 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Small follow-up to the main merge pull from the other day:

1) Alexander Duyck's DMA memory barrier patch set.

2) cxgb4 driver fixes from Karen Xie.

3) Add missing export of fixed_phy_register() to modules, from Mark
   Salter.

4) DSA bug fixes from Florian Fainelli.

Please pull, thanks a lot!

The following changes since commit 70e71ca0af244f48a5dcf56dc435243792e3a495:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next (2014-12-11 14:27:06 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to eea3e8fde92999248863e4de93f325fcf3833a14:

  Merge branch 'gem' (2014-12-12 15:17:32 -0500)

----------------------------------------------------------------

Alexander Duyck (5):
      arch: Cleanup read_barrier_depends() and comments
      arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
      r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
      fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
      fib_trie: Fix trie balancing issue if new node pushes down existing node

Chun-Hao Lin (1):
      r8169:update rtl8168g pcie ephy parameter

Cyrille Pitchen (1):
      net/macb: add TX multiqueue support for gem

David S. Miller (5):
      Merge branch 'dsa'
      Merge branch 'dma_mb'
      Merge branch 'cxgb4'
      Merge branch 'kill_tasklet_hi_enable'
      Merge branch 'gem'

Florian Fainelli (3):
      net: dsa: handle non-existing PHYs on switch internal bus
      net: dsa: propagate error code from dsa_slave_phy_setup
      net: dsa: bcm_sf2: force link for all fixed PHY devices

Hariprasad Shenai (1):
      cxgb4: Add support for QSA modules

Karen Xie (7):
      cxgb4i: fix tx immediate data credit check
      cxgb4i: fix credit check for tx_data_wr
      cxgb4/cxgb4i: set the max. pdu length in firmware
      cxgb4i: additional types of negative advice
      cxgb4i: handle non-pdu-aligned rx data
      cxgb4i: use set_wr_txq() to set tx queues
      libcxgbi: fix freeing skb prematurely

Mark Salter (1):
      net: phy: export fixed_phy_register()

Quentin Lambert (2):
      jme: replace calls to redundant function
      linux/interrupt.h: remove the definition of unused tasklet_hi_enable

Tobias Klauser (2):
      net: ethernet: smsc: Allow to select SMC91X for nios2
      net: ethernet: davicom: Allow to select DM9000 for nios2

Toshiaki Makita (1):
      vlan: Add ability to always enable TSO/UFO

 Documentation/memory-barriers.txt               |  42 ++++++
 arch/alpha/include/asm/barrier.h                |  51 +++++++
 arch/arm/include/asm/barrier.h                  |   4 +
 arch/arm64/include/asm/barrier.h                |   3 +
 arch/blackfin/include/asm/barrier.h             |  51 +++++++
 arch/ia64/include/asm/barrier.h                 |  25 ++--
 arch/metag/include/asm/barrier.h                |  19 +--
 arch/mips/include/asm/barrier.h                 |  61 +-------
 arch/powerpc/include/asm/barrier.h              |  19 ++-
 arch/s390/include/asm/barrier.h                 |   7 +-
 arch/sparc/include/asm/barrier_64.h             |   7 +-
 arch/x86/include/asm/barrier.h                  |  70 ++-------
 arch/x86/um/asm/barrier.h                       |  20 +--
 drivers/net/dsa/bcm_sf2.c                       |  23 +--
 drivers/net/ethernet/cadence/macb.c             | 456 ++++++++++++++++++++++++++++++++++++--------------------
 drivers/net/ethernet/cadence/macb.h             |  36 ++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  13 +-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |   2 +
 drivers/net/ethernet/davicom/Kconfig            |   2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c   |   6 +-
 drivers/net/ethernet/intel/igb/igb_main.c       |   6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |   9 +-
 drivers/net/ethernet/jme.c                      |  12 +-
 drivers/net/ethernet/realtek/r8169.c            |  53 +++++--
 drivers/net/ethernet/smsc/Kconfig               |   4 +-
 drivers/net/phy/fixed.c                         |   1 +
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c              | 144 +++++++++++++-----
 drivers/scsi/cxgbi/libcxgbi.c                   |   4 +-
 drivers/scsi/cxgbi/libcxgbi.h                   |   4 +-
 include/asm-generic/barrier.h                   |   8 +
 include/linux/interrupt.h                       |   6 -
 net/8021q/vlan_dev.c                            |   7 +-
 net/dsa/slave.c                                 |  16 +-
 net/ipv4/fib_trie.c                             |   3 +-
 35 files changed, 771 insertions(+), 425 deletions(-)

^ permalink raw reply

* Re: [PATCH V1 net-next 1/2] pgtable: Add API to query if write combining is available
From: David Miller @ 2014-12-12 20:36 UTC (permalink / raw)
  To: gerlitz.or
  Cc: moshel, ogerlitz, jackm, talal, yevgenyp, netdev, amirv, moshel
In-Reply-To: <CAJ3xEMgufPhyeKHC_wWhw+khEhMAgG6yKuN8TLH0-xXH5Fq8Bw@mail.gmail.com>

From: Or Gerlitz <gerlitz.or@gmail.com>
Date: Thu, 27 Nov 2014 08:48:24 +0200

> On Sun, Oct 12, 2014 at 11:54 AM, Moshe Lazer <moshel@dev.mellanox.co.il> wrote:
>>
>> On 10/8/2014 7:24 PM, David Miller wrote:
>>>
>>> From: Moshe Lazer <moshel@dev.mellanox.co.il>
>>> Date: Wed, 08 Oct 2014 11:44:57 +0300
>>>
>>>>> #if defined(__i386__) || defined(__x86_64__)
>>>>>         if (map->type == _DRM_REGISTERS && !(map->flags &
>>>>> _DRM_WRITE_COMBINING))
>>>>>                 tmp = pgprot_noncached(tmp);
>>>>>         else
>>>>>                 tmp = pgprot_writecombine(tmp);
>>>>> #elif defined(__powerpc__)
>>>>>         pgprot_val(tmp) |= _PAGE_NO_CACHE;
>>>>>         if (map->type == _DRM_REGISTERS)
>>>>>                 pgprot_val(tmp) |= _PAGE_GUARDED;
>>>>> #elif defined(__ia64__)
>>>>>         if (efi_range_is_wc(vma->vm_start, vma->vm_end -
>>>>>                                     vma->vm_start))
>>>>>                 tmp = pgprot_writecombine(tmp);
>>>>>         else
>>>>>                 tmp = pgprot_noncached(tmp);
>>>>> #elif defined(__sparc__) || defined(__arm__) || defined(__mips__)
>>>>>         tmp = pgprot_noncached(tmp);
>>>>> #endif
>>>>
>>>> The idea was to provide an indication as for whether the arch supports
>>>> write-combining in general.
>>>> If we want to benefit from blue flame operations, we need to map the
>>>> blue flame registers as write combining - otherwise there is no
>>>> benefit. So we would like to know if write combining is supported by
>>>> the system or not.
>>>>
>>> You completely miss my point.  On a given architectuire it might be
>>> _illegal_ to map certain address ranges as write-combining without
>>> checks like the ones above that ia64 needs.
>>>
>>> Therefore your proposed interface is by definition insufficient.
>>
>> Thanks David, I'll try to clarify my point.
>> For me the writecombine_available() is a way to know if the
>> pgprot_writecombine() is effective or just cover call to the
>> pgprot_noncached().
>> I want to use the writecombine_available() regardless to the mapping
>> address.
>> For example in mlx4 query_device I want to indicate that blue-flame is not
>> supported if  `writecombine_available() ==  false`.
>> In this case we don't have the mapping address yet.
 ...
> Pinging you... could you respond on Moshe's email which hopefully
> addresses your comments?

As I stated, some platforms have restrictions on certain address ranges
when it comes to supporting write combining.

Therefore, it is a _requirement_, not optional, that we have an
address available so that the platform specific code can give an
accurate response to the question "is write combining available"
because the answer to that question is address dependent.

^ permalink raw reply

* Re: [WTF?] random test in netlink_sendmsg()
From: David Miller @ 2014-12-12 20:34 UTC (permalink / raw)
  To: viro; +Cc: kaber, netdev
In-Reply-To: <20141128062315.GC29748@ZenIV.linux.org.uk>

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Fri, 28 Nov 2014 06:23:15 +0000

> 	In netlink_sendmsg() we have the following:
> 
>         if (netlink_tx_is_mmaped(sk) &&
>             msg->msg_iov->iov_base == NULL) {
>                 err = netlink_mmap_sendmsg(sk, msg, dst_portid, dst_group,
>                                            siocb);
>                 goto out;
>         }
> 
> Now, suppose sendmsg(2) is called with msg.msg_iovlen == 0.  We'll have
> ->msg_iov in kernel-side copy pointing at the uninitialized array in
> stack frame of ___sys_sendmsg() - neither new nor old code touches elements
> past the first msg_iovlen ones.  So in that case it checks if an
> uninitialized word on stack is zero.
> 
> 	What is that check trying to do?  Is that simply missing
> "(msg->msg_iovlen > 0) &&"?  And why on the Earth didn't it simply use
> zero msg_iovlen as the indicator, instead of messing with iovec contents?
> Obviously too late to change, but... ouch.

I think it's simply trying to say: if nothing in the given iovec, use
the mmap() netlink area for the data.

I cannot vouch for the correctness of this test.

If we take the netlink_mmap_sendmsg() path, msg->msg_iov is not
accessed at all, so it cannot be a huge problem.

^ permalink raw reply

* Re: [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.
From: Wolfgang Walter @ 2014-12-12 20:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Jarosch, netdev, Eric Dumazet, Herbert Xu,
	Steffen Klassert
In-Reply-To: <1418405225.13491.19.camel@edumazet-glaptop2.roam.corp.google.com>

Am Freitag, 12. Dezember 2014, 09:27:05 schrieb Eric Dumazet:
> On Fri, 2014-12-12 at 17:58 +0100, Wolfgang Walter wrote:
> > This fixes hangs of local tcp connections over ipsec tunnels where
> > pmtu is lower than the mtu of the interface.
> 
> Again this is a work around, the one liner patch is not a clean patch.
> 
> Something is wrong, we should fix GSO path instead.
> 
> Why ? Because we can queue a GSO packet in a qdisc, then later call
> sk_setup_caps() (too late)

Hmm. Do you think that sk_setup_caps should not disable GSO later at all? Or 
only that it should not influence already queued packets?

In the case of an ipsec tunnel GSO gets disabled later (for tcp connections) 
because dst->header_len() is zero at first and non-zero later. tcp connections 
initiated not to long afterwards start with GSO disabled from the beginning.


dst->header_len() being non zero for ipsec tunnels seems to be the case with 
or without an adjustment via PMTU. It seems that routing only does not know it 
from the beginning on (for incomimg connections) :-).

Why does a non zero dst->header_len() disable GSO? Is it just an optimization 
or does GSO not work in that case? If dst->header_len() being non zero signals 
that ipsec tunnel mode can't handle GSO at all then the problem would not lay 
in the gso path.


There is one thing which indicates that there is also a problen in the GSO 
path, though. As my tests demonstrated disabling GSO on the physical interface 
serving the ipsec tunnel does not prevent the hangs. It does, though, if sk-
>sk_gso_max_segs = 1 in sk_setup_caps() if sk_can_gso(sk) returns false. This 
was the first variation of the patch where sk->sk_gso_max_segs was not set if 
dst->header_len() is non zero.


Concering qdisc: an ipsec tunnel doesn't have a "tunnel-device" so it does not 
have qdisc? The ipsec tunnel transforms packets and the results are (in case 
of ipsec esp tunnel) esp packets which cannot be resgmented. Only these 
transformed packets are finally queued to a qdisc of a network device then.


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

^ permalink raw reply

* Re: tg3 broken in 3.18.0?
From: Nils Holland @ 2014-12-12 20:31 UTC (permalink / raw)
  To: Jonathan Bither; +Cc: netdev
In-Reply-To: <548B00CD.3030708@gmail.com>

On Fri, Dec 12, 2014 at 09:50:53AM -0500, Jonathan Bither wrote:
> Not sure if it helps any, but tg3 works here after a 3.18 upgrade. I'd 
> be happy to share any information if it would help you out.

What I get here is this (output captured under 3.17.3):

triton513 ~ # ethtool -i enp2s0
driver: tg3
version: 3.137
firmware-version: sb
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

So, you, Marcelo and me, we all seem to have different firmware
versions. If I'm correct, different versions of tg3 exist that either
contain the firmware onboard or get it injected at driver
initialization (correct me if I'm wrong). If Marcelo's and my fw
version had been the same this might have given a clue, but nope.

I'm putting my faith in Marcelo bisecting and finding out more details.
I might try that as well over the weekend, at least to the extent
possible without ever being able to have live access to the machine
when it is running a kernel exhibiting this issue.

Greetings,
Nils

^ permalink raw reply

* Re: [PATCH v3 0/1] net/macb: add TX multiqueue support for gem
From: David Miller @ 2014-12-12 20:18 UTC (permalink / raw)
  To: cyrille.pitchen
  Cc: nicolas.ferre, linux-arm-kernel, netdev, soren.brinkmann,
	linux-kernel
In-Reply-To: <cover.1418383702.git.cyrille.pitchen@atmel.com>

From: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Date: Fri, 12 Dec 2014 13:26:43 +0100

> At the first look this patch may look quite big but it cannot be splitted.
> Each queue has its own dedicated IRQ, which should be handled.
> Also the Transmit Base Queue Pointer register of each available queue must be
> initialized before starting the transmission, otherwise the transmission will be
> halted immediately as HRESP errors are likely to occur.
> In addition, some fields had to be moved from struct macb into struct macb_queue
> so a common code could manage the queues.
> 
> This patch was applied to net-next and tested on a sama5d36ek board, which embeds
> both macb and gem IPs, to check the backward compatibility.
> 
> Also it was tested on a sama5dx FPGA platform with a gem designed to use 3 queues.
> Then we used the tc program to set a queue discipline policy as describe in the
> Documentation/networking/multiqueue.txt: we successfully used each queue.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 0/2] Remove redundant function
From: David Miller @ 2014-12-12 20:16 UTC (permalink / raw)
  To: lambert.quentin; +Cc: cooldavid, netdev, linux-kernel
In-Reply-To: <20141212123432.GA12765@sloth>

From: Quentin Lambert <lambert.quentin@gmail.com>
Date: Fri, 12 Dec 2014 13:34:32 +0100

> tasklet_hi_enable and tasklet_enable are redundant. Since
> tasklet_hi_enable is used only 6 times in 1 file, the first
> patch changes calls to the function with calls to tasklet_enable.
> 
> The second patch removes tasklet_hi_enable definition.

This is pretty simple and straightforward, so I hope nobody minds
my merging this via my networking tree.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 0/6] net-PPP: Deletion of a few unnecessary checks
From: David Miller @ 2014-12-12 20:07 UTC (permalink / raw)
  To: elfring
  Cc: sergei.shtylyov, paulus, linux-ppp, netdev, eric.dumazet,
	linux-kernel, kernel-janitors, julia.lawall
In-Reply-To: <548B2468.5050402@users.sourceforge.net>

From: SF Markus Elfring <elfring@users.sourceforge.net>
Date: Fri, 12 Dec 2014 18:22:48 +0100

>>> Where should "the error pointers" be stored instead?
>> A local variable, before you assign it into the datastructure.
> 
> Will it be acceptable for you that anyone (or even me) will introduce
> such a change with a seventh (and eventually eighth) update step here?
> Do you want any other sequence for source code preparation of
> the requested software improvement?

I'd like to honestly ask why you are being so difficult?

Everyone gets their code reviewed, everyone has to modify their
changes to adhere to the subsystem maintainer's wishes.  You
are not being treated specially, and quite frankly nobody is
asking anything unreasonable of you.

^ permalink raw reply

* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: John Fastabend @ 2014-12-12 20:05 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hubert Sokolowski, Roopa Prabhu, netdev@vger.kernel.org,
	Vlad Yasevich
In-Reply-To: <548AFD41.3010801@mojatatu.com>

On 12/12/2014 06:35 AM, Jamal Hadi Salim wrote:
> On 12/12/14 08:36, Hubert Sokolowski wrote:
>>> On 12/12/14 06:38, Hubert Sokolowski wrote:
>>>>> On 12/11/14, 9:06 AM, Hubert Sokolowski wrote:
>>>
>>>> Please see how the ndo_dflt_fdb_add and del are called:
>>>>         if (dev->netdev_ops->ndo_fdb_add)
>>>>             err = dev->netdev_ops->ndo_fdb_add(ndm, tb, dev, addr,
>>>>                                vid,
>>>>                                nlh->nlmsg_flags);
>>>>         else
>>>>             err = ndo_dflt_fdb_add(ndm, tb, dev, addr, vid,
>>>>                            nlh->nlmsg_flags);
>>>>
>>>
>>> Semantically add and dump are not the same.
>>> Add adds a specific entry.
>>> Dump not only dumps the fdb table but also the unicast and multicast
>>> addresses.
>> this is not true for 3.16 and before. Please see:
>> http://lxr.free-electrons.com/source/net/core/rtnetlink.c?v=3.16#L2545
>> It lets you fully manage the FDB table by overwriding fdb_add, del and
>> dump
>> in the same way.
>>
>  >
>>
>>>
>>>
>>>> As it was suggested by Ronen I can modify the patch so the dflt
>>>> callback
>>>> is always called for bridge devices if this is desirable. Either by
>>>> calling
>>>> it when following expression is true:
>>>>       (dev->priv_flags & IFF_BRIDGE_PORT)
>>>> or by modifying br_fdb_dump to call ndo_dflt_fdb_dump.
>>>>
>>>
>>> Are you saying the above is going to work? You need to TEST please.
>> yes, it works and it is not a rocket science :). we just need to agree
>> on the approach to use.
>>
>
> I am happy if this solves wont break
> any of the use cases Vlad made me test and make sure work.
> When i said "test" - I mean run the testcases outlined in the
> commit log; if those work i dont see what the issue is.
>

I'll wake up ;)

First quick grep of code finds some strange uses of ndo_fdb_dump like
this in macvlan,

   ./drivers/net/macvlan.c
         .ndo_fdb_dump           = ndo_dflt_fdb_dump,

I'll be sending a patch once net-next opens up again to resolve it. Its
harmless though so not really a fix for net.

There seem to be a few places that have the potential to return
different values then the uc/mc lists.

	./drivers/net/vxlan.c
	./drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
	./drivers/net/ethernet/rocker/rocker.c

	./net/bridge/br_device.c

So I guess we can walk through the list and analyse them a bit.

vxlan:

Try stacking devices on top of the vxlan device this will call a uc_add
routine if you then change the mac addr on the vlan. This would get
reported by the dflt fdb dump handlers but not the drivers fdb dump
handlers. So removing the dflt dump handler from this patch at least
changes things. We should either explain why this is OK or accept that
the driver needs to be fixed. Or I guess that the patch is just wrong.
My guess is one of the latter options.

Also Jamal, your original patch seems like it might of changed this
and Hubert's patch is reverting back to its original case. Was this
specific part of your patch intentional?

qlcnic:

hmm again this is changed a bit. In the case of !learning and no
eswitch or sriov (side-bar question how do you do SR-IOV without
some sort of embedded switch?) then we return idx? See qlcnic_fdb_dump.

So at least this is different after the patch as well. Same questions
as above, explain why this is OK, fix it, or solve the issue some other
way. Although again it was this way before Jamal's patch.

rocker.c:

rocker never calls dflt dump either. Should it or is it not necessary?

br_device:

same story.

Hubert, can you run the set of commands in Jamal's patch and repost
your patch for us with them in the commit msg and call out any
differences?

Note the above is just from reading the code I never really ran any
tests to sort it out completely.

Thanks!
John



> cheers,
> jamal
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [PATCH iproute2 v3] ip: Simplify executing ip cmd within network ns
From: vadim4j @ 2014-12-12 19:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vadim Kochan, netdev
In-Reply-To: <20141212060436.GA876@angus-think.lan>

On Fri, Dec 12, 2014 at 08:04:36AM +0200, vadim4j@gmail.com wrote:
> On Thu, Dec 11, 2014 at 05:26:40PM -0800, Stephen Hemminger wrote:
> > On Fri, 12 Dec 2014 00:32:51 +0200
> > Vadim Kochan <vadim4j@gmail.com> wrote:
> > 
> > > +
> > > +#define NETNS_RUN_DIR "/var/run/netns"
> > > +#define NETNS_ETC_DIR "/etc/netns"
> > > +
> > > +#ifndef CLONE_NEWNET
> > > +#define CLONE_NEWNET 0x40000000	/* New network namespace (lo, device, names sockets, etc) */
> > > +#endif
> > > +
> > > +#ifndef MNT_DETACH
> > > +#define MNT_DETACH	0x00000002	/* Just detach from the tree */
> > > +#endif /* MNT_DETACH */
> > > +
> > > +/* sys/mount.h may be out too old to have these */
> > > +#ifndef MS_REC
> > > +#define MS_REC		16384
> > > +#endif
> > > +
> > > +#ifndef MS_SLAVE
> > > +#define MS_SLAVE	(1 << 19)
> > > +#endif
> > > +
> > > +#ifndef MS_SHARED
> > > +#define MS_SHARED	(1 << 20)
> > > +#endif
> > > +
> > > +extern int netns_switch(char *netns);
> > > +
> > 
> > Maybe it would be cleaner to introduce a new file netns.h
> > with this and the setnetns syscall wrapper
> > 
> And MS_*/MNT_* defines to new mount.h ?
> 
> Thanks,
> 

I think it can go all together (netns + mount things) to the new
include/namespace.h header, what do you think ?

And what about netns_switch func - let it be in utils.c/utils.h or
add new lib/namespace.c which will be linked to the libutil ?

Thanks,

^ permalink raw reply

* Re: am335x: cpsw: interrupt failure
From: Yegor Yefremov @ 2014-12-12 19:19 UTC (permalink / raw)
  To: Felipe Balbi; +Cc: netdev, N, Mugunthan V
In-Reply-To: <20141212173210.GI7549@saruman>

On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
>> U-Boot version: 2014.07
>> Kernel config is omap2plus with enabled USB
>>
>> # cat /proc/version
>> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
>> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
>> Mon Dec 8 22:47:43 CET 2014
>
> Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
> blacklisted. Can you try with 4.9.x just to make sure ?

Will do.

> I'll try to run your test case here once i'm back from vacations.

Wish you good vacation!

Yegor

^ permalink raw reply

* Re: [PATCH v2 0/6] net-PPP: Deletion of a few unnecessary checks
From: Eric Dumazet @ 2014-12-12 19:08 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: David Miller, Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev,
	linux-kernel, kernel-janitors, Julia Lawall
In-Reply-To: <548B2468.5050402@users.sourceforge.net>

On Fri, 2014-12-12 at 18:22 +0100, SF Markus Elfring wrote:

> Will it be acceptable for you that anyone (or even me) will introduce
> such a change with a seventh (and eventually eighth) update step here?
> Do you want any other sequence for source code preparation of
> the requested software improvement?

The thing is : We are in the merge window, tracking bugs added in latest
dev cycle.

Having to deal with patches like yours is adding pressure
on the maintainer (and other developers) at the wrong time.





^ permalink raw reply

* Re: [PATCH v2 0/6] net-PPP: Deletion of a few unnecessary checks
From: Julia Lawall @ 2014-12-12 18:46 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: David Miller, Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev,
	Eric Dumazet, linux-kernel, kernel-janitors, Julia Lawall
In-Reply-To: <548B0A29.8050503@users.sourceforge.net>

> I hope that a bit more constructive suggestions will be contributed by
> involved
> software developers around the affected source code. Now it seems
> that a small code clean-up becomes a more challenging development task.

This is often the case.  Doing something half way is not useful.

julia

^ permalink raw reply

* About ARP state change from Delay to Stale
From: Sriram Raghunathan @ 2014-12-12 17:37 UTC (permalink / raw)
  To: netdev, linux-kernel

Hi,

    I'd a question about ARP entry state change from DELAY to STALE. I'm
    referring to 3.10 Master kernel, where this particular state change 
is not notified on neigh_update. In  case, on creating a patch for the 
same, would it affect the kernel behavior ?

    Could you please share as why this state change is not notified 
right now?

Sriram

^ permalink raw reply

* Re: am335x: cpsw: interrupt failure
From: Felipe Balbi @ 2014-12-12 17:32 UTC (permalink / raw)
  To: Yegor Yefremov; +Cc: Felipe Balbi, netdev, N, Mugunthan V
In-Reply-To: <CAGm1_kuaqXrU4zhNUZeY85+OeXHApSUDjJdsXcyBdvHTNhLCmQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

Hi,

On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> U-Boot version: 2014.07
> Kernel config is omap2plus with enabled USB
> 
> # cat /proc/version
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
> 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
> Mon Dec 8 22:47:43 CET 2014

Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
blacklisted. Can you try with 4.9.x just to make sure ?

I'll try to run your test case here once i'm back from vacations.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.
From: Eric Dumazet @ 2014-12-12 17:27 UTC (permalink / raw)
  To: Wolfgang Walter
  Cc: Thomas Jarosch, netdev, Eric Dumazet, Herbert Xu,
	Steffen Klassert
In-Reply-To: <2630078.LdJXSsPB40@h2o.as.studentenwerk.mhn.de>

On Fri, 2014-12-12 at 17:58 +0100, Wolfgang Walter wrote:

> This fixes hangs of local tcp connections over ipsec tunnels where
> pmtu is lower than the mtu of the interface.

Again this is a work around, the one liner patch is not a clean patch.

Something is wrong, we should fix GSO path instead.

Why ? Because we can queue a GSO packet in a qdisc, then later call
sk_setup_caps() (too late)

^ permalink raw reply

* Re: [PATCH v2 0/6] net-PPP: Deletion of a few unnecessary checks
From: SF Markus Elfring @ 2014-12-12 17:22 UTC (permalink / raw)
  To: David Miller
  Cc: Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev, Eric Dumazet,
	linux-kernel, kernel-janitors, Julia Lawall
In-Reply-To: <20141212.115922.687789059853236747.davem@davemloft.net>

>> Where should "the error pointers" be stored instead?
> A local variable, before you assign it into the datastructure.

Will it be acceptable for you that anyone (or even me) will introduce
such a change with a seventh (and eventually eighth) update step here?
Do you want any other sequence for source code preparation of
the requested software improvement?

Regards,
Markus

^ permalink raw reply

* RE: [net PATCH] fib_trie: Fix trie balancing issue if new node pushes down existing node
From: David Laight @ 2014-12-12 17:05 UTC (permalink / raw)
  To: 'Alexander Duyck', David Miller,
	alexander.h.duyck@redhat.com
  Cc: netdev@vger.kernel.org
In-Reply-To: <548B132B.5000606@gmail.com>

From: Alexander Duyck
...
> >> One has to be mindful with this code that what it's doing now might
> >> be intentional.  For example, it might be doing things this way
> >> on purpose in order to minimize rebalancing during route flaps.
> >>
> >> Barring anything like that, I think your change is fine.
> > Alex, in case it isn't clear, I'm hoping that you might have some
> > thoughts on this aspect of your changes. :)
> 
> Route flaps should at most trigger an inflate without a deflate due to
> the way the resize logic works.  If this occurs often it would be useful
> since then there would already be space in the tree for the next time
> around.

One way to deal with 'flapping' is to leave deleted items in the tree
(marked so they won't be used).
Deleted entries do need to be garbage collected at some point.
Doing so during insert is more than adequate, and possibly only for
nodes that are traversed while processing the insert itself
(and maybe only is the ratio of valid to invalid entries is low).

This might also mean that you don't need to actively run any timeouts.
Timed out entries just stay put - marked invalid by their timestamp.

	David

^ permalink raw reply

* Re: [PATCH v2 0/6] net-PPP: Deletion of a few unnecessary checks
From: David Miller @ 2014-12-12 16:59 UTC (permalink / raw)
  To: elfring
  Cc: sergei.shtylyov, paulus, linux-ppp, netdev, eric.dumazet,
	linux-kernel, kernel-janitors, julia.lawall
In-Reply-To: <548B1E44.6050005@users.sourceforge.net>

From: SF Markus Elfring <elfring@users.sourceforge.net>
Date: Fri, 12 Dec 2014 17:56:36 +0100

> Where should "the error pointers" be stored instead?

A local variable, before you assign it into the datastructure.

^ permalink raw reply

* Re: [PATCH net v2] net: smc91x: Fix build without gpiolib
From: David Miller @ 2014-12-12 16:58 UTC (permalink / raw)
  To: tklauser; +Cc: tony, nico, netdev
In-Reply-To: <20141212164528.GK16916@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Fri, 12 Dec 2014 17:45:29 +0100

> On 2014-12-12 at 17:30:20 +0100, David Miller <davem@davemloft.net> wrote:
>> In my opinion, if the code blocks enabling the configurations that
>> need this are enabled, so should GPIO be depended upon.
>> 
>> I think, at a minimum, when CONFIG_OF is enabled smsc91x should
>> require GPIO.
> 
> Agreed. This is the better solution, causing less surprises for the
> user. Should this be a "select GPIOLIB if OF" then?

If GPIO is a child node in the Kconfig hierarchy (ie. has no
dependencies of it's own), then yes.  Otherwise, a depends
will need to be used, because select does not recursively
trigger a select'd nodes dependencies.

^ permalink raw reply

* Re: [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.
From: Wolfgang Walter @ 2014-12-12 16:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Jarosch, netdev, Eric Dumazet, Herbert Xu,
	Steffen Klassert
In-Reply-To: <1418238603.27198.37.camel@edumazet-glaptop2.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 3614 bytes --]

Hello Eric,

my plan to use e114a710aa5058c0ba4aa1dfb105132aefeb5e04 to make it easier to apply tcp-refine-TSO-autosizing was a bad idea as that needs further patches :-). Backporting

tcp-refine-TSO-autosizing
https://patchwork.ozlabs.org/patch/418506/

seems to be the easier solution.

I tested if 3.18 and 3.17.6 are affected and they are. I applied tcp-refine-TSO-autosizing and the one-liner below and then both work fine.

Attached is

1) my backport of tcp-refine-TSO-autosizing for 3.14.26
2) my backport of tcp-refine-TSO-autosizing for 3.17.6

For 3.18 tcp-refine-TSO-autosizing just applies fine.

I tested 3.16.26 + patches, 3.17.6 + patches and 3.18.2 + patches.

And here again the one-liner:

====================
Subject: tcp: ensure that sk_gso_max_segs = 1 if GSO is disabled

sk_setup_caps() may disable GSO even if the network device
supports it. This may happens after the creation time of the
socket (e.g. triggered by xfrm, PMTU or routing changes).

This fixes hangs of local tcp connections over ipsec tunnels where
pmtu is lower than the mtu of the interface.
---
 net/core/sock.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1559,6 +1559,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
 	sk->sk_route_caps &= ~sk->sk_route_nocaps;
+	sk->sk_gso_max_segs = 1;
 	if (sk_can_gso(sk)) {
 		if (dst->header_len) {
 			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
-- 
1.7.10.4
=======================

Do you think the backports are ok? Then Thomas my test them, too.

What should be done now?

First the one line should go into 3.19 as soon as tcp-refine-TSO-autosizing
arrived there?

If both are in Torvald's who should I ask for inclusion into stable?

Am Mittwoch, 10. Dezember 2014, 11:10:03 schrieb Eric Dumazet:
> On Wed, 2014-12-10 at 19:34 +0100, Wolfgang Walter wrote:
> > So gso is on. When the hang happens sk_setup_caps is called from
> > inet_sk_rebuild_header(). Now the path
> > 
> > 		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
> > 
> > is taken as dst->header_len is now non zero.
> > 
> > This is the reason why later calls of sk_can_gso() return false.
> > 
> > I'll try to change the above patch to
> > 
> > @@ -1585,6 +1585,8 @@ void sk_setup_caps(struct sock *sk, struct dst_entry
> > *dst) sk->sk_gso_max_size = dst->dev->gso_max_size;
> > 
> >  			sk->sk_gso_max_segs = dst->dev->gso_max_segs;
> >  		
> >  		}
> > 	
> > 	}
> > 
> > +   if (sk_can_gso(sk)) {
> > +		sk->sk_gso_max_segs = 1;
> > 
> >  	}
> >  
> >  }
> >  EXPORT_SYMBOL_GPL(sk_setup_caps);
> > 
> > so that the case that GSO is disabled because of dst->header_len != 0 sets
> > sk_gso_max_segs, too.
> 
> Sounds good, or maybe simply :
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index
> 9a56b2000c3f374fb95aedada3327447816a9512..edca31319dfee57cabe5b376505324bea
> 07f767a 100644 --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1577,6 +1577,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry
> *dst) if (sk->sk_route_caps & NETIF_F_GSO)
>  		sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
>  	sk->sk_route_caps &= ~sk->sk_route_nocaps;
> +	sk->sk_gso_max_segs = 1;
>  	if (sk_can_gso(sk)) {
>  		if (dst->header_len) {
>  			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;

Regards and thank you very much for your help,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

[-- Attachment #2: 0002-tcp-refine-TSO-autosizing.patch --]
[-- Type: text/x-patch, Size: 7394 bytes --]

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -134,7 +134,7 @@ struct tcp_sock {
 	/* inet_connection_sock has to be the first member of tcp_sock */
 	struct inet_connection_sock	inet_conn;
 	u16	tcp_header_len;	/* Bytes of tcp header to send		*/
-	u16	xmit_size_goal_segs; /* Goal for segmenting output packets */
+	u16	gso_segs;	/* Max number of segs per GSO packet	*/
 
 /*
  *	Header prediction flags
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -825,47 +825,29 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 				       int large_allowed)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 xmit_size_goal, old_size_goal;
-
-	xmit_size_goal = mss_now;
-
-	if (large_allowed && sk_can_gso(sk)) {
-		u32 gso_size, hlen;
-
-		/* Maybe we should/could use sk->sk_prot->max_header here ? */
-		hlen = inet_csk(sk)->icsk_af_ops->net_header_len +
-		       inet_csk(sk)->icsk_ext_hdr_len +
-		       tp->tcp_header_len;
-
-		/* Goal is to send at least one packet per ms,
-		 * not one big TSO packet every 100 ms.
-		 * This preserves ACK clocking and is consistent
-		 * with tcp_tso_should_defer() heuristic.
-		 */
-		gso_size = sk->sk_pacing_rate / (2 * MSEC_PER_SEC);
-		gso_size = max_t(u32, gso_size,
-				 sysctl_tcp_min_tso_segs * mss_now);
-
-		xmit_size_goal = min_t(u32, gso_size,
-				       sk->sk_gso_max_size - 1 - hlen);
-
-		xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
-
-		/* We try hard to avoid divides here */
-		old_size_goal = tp->xmit_size_goal_segs * mss_now;
-
-		if (likely(old_size_goal <= xmit_size_goal &&
-			   old_size_goal + mss_now > xmit_size_goal)) {
-			xmit_size_goal = old_size_goal;
-		} else {
-			tp->xmit_size_goal_segs =
-				min_t(u16, xmit_size_goal / mss_now,
-				      sk->sk_gso_max_segs);
-			xmit_size_goal = tp->xmit_size_goal_segs * mss_now;
-		}
+	u32 new_size_goal, size_goal, hlen;
+
+	if (!large_allowed || !sk_can_gso(sk))
+		return mss_now;
+
+	/* Maybe we should/could use sk->sk_prot->max_header here ? */
+	hlen = inet_csk(sk)->icsk_af_ops->net_header_len +
+	       inet_csk(sk)->icsk_ext_hdr_len +
+	       tp->tcp_header_len;
+
+	new_size_goal = sk->sk_gso_max_size - 1 - hlen;
+	new_size_goal = tcp_bound_to_half_wnd(tp, new_size_goal);
+
+	/* We try hard to avoid divides here */
+	size_goal = tp->gso_segs * mss_now;
+	if (unlikely(new_size_goal < size_goal ||
+		     new_size_goal >= size_goal + mss_now)) {
+		tp->gso_segs = min_t(u16, new_size_goal / mss_now,
+				     sk->sk_gso_max_segs);
+		size_goal = tp->gso_segs * mss_now;
 	}
 
-	return max(xmit_size_goal, mss_now);
+	return max(size_goal, mss_now);
 }
 
 static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -290,7 +290,7 @@ bool tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
 	left = tp->snd_cwnd - in_flight;
 	if (sk_can_gso(sk) &&
 	    left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd &&
-	    left < tp->xmit_size_goal_segs)
+	    left < min_t(u32, tp->gso_segs, sk->sk_gso_max_segs))
 		return true;
 	return left <= tcp_max_tso_deferred_mss(tp);
 }
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1432,6 +1432,27 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
 		((nonagle & TCP_NAGLE_CORK) ||
 		 (!nonagle && tp->packets_out && tcp_minshall_check(tp)));
 }
+
+/* Return how many segs we'd like on a TSO packet,
+ * to send one TSO packet per ms
+ */
+static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now)
+{
+	u32 bytes, segs;
+
+	bytes = min(sk->sk_pacing_rate >> 10,
+		    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
+
+	/* Goal is to send at least one packet per ms,
+	 * not one big TSO packet every 100 ms.
+	 * This preserves ACK clocking and is consistent
+	 * with tcp_tso_should_defer() heuristic.
+	 */
+	segs = max_t(u32, bytes / mss_now, sysctl_tcp_min_tso_segs);
+
+	return min_t(u32, segs, sk->sk_gso_max_segs);
+}
+
 /* Returns the portion of skb which can be sent right away */
 static unsigned int tcp_mss_split_point(const struct sock *sk,
 					const struct sk_buff *skb,
@@ -1633,7 +1654,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
  *
  * This algorithm is from John Heffner.
  */
-static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
+static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb, u32 max_segs)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1663,8 +1684,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
 	limit = min(send_win, cong_win);
 
 	/* If a full-sized TSO skb can be sent, do it. */
-	if (limit >= min_t(unsigned int, sk->sk_gso_max_size,
-			   tp->xmit_size_goal_segs * tp->mss_cache))
+	if (limit >= max_segs * tp->mss_cache)
 		goto send_now;
 
 	/* Middle in queue won't get any more data, full sendable already? */
@@ -1857,6 +1877,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	unsigned int tso_segs, sent_pkts;
 	int cwnd_quota;
 	int result;
+	u32 max_segs;
 
 	sent_pkts = 0;
 
@@ -1870,6 +1891,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		}
 	}
 
+	max_segs = tcp_tso_autosize(sk, mss_now);
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
 
@@ -1900,10 +1922,22 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 						      nonagle : TCP_NAGLE_PUSH))))
 				break;
 		} else {
-			if (!push_one && tcp_tso_should_defer(sk, skb))
+			if (!push_one && tcp_tso_should_defer(sk, skb, max_segs))
 				break;
 		}
 
+		limit = mss_now;
+		if (tso_segs > 1 && !tcp_urg_mode(tp))
+			limit = tcp_mss_split_point(sk, skb, mss_now,
+						    min_t(unsigned int,
+							  cwnd_quota,
+							  max_segs),
+						    nonagle);
+
+		if (skb->len > limit &&
+		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
+			break;
+
 		/* TCP Small Queues :
 		 * Control number of packets in qdisc/devices to two packets / or ~1 ms.
 		 * This allows for :
@@ -1914,8 +1948,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		 * of queued bytes to ensure line rate.
 		 * One example is wifi aggregation (802.11 AMPDU)
 		 */
-		limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
-			      sk->sk_pacing_rate >> 10);
+		limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+		limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
@@ -1930,18 +1964,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		}
 
-		limit = mss_now;
-		if (tso_segs > 1 && !tcp_urg_mode(tp))
-			limit = tcp_mss_split_point(sk, skb, mss_now,
-						    min_t(unsigned int,
-							  cwnd_quota,
-							  sk->sk_gso_max_segs),
-						    nonagle);
-
-		if (skb->len > limit &&
-		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
-			break;
-
 		TCP_SKB_CB(skb)->when = tcp_time_stamp;
 
 		if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
-- 
1.7.10.4


[-- Attachment #3: 0001-tcp-refine-TSO-autosizing.patch --]
[-- Type: text/x-patch, Size: 6898 bytes --]

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -131,7 +131,7 @@ struct tcp_sock {
 	/* inet_connection_sock has to be the first member of tcp_sock */
 	struct inet_connection_sock	inet_conn;
 	u16	tcp_header_len;	/* Bytes of tcp header to send		*/
-	u16	xmit_size_goal_segs; /* Goal for segmenting output packets */
+	u16	gso_segs;	/* Max number of segs per GSO packet	*/
 
 /*
  *	Header prediction flags
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -836,47 +836,29 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 				       int large_allowed)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 xmit_size_goal, old_size_goal;
-
-	xmit_size_goal = mss_now;
-
-	if (large_allowed && sk_can_gso(sk)) {
-		u32 gso_size, hlen;
-
-		/* Maybe we should/could use sk->sk_prot->max_header here ? */
-		hlen = inet_csk(sk)->icsk_af_ops->net_header_len +
-		       inet_csk(sk)->icsk_ext_hdr_len +
-		       tp->tcp_header_len;
-
-		/* Goal is to send at least one packet per ms,
-		 * not one big TSO packet every 100 ms.
-		 * This preserves ACK clocking and is consistent
-		 * with tcp_tso_should_defer() heuristic.
-		 */
-		gso_size = sk->sk_pacing_rate / (2 * MSEC_PER_SEC);
-		gso_size = max_t(u32, gso_size,
-				 sysctl_tcp_min_tso_segs * mss_now);
-
-		xmit_size_goal = min_t(u32, gso_size,
-				       sk->sk_gso_max_size - 1 - hlen);
-
-		xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
-
-		/* We try hard to avoid divides here */
-		old_size_goal = tp->xmit_size_goal_segs * mss_now;
-
-		if (likely(old_size_goal <= xmit_size_goal &&
-			   old_size_goal + mss_now > xmit_size_goal)) {
-			xmit_size_goal = old_size_goal;
-		} else {
-			tp->xmit_size_goal_segs =
-				min_t(u16, xmit_size_goal / mss_now,
-				      sk->sk_gso_max_segs);
-			xmit_size_goal = tp->xmit_size_goal_segs * mss_now;
-		}
+	u32 new_size_goal, size_goal, hlen;
+
+	if (!large_allowed || !sk_can_gso(sk))
+		return mss_now;
+
+	/* Maybe we should/could use sk->sk_prot->max_header here ? */
+	hlen = inet_csk(sk)->icsk_af_ops->net_header_len +
+	       inet_csk(sk)->icsk_ext_hdr_len +
+	       tp->tcp_header_len;
+
+	new_size_goal = sk->sk_gso_max_size - 1 - hlen;
+	new_size_goal = tcp_bound_to_half_wnd(tp, new_size_goal);
+
+	/* We try hard to avoid divides here */
+	size_goal = tp->gso_segs * mss_now;
+	if (unlikely(new_size_goal < size_goal ||
+		     new_size_goal >= size_goal + mss_now)) {
+		tp->gso_segs = min_t(u16, new_size_goal / mss_now,
+				     sk->sk_gso_max_segs);
+		size_goal = tp->gso_segs * mss_now;
 	}
 
-	return max(xmit_size_goal, mss_now);
+	return max(size_goal, mss_now);
 }
 
 static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1484,6 +1484,27 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
 		((nonagle & TCP_NAGLE_CORK) ||
 		 (!nonagle && tp->packets_out && tcp_minshall_check(tp)));
 }
+
+/* Return how many segs we'd like on a TSO packet,
+ * to send one TSO packet per ms
+ */
+static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now)
+{
+	u32 bytes, segs;
+
+	bytes = min(sk->sk_pacing_rate >> 10,
+		    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
+
+	/* Goal is to send at least one packet per ms,
+	 * not one big TSO packet every 100 ms.
+	 * This preserves ACK clocking and is consistent
+	 * with tcp_tso_should_defer() heuristic.
+	 */
+	segs = max_t(u32, bytes / mss_now, sysctl_tcp_min_tso_segs);
+
+	return min_t(u32, segs, sk->sk_gso_max_segs);
+}
+
 /* Returns the portion of skb which can be sent right away */
 static unsigned int tcp_mss_split_point(const struct sock *sk,
 					const struct sk_buff *skb,
@@ -1687,7 +1708,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
  * This algorithm is from John Heffner.
  */
 static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
-				 bool *is_cwnd_limited)
+				 bool *is_cwnd_limited, u32 max_segs)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1717,8 +1738,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 	limit = min(send_win, cong_win);
 
 	/* If a full-sized TSO skb can be sent, do it. */
-	if (limit >= min_t(unsigned int, sk->sk_gso_max_size,
-			   tp->xmit_size_goal_segs * tp->mss_cache))
+	if (limit >= max_segs * tp->mss_cache)
 		goto send_now;
 
 	/* Middle in queue won't get any more data, full sendable already? */
@@ -1915,6 +1935,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	int cwnd_quota;
 	int result;
 	bool is_cwnd_limited = false;
+	u32 max_segs;
 
 	sent_pkts = 0;
 
@@ -1928,6 +1949,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		}
 	}
 
+	max_segs = tcp_tso_autosize(sk, mss_now);
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
 
@@ -1960,10 +1982,23 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		} else {
 			if (!push_one &&
-			    tcp_tso_should_defer(sk, skb, &is_cwnd_limited))
+			    tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
+						 max_segs))
 				break;
 		}
 
+		limit = mss_now;
+		if (tso_segs > 1 && !tcp_urg_mode(tp))
+			limit = tcp_mss_split_point(sk, skb, mss_now,
+						    min_t(unsigned int,
+							  cwnd_quota,
+							  max_segs),
+						    nonagle);
+
+		if (skb->len > limit &&
+		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
+			break;
+
 		/* TCP Small Queues :
 		 * Control number of packets in qdisc/devices to two packets / or ~1 ms.
 		 * This allows for :
@@ -1974,8 +2009,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		 * of queued bytes to ensure line rate.
 		 * One example is wifi aggregation (802.11 AMPDU)
 		 */
-		limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
-			      sk->sk_pacing_rate >> 10);
+		limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+		limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
@@ -1988,18 +2023,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		}
 
-		limit = mss_now;
-		if (tso_segs > 1 && !tcp_urg_mode(tp))
-			limit = tcp_mss_split_point(sk, skb, mss_now,
-						    min_t(unsigned int,
-							  cwnd_quota,
-							  sk->sk_gso_max_segs),
-						    nonagle);
-
-		if (skb->len > limit &&
-		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
-			break;
-
 		TCP_SKB_CB(skb)->when = tcp_time_stamp;
 
 		if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
-- 
1.7.10.4


^ permalink raw reply

* Re: [PATCH v2 0/6] net-PPP: Deletion of a few unnecessary checks
From: SF Markus Elfring @ 2014-12-12 16:56 UTC (permalink / raw)
  To: David Miller
  Cc: Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev, Eric Dumazet,
	linux-kernel, kernel-janitors, Julia Lawall
In-Reply-To: <20141212.105106.1162681032492664308.davem@davemloft.net>


> You are asking me to invest a lot of time for a very trivial
> set of changes.

I find the proposed six update steps also trivial so far.


> Why don't you just integrate the feedback you were given and
> resubmit your patch series, just like any other developer would?

I find the requested redesign and reorganisation of involved data structures
not so easy and therefore more challenging at the moment.
Where should "the error pointers" be stored instead?
Is such a software improvement even a kind of add-on to my patch series?

Regards,
Markus

^ permalink raw reply

* Re: [PATCH net v2] net: smc91x: Fix build without gpiolib
From: Tobias Klauser @ 2014-12-12 16:45 UTC (permalink / raw)
  To: David Miller; +Cc: tony, nico, netdev
In-Reply-To: <20141212.113020.884577050283133272.davem@davemloft.net>

On 2014-12-12 at 17:30:20 +0100, David Miller <davem@davemloft.net> wrote:
> From: Tobias Klauser <tklauser@distanz.ch>
> Date: Fri, 12 Dec 2014 17:27:46 +0100
> 
> > GPIO is not needed in all configurations of the chip, they are optional.
> > That's why I was going for the ugly #ifdef solution.
> > 
> > It seems that there are stubs for the functions in question in
> > linux/gpio/consumer.h already as Tony suggested. Thus, adding an
> > #include <linux/gpio/consumer.h> to the driver should be enough. I'll
> > send an updated patch.
> 
> So for the "configurations" that need it, how does the user figure out
> that GPIO is needed?

Probably ionly by consulting the manual of the board usingthe chip.
Which presumably very few users will do.

> In my opinion, if the code blocks enabling the configurations that
> need this are enabled, so should GPIO be depended upon.
> 
> I think, at a minimum, when CONFIG_OF is enabled smsc91x should
> require GPIO.

Agreed. This is the better solution, causing less surprises for the
user. Should this be a "select GPIOLIB if OF" then?

^ permalink raw reply

* Re: [PATCH net v2] net: smc91x: Fix build without gpiolib
From: Tony Lindgren @ 2014-12-12 16:34 UTC (permalink / raw)
  To: David Miller; +Cc: tklauser, nico, netdev
In-Reply-To: <20141212.113020.884577050283133272.davem@davemloft.net>

* David Miller <davem@davemloft.net> [141212 08:32]:
> From: Tobias Klauser <tklauser@distanz.ch>
> Date: Fri, 12 Dec 2014 17:27:46 +0100
> 
> > GPIO is not needed in all configurations of the chip, they are optional.
> > That's why I was going for the ugly #ifdef solution.
> > 
> > It seems that there are stubs for the functions in question in
> > linux/gpio/consumer.h already as Tony suggested. Thus, adding an
> > #include <linux/gpio/consumer.h> to the driver should be enough. I'll
> > send an updated patch.
> 
> So for the "configurations" that need it, how does the user figure out
> that GPIO is needed?
> 
> In my opinion, if the code blocks enabling the configurations that
> need this are enabled, so should GPIO be depended upon.
> 
> I think, at a minimum, when CONFIG_OF is enabled smsc91x should
> require GPIO.

Well in many cases these control GPIOs, or some part of them, are
hardwired with pulls. That's why they are optional.

Regards,

Tony

^ permalink raw reply

* Re: [PATCH net v2] net: smc91x: Fix build without gpiolib
From: David Miller @ 2014-12-12 16:30 UTC (permalink / raw)
  To: tklauser; +Cc: tony, nico, netdev
In-Reply-To: <20141212162746.GJ16916@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Fri, 12 Dec 2014 17:27:46 +0100

> GPIO is not needed in all configurations of the chip, they are optional.
> That's why I was going for the ugly #ifdef solution.
> 
> It seems that there are stubs for the functions in question in
> linux/gpio/consumer.h already as Tony suggested. Thus, adding an
> #include <linux/gpio/consumer.h> to the driver should be enough. I'll
> send an updated patch.

So for the "configurations" that need it, how does the user figure out
that GPIO is needed?

In my opinion, if the code blocks enabling the configurations that
need this are enabled, so should GPIO be depended upon.

I think, at a minimum, when CONFIG_OF is enabled smsc91x should
require GPIO.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox