Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 2/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: David Miller @ 2014-12-16 22:44 UTC (permalink / raw)
  To: eric.dumazet; +Cc: rapier, alexei.starovoitov, netdev
In-Reply-To: <1418769212.9773.65.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 16 Dec 2014 14:33:32 -0800

> There is very little chance web10g ~3000 lines of code are added into
> linux TCP stack, by people who did not submit netdev changes in last
> years.

+1

> At Google, we tried the web10g route, but reverted it (today !) in favor
> of tcp_info extensions (ss command from iproute2 can also grab/display
> these), after too many bugs being filled.

+1

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Jason Baron @ 2014-12-16 22:40 UTC (permalink / raw)
  To: Josef Bacik, Eric Dumazet, Alexei Starovoitov, Laurent Chavey,
	Yuchung Cheng
  Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
	Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
	Kernel Team
In-Reply-To: <548F0F62.7080704@fb.com>

On 12/15/2014 11:42 AM, Josef Bacik wrote:
> On 12/15/2014 11:03 AM, Eric Dumazet wrote:
>> On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>>
>>> I think patches 1 and 3 are good additions, since they establish
>>> few permanent points of instrumentation in tcp stack.
>>> Patches 4-5 look more like use cases of tracepoints established
>>> before. They may feel like simple additions and, no doubt,
>>> they are useful, but since they expose things via tracing
>>> infra they become part of api and cannot be changed later,
>>> when more stats would be needed.
>>> I think systemtap like scripting on top of patches 1 and 3
>>> should solve your use case ?
>>> Also, have you looked at recent eBPF work?
>>> Though it's not completely ready yet, soon it should
>>> be able to do the same stats collection as you have
>>> in 4/5 without adding permanent pieces to the kernel.
>>
>> So it looks like web10g like interfaces are very often requested by
>> various teams.
>>
>> And we have many different views on how to hack this. I am astonished by
>> number of hacks I saw about this stuff going on.
>>
>> What about a clean way, extending current TCP_INFO, which is both
>> available as a getsockopt() for socket owners and ss/iproute2
>> information for 'external entities'
>>
>> If we consider web10g info needed, then adding a ftrace/eBPF like
>> interface is simply yet another piece of code we need to maintain,
>> and the argument of 'this should cost nothing if not activated' is
>> nonsense since major players need to constantly monitor TCP metrics and
>> behavior.
>>
>> It seems both FaceBook and Google are working on a subset of web10g.
>>
>> I suggest we meet together and establish a common ground, preferably
>> after Christmas holidays.
>>
>
> We've set up something for exactly this case at the end of January but
> have yet to get a response from Google.  If any of the Google people
> cc'ed (or really anybody, its not a strictly FB/Google thing) is
> interested please email me directly and I'll send you the details, we
> will be meeting face to face in the bay area at the end of January. 
> Thanks,
>
> Josef

We are interested in tcp tracing as well. Another requirement that we
have that
I don't think I saw is the ability to start/stop tracing on sockets
(potentially
multiple times) during the lifetime of a connection. So for example, the
ability
to use setsockopt(), to selectively start/stop tracing on a connection,
so as not
to incur overhead for non-traced sockets.

Thanks,

-Jason

^ permalink raw reply

* Re: [PATCH net-next 2/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: Eric Dumazet @ 2014-12-16 22:33 UTC (permalink / raw)
  To: rapier; +Cc: David Miller, alexei.starovoitov, netdev
In-Reply-To: <54909DD5.5070202@psc.edu>

On Tue, 2014-12-16 at 16:02 -0500, rapier wrote:

> I understand where you are coming from. I do believe that our 
> methodology provides some advantages over a tcp_info solution. I'll 
> provide some information on that tomorrow after I've had a chance to 
> talk about this in more depth with our dev team. That being said, we 
> only really care about the instrument set being incorporated as such we 
> will take a closer look tcp_info shortly.

Hmm...

There is very little chance web10g ~3000 lines of code are added into
linux TCP stack, by people who did not submit netdev changes in last
years.

At Google, we tried the web10g route, but reverted it (today !) in favor
of tcp_info extensions (ss command from iproute2 can also grab/display
these), after too many bugs being filled.

Researchers love/want to have hundred of metrics. This does not mean
linux has to provide them natively, unless we can prove it is really
damn useful.

Sorry, but someone had to raise some reality concerns.

tcp_info _is_ extensible, granted you do not try to push 127 new metrics
in it.

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=977cb0ecf82eb6d15562573c31edebf90db35163

^ permalink raw reply

* Re: [PATCH net-next 2/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: Dominic Hamon @ 2014-12-16 22:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20141216.151823.2276708539799601894.davem@davemloft.net>

On Tue, Dec 16, 2014 at 03:18:23PM -0500, David Miller wrote:
> From: rapier <rapier@psc.edu>
> Date: Tue, 16 Dec 2014 15:13:44 -0500
> 
> > On 12/16/14, 3:03 PM, David Miller wrote:
> > 
> >> You shouldn't need to export any symbols.
> > 
> > As a point of clarification - is it acceptable to export symbols for
> > use with in tree modules such as tcp_htcp? We are more than willing to
> > do the work required to bring this in line with best practices.
> 
> I'm saying for data and TCP statistics collection, you shouldn't need
> to add any new symbol exports.
> 
> Keep this in the main kernel, nothing external should be needed.
> 
> Extending tcp_info or similar is the only reasonable way to implement
> this stuff.

There are two aspects of the tcp_estats approach that I think are worth
considering separately:
  - the breadth of instrumentation
  - the access method through kernel module/netlink

I don't see any reason why we can't take the instrumentation parts of
these patches and plumb them in to tcp_info.

I would prefer if we can retain the nested structures and the sysctl
that controls the set of instruments collected as I think this key
feature gives plenty of flexibility to system operators in terms of an
overhead/instrumentation trade-off, however I recognise this complicates
how the socket option is returned, and how ss might report the metrics.

--
Dominic Hamon

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: B Viswanath @ 2014-12-16 21:51 UTC (permalink / raw)
  To: Arad, Ronen
  Cc: netdev@vger.kernel.org, John Fastabend, Roopa Prabhu,
	Jamal Hadi Salim, Jiri Pirko, sfeldma@gmail.com, bcrl@kvack.org,
	tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
	vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
	gospo@cumulusnetworks.com
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505DB588F@ORSMSX101.amr.corp.intel.com>

On 17 December 2014 at 02:22, Arad, Ronen <ronen.arad@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of B Viswanath
>> Sent: Tuesday, December 16, 2014 9:24 PM
>> To: Arad, Ronen
>> Cc: John Fastabend; netdev@vger.kernel.org; Roopa Prabhu; Jamal Hadi
>> Salim; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> stephen@networkplumber.org; linville@tuxdriver.com;
>> vyasevic@redhat.com; davem@davemloft.net;
>> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>> Hi,
>>
>> This is my first email on this thread, and on this list. My apologies if I have
>> not understood something correctly. I would like to participate  in this
>> discussion, which is one of the reasons I joined this list recently.  Some
>> feedback inline below.
>>
>> On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@intel.com> wrote:
>> >
>> >
> <sniped for brevity>
>> >>
>> >> I'm still missing why there is duplicate implementations in the driver.
>> >> If the driver implements the set of ndo ops why should it care who
>> >> calls them? I think you tried to explain this already but I'm not seeing it.
>> >>
>> >
>> > Let's consider a bridge property. I'll use the default PVID attribute as an
>> example. This is currently configurable by sysfs only and a netlink support for
>> that is still due. Let's assume for our discussion that a DEAFAULT_PVID
>> attribute will be added as a bridge attribute within AFSPEC nested attribute
>> of AF_BRIDGE SETLINK message.
>> > When a bridge device is present, this attribute is processed by the bridge
>> module and saved as default_pvid field in net_bridge structure. When a
>> switch port is enslaved to a bridge, the bridge driver creates a
>> net_bridge_port instance and assigns it a pvid inherited from the
>> default_pvid attribute of the bridge. Setting the pvid for a new enslaved
>> switch port is not done via netlink. It only applies to the net_bridge_port
>> structure which is internal to the bridge module. Offloading this to HW is not
>> addressed with current bridge offloading.
>> >
>> > When a bridge device is not used, the DEFAULT_PVID will be targeted using
>> the SELF flag to any of the switch ports. The driver will recognize that as a
>> bridge port and will need to maintain some switch global structure similar to
>> net_bridge where it could save the default_pvid. The driver, knowing that the
>> switch port is not enslaved to a bridge, will have to replicate the same
>> functionality. In the HW case, it will have to configure default VLAN on all the
>> switch ports.
>> > This is different from the yet to be defined way of propagating default PVID
>> from a bridge device to offloaded bridge ports.
>> >
>> > Another example is STP. STP attributes are bridge attributes which are not
>> offloaded when a bridge device is present. The bridge module handles STP
>> protocol internally. Without bridge device, STP attributes have to be targeted
>> at a switch port device and the driver should save them in driver-specific
>> structures and have proprietary implementation of STP (as the one in the
>> bridge module is not used).
>>
>> In general I feel that the switch-device and port relation should be that of the
>> 'container-containee'. This is the actual physical relationship.  Apart from
>> some operations such as vlans and protocol related, it is tricky to model all
>> operations directly on ports. My thinking is it is cleaner to have all operations
>> be on switch-device, which in turn peculates the operations downward, to its
>> contained ports as applicable.  The offloading is really a property of the
>> switch device and not individual ports. Similarly the FDB is maintained by the
>> switch and not the ports. As we extend the current offloading mechanism to
>> other L2, L3 and other features, we may find it easier to have a 'switch-
>> device' in place.
>>
>> I am somewhat confused with the notion of bridges though. Many existing
>> linux-based routers use bridges differently than as a vlan-broadcast-domain.
>> For example it is common to have eth0.334 and
>> eth1 in the same bridge. What is being done internally is that the additional
>> vlan tag 334 (which indicates video traffic, say) is removed and that video
>> traffic is being bridged to eth1. There is no default vlan for this bridge.  This is
>> a software bridge. I am not sure how this can be accomplished if there is a
>> need to associate a vlan with a bridge.
>>
>> Thanks
>> Viswanath
>>
>>
>
> Let's say we have three ports sp1, sp2, sp3 and two VLANs 10 and 20.
> VLAN 10 is allowed on ports sp1 and sp2 and VLAN 20 is allowed on ports sp2 and sp3.
> Let's say sp1 and sp3 are access ports and carry untagged traffic. Sp2 is an uplink trunk port and carries tagged traffic.
>
> This could be modeled using Linux bridge in at least two ways:
>
> 1) Bridge per VLAN
>         - Two bridge devices are used br10 and br20
>         - sp2.10 is a vlan interface on sp2 for VLAN 10
>         - sp2.20 is a vlan interface on sp2 for VLAN 20
>         - br10 enslaves ports sp1 and sp2.10
>         - br20 enslaves ports sp3 and sp2.20
>         - br10 and br20 could be L3 interfaces and have IP address assigned to.
>                  This allows for routing between VLANs.
>         - Traffic sent to br10 will egress untagged on sp1 and tagged with VID=10 on sp2
>         - Traffic sent to br20 will egress untagged on sp2 and tagged with VID=20 on sp2
>         - Traffic received on sp1 is delivered to br10 and could be flooded if MAC DA is broadcast or unknown unicast.
>         - Traffic received on sp2 with VID=10 is delivered to br10 after the VLAN tag is removed.
>         - Similarly traffic receive on sp3 or tagged traffic with VID=20 on sp2 is delivered to br20

Thanks for the explanation. Understood this, and this is how I
saw/implemented things so far. However, my understanding is that both
br10 and br20 have nothing to do with vlan 10 and 20 respectively. The
traffic the actual bridges see (say if we ran tcpdump on br10 or br20)
is always untagged. The tagging happens while the packets are
egressing the sp2 port. In that sense, there is no vlan associated
with either of the bridges.

What I was not sure was about how we can associate a vlan with such a
bridge in this case, and what it really means for the traffic.

>
> 2) Single bridge with VLAN filtering
>         - A single bridge device br0 is used
>         - All ports sp1, sp2, and sp3 are enslaved to the bridge
>         - br0 allows both VID=10 and VID=20
>         - VLAN policy on the ports determines the bridging domains within the bridge
>                 - sp1 allows VID=10, it is untagged on egress, and it is the PVID of sp1.
>                   Linux does not provide a way to only allow untagged traffic to enter a port.
>                   Both tagged traffic with VID=10 and untagged traffic are considered received on VLAN 10.
>                 - sp3 allows VID=20, it is untagged on egress, and it is the PVID of sp2.
>                 - sp2 allows VID=10 and VID=20, both are tagged on egress, and there is no PVID so untagged traffic received on sp2 is dropped.
>
>         - Above configuration is sufficient for L2 switching with two distinct VLANs.
>         - L3 routing across VLANs in this model is achieved by vlan interfaces on the bridge. br0.10 for VLAN 10 and br0.20 for VLAN 20.
>         - br0.10 and br0.20 are L3 interfaces and have IP addresses assigned.
>                 - br0.10 allows VID=10

In this case am I correct in assuming that br0 actually represents the
switch device ? I don't see a way in which one more such bridge can
exist, since the forwarding decisions are handled at the switch-device
level.

Thanks
Viswanath

>
>
>> >
>> >
>> >> [...]
>> >>
>> >> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
>> >> might have worked some of it out.
>> >>
>> >> --
>> >> John Fastabend         Intel Corporation
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe netdev" in
>> > the body of a message to majordomo@vger.kernel.org More majordomo
>> info
>> > at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: net: integer overflow in ip_idents_reserve
From: Eric Dumazet @ 2014-12-16 21:47 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David S. Miller, LKML, netdev@vger.kernel.org, Andrey Ryabinin,
	Dave Jones
In-Reply-To: <5490A1F8.6020207@oracle.com>

On Tue, 2014-12-16 at 16:19 -0500, Sasha Levin wrote:
> Hi Eric,
> 
> While fuzzing with trinity on a -next kernel with the undefined behaviour
> sanitizer path, I've observed the following warning in code which was
> introduced in 04ca6973f7 ("ip: make IP identifiers less predictable"):

This is a false positive.

We don't really care of the value if (now - old) is too big :

No packet was sent recently, so IP ID being X or Y is not a concern.

^ permalink raw reply

* net: integer overflow in ip_idents_reserve
From: Sasha Levin @ 2014-12-16 21:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, LKML, netdev@vger.kernel.org, Andrey Ryabinin,
	Dave Jones

Hi Eric,

While fuzzing with trinity on a -next kernel with the undefined behaviour
sanitizer path, I've observed the following warning in code which was
introduced in 04ca6973f7 ("ip: make IP identifiers less predictable"):

[  234.317163] ================================================================================
[  234.320001] UBSan: Undefined behaviour in ./arch/x86/include/asm/atomic.h:157:9
[  234.321568] signed integer overflow:
[  234.322772] 1678406574 + 641542997 cannot be represented in type 'int'
[  234.324316] CPU: 2 PID: 16819 Comm: trinity-c537 Not tainted 3.18.0-next-20141216-sasha-00065-g3c56201-dirty #1609
[  234.326548]  0000000000000000 0000000000000000 ffffffffbc2e4e10 ffff8802e63137e8
[  234.327837]  ffffffffb126bd68 1ffffffff7aa2c03 ffffffffbc2e5c34 ffff8802e6313808
[  234.329117]  ffffffffb126df6f 1ffffffff7aa2c03 ffffffffbc2e5c34 ffff8802e63138c8
[  234.330755] Call Trace:
[  234.331213] dump_stack (lib/dump_stack.c:52)
[  234.332025] ubsan_epilogue (lib/ubsan.c:159)
[  234.332986] handle_overflow (lib/ubsan.c:191)
[  234.334022] ? preempt_schedule (./arch/x86/include/asm/preempt.h:77 (discriminator 1) kernel/sched/core.c:2898 (discriminator 1))
[  234.334945] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:42)
[  234.335919] __ubsan_handle_add_overflow (lib/ubsan.c:200)
[  234.337211] ip_idents_reserve (./arch/x86/include/asm/atomic.h:157 net/ipv4/route.c:482)
[  234.338935] __ip_select_ident (include/uapi/linux/swab.h:49 (discriminator 3) net/ipv4/route.c:498 (discriminator 3))
[  234.340773] __ip_make_skb (include/net/ip.h:339 include/net/ip.h:345 net/ipv4/ip_output.c:1386)
[  234.342736] ip_push_pending_frames (include/net/ip.h:148 net/ipv4/ip_output.c:1430)
[  234.344707] raw_sendmsg (net/ipv4/raw.c:644)
[  234.346537] ? system_call_fastpath (arch/x86/kernel/entry_64.S:423)
[  234.348431] ? get_parent_ip (kernel/sched/core.c:2564)
[  234.350259] ? preempt_count_sub (kernel/sched/core.c:2620)
[  234.352170] inet_sendmsg (net/ipv4/af_inet.c:734)
[  234.354107] do_sock_sendmsg (net/socket.c:646 (discriminator 4))
[  234.355947] ? retint_restore_args (arch/x86/kernel/entry_64.S:844)
[  234.357962] ___sys_sendmsg (net/socket.c:653 net/socket.c:2094)
[  234.359545] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:304)
[  234.361182] ? __acct_update_integrals (kernel/tsacct.c:147)
[  234.363394] ? acct_account_cputime (kernel/tsacct.c:168)
[  234.365417] __sys_sendmsg (net/socket.c:2131)
[  234.367248] SyS_sendmsg (net/socket.c:2136)
[  234.368925] system_call_fastpath (arch/x86/kernel/entry_64.S:423)
[  234.371038] ================================================================================


Thanks,
Sasha

^ permalink raw reply

* Re: Adding Unisys virtnic.c to the Network Tree
From: Andy Gospodarek @ 2014-12-16 21:13 UTC (permalink / raw)
  To: Erik Arfvidson
  Cc: davem, netdev, benjamin.romer, dzickus, prarit, Bruce.Vessey,
	sparmaintainer
In-Reply-To: <54909B71.10609@redhat.com>

On Tue, Dec 16, 2014 at 03:52:01PM -0500, Erik Arfvidson wrote:
> Hi Dave,
> 
> I'm a partner engineer at Red Hat working for Unisys. Currently most of our
> driver for our system reside in the staging tree Maintained by Greg KH. It
> was suggested by one of the engineers at Red Hat Don Zickus that in order to
> accelerate the process of moving the remaining drivers we pushed directly to
> their specific system in the Linux Kernel. Currently virtnic which is our
> Virtual Network driver resides internally at Unisys and dependencies are in
> the staging tree(drivers/staging/unisys/). So would you be willing to take a
> look at our Network driver in order to add it to your Network tree?

Erik,

I realize I'm not Dave, but I thought I could respond anyway.  :)

I suspect that if you posted the driver to this list, more eyes than
just Dave would be willing to take a look and provide inline feedback.

-andy

^ permalink raw reply

* Re: [PATCH net-next 2/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: rapier @ 2014-12-16 21:02 UTC (permalink / raw)
  To: David Miller; +Cc: alexei.starovoitov, netdev
In-Reply-To: <20141216.151823.2276708539799601894.davem@davemloft.net>



On 12/16/14 3:18 PM, David Miller wrote:
> From: rapier <rapier@psc.edu>
> Date: Tue, 16 Dec 2014 15:13:44 -0500
>
>> On 12/16/14, 3:03 PM, David Miller wrote:
>>
>>> You shouldn't need to export any symbols.
>>
>> As a point of clarification - is it acceptable to export symbols for
>> use with in tree modules such as tcp_htcp? We are more than willing to
>> do the work required to bring this in line with best practices.
>
> I'm saying for data and TCP statistics collection, you shouldn't need
> to add any new symbol exports.

We've been able to identify that as the code stands we only need one 
export for the KIS. That being said, I understand if that's one too 
many. The DLKM (which hasn't been submitted) does require two additional 
symbols and that's an oversight on our part. We'll work to eliminate 
those as well.

> Keep this in the main kernel, nothing external should be needed.
>
> Extending tcp_info or similar is the only reasonable way to implement
> this stuff.

I understand where you are coming from. I do believe that our 
methodology provides some advantages over a tcp_info solution. I'll 
provide some information on that tomorrow after I've had a chance to 
talk about this in more depth with our dev team. That being said, we 
only really care about the instrument set being incorporated as such we 
will take a closer look tcp_info shortly.

Chris

^ permalink raw reply

* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Arad, Ronen @ 2014-12-16 20:52 UTC (permalink / raw)
  To: B Viswanath, netdev@vger.kernel.org
  Cc: John Fastabend, Roopa Prabhu, Jamal Hadi Salim, Jiri Pirko,
	sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
	stephen@networkplumber.org, linville@tuxdriver.com,
	vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
	gospo@cumulusnetworks.com
In-Reply-To: <CAN+pFwJuOBb2c49TUqX357DQLY9a7GUs7Kufne0Td=jSJ+nFeA@mail.gmail.com>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of B Viswanath
> Sent: Tuesday, December 16, 2014 9:24 PM
> To: Arad, Ronen
> Cc: John Fastabend; netdev@vger.kernel.org; Roopa Prabhu; Jamal Hadi
> Salim; Jiri Pirko; sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> stephen@networkplumber.org; linville@tuxdriver.com;
> vyasevic@redhat.com; davem@davemloft.net;
> shm@cumulusnetworks.com; gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
> 
> Hi,
> 
> This is my first email on this thread, and on this list. My apologies if I have
> not understood something correctly. I would like to participate  in this
> discussion, which is one of the reasons I joined this list recently.  Some
> feedback inline below.
> 
> On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@intel.com> wrote:
> >
> >
<sniped for brevity>
> >>
> >> I'm still missing why there is duplicate implementations in the driver.
> >> If the driver implements the set of ndo ops why should it care who
> >> calls them? I think you tried to explain this already but I'm not seeing it.
> >>
> >
> > Let's consider a bridge property. I'll use the default PVID attribute as an
> example. This is currently configurable by sysfs only and a netlink support for
> that is still due. Let's assume for our discussion that a DEAFAULT_PVID
> attribute will be added as a bridge attribute within AFSPEC nested attribute
> of AF_BRIDGE SETLINK message.
> > When a bridge device is present, this attribute is processed by the bridge
> module and saved as default_pvid field in net_bridge structure. When a
> switch port is enslaved to a bridge, the bridge driver creates a
> net_bridge_port instance and assigns it a pvid inherited from the
> default_pvid attribute of the bridge. Setting the pvid for a new enslaved
> switch port is not done via netlink. It only applies to the net_bridge_port
> structure which is internal to the bridge module. Offloading this to HW is not
> addressed with current bridge offloading.
> >
> > When a bridge device is not used, the DEFAULT_PVID will be targeted using
> the SELF flag to any of the switch ports. The driver will recognize that as a
> bridge port and will need to maintain some switch global structure similar to
> net_bridge where it could save the default_pvid. The driver, knowing that the
> switch port is not enslaved to a bridge, will have to replicate the same
> functionality. In the HW case, it will have to configure default VLAN on all the
> switch ports.
> > This is different from the yet to be defined way of propagating default PVID
> from a bridge device to offloaded bridge ports.
> >
> > Another example is STP. STP attributes are bridge attributes which are not
> offloaded when a bridge device is present. The bridge module handles STP
> protocol internally. Without bridge device, STP attributes have to be targeted
> at a switch port device and the driver should save them in driver-specific
> structures and have proprietary implementation of STP (as the one in the
> bridge module is not used).
> 
> In general I feel that the switch-device and port relation should be that of the
> 'container-containee'. This is the actual physical relationship.  Apart from
> some operations such as vlans and protocol related, it is tricky to model all
> operations directly on ports. My thinking is it is cleaner to have all operations
> be on switch-device, which in turn peculates the operations downward, to its
> contained ports as applicable.  The offloading is really a property of the
> switch device and not individual ports. Similarly the FDB is maintained by the
> switch and not the ports. As we extend the current offloading mechanism to
> other L2, L3 and other features, we may find it easier to have a 'switch-
> device' in place.
> 
> I am somewhat confused with the notion of bridges though. Many existing
> linux-based routers use bridges differently than as a vlan-broadcast-domain.
> For example it is common to have eth0.334 and
> eth1 in the same bridge. What is being done internally is that the additional
> vlan tag 334 (which indicates video traffic, say) is removed and that video
> traffic is being bridged to eth1. There is no default vlan for this bridge.  This is
> a software bridge. I am not sure how this can be accomplished if there is a
> need to associate a vlan with a bridge.
> 
> Thanks
> Viswanath
> 
>

Let's say we have three ports sp1, sp2, sp3 and two VLANs 10 and 20.
VLAN 10 is allowed on ports sp1 and sp2 and VLAN 20 is allowed on ports sp2 and sp3.
Let's say sp1 and sp3 are access ports and carry untagged traffic. Sp2 is an uplink trunk port and carries tagged traffic.

This could be modeled using Linux bridge in at least two ways:

1) Bridge per VLAN
	- Two bridge devices are used br10 and br20
	- sp2.10 is a vlan interface on sp2 for VLAN 10
	- sp2.20 is a vlan interface on sp2 for VLAN 20
	- br10 enslaves ports sp1 and sp2.10
	- br20 enslaves ports sp3 and sp2.20
	- br10 and br20 could be L3 interfaces and have IP address assigned to.
                 This allows for routing between VLANs.
	- Traffic sent to br10 will egress untagged on sp1 and tagged with VID=10 on sp2
	- Traffic sent to br20 will egress untagged on sp2 and tagged with VID=20 on sp2
	- Traffic received on sp1 is delivered to br10 and could be flooded if MAC DA is broadcast or unknown unicast.
	- Traffic received on sp2 with VID=10 is delivered to br10 after the VLAN tag is removed.
	- Similarly traffic receive on sp3 or tagged traffic with VID=20 on sp2 is delivered to br20

2) Single bridge with VLAN filtering
	- A single bridge device br0 is used
	- All ports sp1, sp2, and sp3 are enslaved to the bridge
	- br0 allows both VID=10 and VID=20
	- VLAN policy on the ports determines the bridging domains within the bridge
		- sp1 allows VID=10, it is untagged on egress, and it is the PVID of sp1.
		  Linux does not provide a way to only allow untagged traffic to enter a port.
		  Both tagged traffic with VID=10 and untagged traffic are considered received on VLAN 10.
		- sp3 allows VID=20, it is untagged on egress, and it is the PVID of sp2.
		- sp2 allows VID=10 and VID=20, both are tagged on egress, and there is no PVID so untagged traffic received on sp2 is dropped.
		
	- Above configuration is sufficient for L2 switching with two distinct VLANs.
	- L3 routing across VLANs in this model is achieved by vlan interfaces on the bridge. br0.10 for VLAN 10 and br0.20 for VLAN 20.
	- br0.10 and br0.20 are L3 interfaces and have IP addresses assigned.
		- br0.10 allows VID=10

 
> >
> >
> >> [...]
> >>
> >> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
> >> might have worked some of it out.
> >>
> >> --
> >> John Fastabend         Intel Corporation
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org More majordomo
> info
> > at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Adding Unisys virtnic.c to the Network Tree
From: Erik Arfvidson @ 2014-12-16 20:52 UTC (permalink / raw)
  To: davem, netdev
  Cc: benjamin.romer, dzickus, prarit, Bruce.Vessey, sparmaintainer
In-Reply-To: <5490976C.20409@redhat.com>

Hi Dave,

I'm a partner engineer at Red Hat working for Unisys. Currently most of 
our driver for our system reside in the staging tree Maintained by Greg 
KH. It was suggested by one of the engineers at Red Hat Don Zickus that 
in order to accelerate the process of moving the remaining drivers we 
pushed directly to their specific system in the Linux Kernel. Currently 
virtnic which is our Virtual Network driver resides internally at Unisys 
and dependencies are in the staging tree(drivers/staging/unisys/). So 
would you be willing to take a look at our Network driver in order to 
add it to your Network tree?

Cheers,
Erik Arfvidson

^ permalink raw reply

* Re: [PATCH 2/2] ip_tunnel: Add missing validation of encap type to ip_tunnel_encap_setup()
From: Thomas Graf @ 2014-12-16 20:50 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List
In-Reply-To: <CA+mtBx_364iHXxLkA=Ytb=34_2zEy9-3hqfOdt96Ckhm8NwFiQ@mail.gmail.com>

On 12/16/14 at 12:23pm, Tom Herbert wrote:
> On Tue, Dec 16, 2014 at 12:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > The encap->type comes straight from Netlink. Validate it against
> > max supported encap types just like ip_encap_hlen() already does.
> >
> > Fixes: a8c5f9 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
> > Signed-off-by: Thomas Graf <tgraf@suug.ch>
> > ---
> >  net/ipv4/ip_tunnel.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> > index 2f498f8..d3e4479 100644
> > --- a/net/ipv4/ip_tunnel.c
> > +++ b/net/ipv4/ip_tunnel.c
> > @@ -573,6 +573,9 @@ int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
> >         if (t->encap.type == TUNNEL_ENCAP_NONE)
> >                 return 0;
> >
> > +       if (t->encap.type >= MAX_IPTUN_ENCAP_OPS)
> > +               return -EINVAL;
> > +
> 
> I don't think this is technically needed, we should have already
> verified the type when setting up the tunnel (ip_encap_hlen).

Right, assuming that every API user always calls ip_tunnel_encap_setup()
on changelink. It's currently the case but since this is a exported
API I figured we better be safe than sorry, in particular as
ip_tunnel_encap() is called before ip_encap_hlen() on xmit.

^ permalink raw reply

* Re: [PATCH net] net: Disallow providing non zero VLAN ID for NIC drivers FDB add flow
From: David Miller @ 2014-12-16 20:41 UTC (permalink / raw)
  To: ogerlitz; +Cc: netdev, jiri, gospo, jhs, john.r.fastabend
In-Reply-To: <1418573945-27840-1-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun, 14 Dec 2014 18:19:05 +0200

> The current implementations all use dev_uc_add_excl() and such whose API
> doesn't support vlans, so we can't make it with NICs HW for now.
> 
> Fixes: f6f6424ba773 ('net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del')
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

Applied, thanks.

^ permalink raw reply

* Re: [bisected] tg3 broken in 3.18.0?
From: Nils Holland @ 2014-12-16 20:38 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: rajatxjain, David Miller, netdev, linux-pci@vger.kernel.org
In-Reply-To: <54907324.4040102@gmail.com>

On Tue, Dec 16, 2014 at 04:00:04PM -0200, Marcelo Ricardo Leitner wrote:
> On 16-12-2014 14:04, Rajat Jain wrote:
> > Hello All,
> >
> > Apologies for jumping in late, but for some reason I do not see the
> > original mail in my inbox. However I am taking a look at the mails as
> > sent on linux-pci (and I will keep an eye out for the bug report that
> > Bjorn asked for).
> >
> 
> np!
> Nils would you create that BZ please? As you did all the bisect.. :)

I'm only just catching up with all the new messages in this thread,
but I've already opened a bug report, as requested. Come and find it
at:

https://bugzilla.kernel.org/show_bug.cgi?id=89821

Feel free to add more relevant details to it! :-)

Greetings,
Nils

^ permalink raw reply

* Re: [PATCH_V2] dm9000: Add regulator and reset support to dm9000
From: David Miller @ 2014-12-16 20:25 UTC (permalink / raw)
  To: Zubair.Kakakhel; +Cc: devicetree, linux-kernel, netdev, paul.burton
In-Reply-To: <1418748391-9955-1-git-send-email-Zubair.Kakakhel@imgtec.com>

From: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Date: Tue, 16 Dec 2014 16:46:31 +0000

> In boards, the dm9000 chip's power and reset can be controlled by gpio.
> 
> It makes sense to add them to the dm9000 driver and let dt be used to
> enable power and reset the phy.
> 
> Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
> Signed-off-by: Paul Burton <paul.burton@imgtec.com>

This looks like a new feature to me, and therefore only appropriate
for net-next, which is closed right now.

Please resubmit this when the net-next tree opens back up.

^ permalink raw reply

* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
From: David Miller @ 2014-12-16 20:24 UTC (permalink / raw)
  To: b38611; +Cc: netdev, R49496, bhutchings, stephen
In-Reply-To: <1418725558-17287-1-git-send-email-b38611@freescale.com>

From: Fugang Duan <b38611@freescale.com>
Date: Tue, 16 Dec 2014 18:25:58 +0800

> Do camera capture test on i.MX6q sabresd board, and save the capture data to
> nfs rootfs. The command is:
> gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
> 
> After about 10 hours running, there have net watchdog timeout kernel dump:
 ...
> There might have a race in napi_schedule(), leaving interrupts disabled forever.
> After these patch, the case still work more than 40 hours running.
> 
> Signed-off-by: Fugang Duan <B38611@freescale.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] net/mlx4: Cache line CQE/EQE stride fixes
From: David Miller @ 2014-12-16 20:24 UTC (permalink / raw)
  To: amirv; +Cc: netdev, yevgenyp, ogerlitz, clsoto, idos, weiyang
In-Reply-To: <54902597.50105@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>
Date: Tue, 16 Dec 2014 14:29:11 +0200

> On 12/16/2014 1:28 PM, Amir Vadai wrote:
>> From: Ido Shamay <idos@mellanox.com>
>> 
>> This commit contains 2 fixes for the 128B CQE/EQE stride feaure.
>> Wei found that mlx4_QUERY_HCA function marked the wrong capability
>> in flags (64B CQE/EQE), when CQE/EQE stride feature was enabled.
>> Also added small fix in initial CQE ownership bit assignment, when CQE
>> is size is not default 32B.
>> 
>> Fixes: 77507aa24 (net/mlx4: Enable CQE/EQE stride support)
>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>> Signed-off-by: Ido Shamay <idos@mellanox.com>
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>> ---
>> Dave Hi,
>> 
>> Please pull this patch also to stable (at least 3.17)
>> 
>> Thanks,
>> Amir
> 
> Small correction: Should pull into -stable >= 3.18

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: FIXED_PHY is broken...
From: Florian Fainelli @ 2014-12-16 20:23 UTC (permalink / raw)
  To: David Miller, netdev
In-Reply-To: <20141216.151550.805808430085367492.davem@davemloft.net>

On 16/12/14 12:15, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 16 Dec 2014 14:30:27 -0500 (EST)
> 
>> From: David Miller <davem@davemloft.net>
>> Date: Tue, 16 Dec 2014 11:25:34 -0500 (EST)
>>
>>> I get this now when I run oldconfig:
>>>
>>> warning: (NET_DSA_BCM_SF2 && BCMGENET && SYSTEMPORT) selects FIXED_PHY which has unmet direct dependencies (NETDEVICES && PHYLIB=y)
>>
>> Here is how I'm going to fix this.
>>
>> FIXED_PHY needs to be allowed to be modular, and built even if PHYLIB is
>> modular too.

You beat me to it, thanks David!

>>
>> ====================
>> [PATCH] net: Allow FIXED_PHY to be modular.
> 
> Ok, it takes a little more work, the problem is that there is already
> a module named fixed.ko in the regulator layer, so we have to rename
> this to something else.
> 
> ====================
> [PATCH] net: Allow FIXED_PHY to be modular.
> 
> Otherwise we get things like:
> 
> warning: (NET_DSA_BCM_SF2 && BCMGENET && SYSTEMPORT) selects FIXED_PHY which has unmet direct dependencies (NETDEVICES && PHYLIB=y)
> 
> In order to make this work we have to rename fixed.c to fixed_phy.c
> because the regulator drivers already have a module named "fixed.o".
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

> ---
>  drivers/net/phy/Kconfig                  | 4 ++--
>  drivers/net/phy/Makefile                 | 2 +-
>  drivers/net/phy/{fixed.c => fixed_phy.c} | 0
>  include/linux/phy_fixed.h                | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
>  rename drivers/net/phy/{fixed.c => fixed_phy.c} (100%)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index b4b0f80..a3c251b 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -119,8 +119,8 @@ config MICREL_PHY
>  	  Supports the KSZ9021, VSC8201, KS8001 PHYs.
>  
>  config FIXED_PHY
> -	bool "Driver for MDIO Bus/PHY emulation with fixed speed/link PHYs"
> -	depends on PHYLIB=y
> +	tristate "Driver for MDIO Bus/PHY emulation with fixed speed/link PHYs"
> +	depends on PHYLIB
>  	---help---
>  	  Adds the platform "fixed" MDIO Bus to cover the boards that use
>  	  PHYs that are not connected to the real MDIO bus.
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index eb3b18b..501ea76 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -17,7 +17,7 @@ obj-$(CONFIG_BCM87XX_PHY)	+= bcm87xx.o
>  obj-$(CONFIG_ICPLUS_PHY)	+= icplus.o
>  obj-$(CONFIG_REALTEK_PHY)	+= realtek.o
>  obj-$(CONFIG_LSI_ET1011C_PHY)	+= et1011c.o
> -obj-$(CONFIG_FIXED_PHY)		+= fixed.o
> +obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
>  obj-$(CONFIG_MDIO_BITBANG)	+= mdio-bitbang.o
>  obj-$(CONFIG_MDIO_GPIO)		+= mdio-gpio.o
>  obj-$(CONFIG_NATIONAL_PHY)	+= national.o
> diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed_phy.c
> similarity index 100%
> rename from drivers/net/phy/fixed.c
> rename to drivers/net/phy/fixed_phy.c
> diff --git a/include/linux/phy_fixed.h b/include/linux/phy_fixed.h
> index f2ca1b4..7e75bfe 100644
> --- a/include/linux/phy_fixed.h
> +++ b/include/linux/phy_fixed.h
> @@ -11,7 +11,7 @@ struct fixed_phy_status {
>  
>  struct device_node;
>  
> -#ifdef CONFIG_FIXED_PHY
> +#if IS_ENABLED(CONFIG_FIXED_PHY)
>  extern int fixed_phy_add(unsigned int irq, int phy_id,
>  			 struct fixed_phy_status *status);
>  extern struct phy_device *fixed_phy_register(unsigned int irq,
> 

^ permalink raw reply

* Re: [PATCH 2/2] ip_tunnel: Add missing validation of encap type to ip_tunnel_encap_setup()
From: Tom Herbert @ 2014-12-16 20:23 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David Miller, Linux Netdev List
In-Reply-To: <dd70de766b1111fdcb4794f6d671e4eca5053730.1418759998.git.tgraf@suug.ch>

On Tue, Dec 16, 2014 at 12:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> The encap->type comes straight from Netlink. Validate it against
> max supported encap types just like ip_encap_hlen() already does.
>
> Fixes: a8c5f9 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  net/ipv4/ip_tunnel.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 2f498f8..d3e4479 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -573,6 +573,9 @@ int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
>         if (t->encap.type == TUNNEL_ENCAP_NONE)
>                 return 0;
>
> +       if (t->encap.type >= MAX_IPTUN_ENCAP_OPS)
> +               return -EINVAL;
> +

I don't think this is technically needed, we should have already
verified the type when setting up the tunnel (ip_encap_hlen).

>         rcu_read_lock();
>         ops = rcu_dereference(iptun_encaps[t->encap.type]);
>         if (likely(ops && ops->build_header))
> --
> 1.9.3
>

^ permalink raw reply

* Re: [PATCHv1 net] xen-netfront: use napi_complete() correctly to prevent Rx stalling
From: David Miller @ 2014-12-16 20:22 UTC (permalink / raw)
  To: david.vrabel; +Cc: netdev, xen-devel, konrad.wilk, boris.ostrovsky, edumazet
In-Reply-To: <1418756386-6940-1-git-send-email-david.vrabel@citrix.com>

From: David Vrabel <david.vrabel@citrix.com>
Date: Tue, 16 Dec 2014 18:59:46 +0000

> After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt
> masking in NAPI) the napi instance is removed from the per-cpu list
> prior to calling the n->poll(), and is only requeued if all of the
> budget was used.  This inadvertently broke netfront because netfront
> does not use NAPI correctly.
> 
> If netfront had not used all of its budget it would do a final check
> for any Rx responses and avoid calling napi_complete() if there were
> more responses.  It would still return under budget so it would never
> be rescheduled.  The final check would also not re-enable the Rx
> interrupt.
> 
> Additionally, xenvif_poll() would also call napi_complete() /after/
> enabling the interrupt.  This resulted in a race between the
> napi_complete() and the napi_schedule() in the interrupt handler.  The
> use of local_irq_save/restore() avoided by race iff the handler is
> running on the same CPU but not if it was running on a different CPU.
> 
> Fix both of these by always calling napi_compete() if the budget was
> not all used, and then calling napi_schedule() if the final checks
> says there's more work.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 0/2] ip_tunnel fixes
From: David Miller @ 2014-12-16 20:22 UTC (permalink / raw)
  To: tgraf; +Cc: netdev, therbert
In-Reply-To: <cover.1418759998.git.tgraf@suug.ch>

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 16 Dec 2014 21:05:19 +0100

> Thomas Graf (2):
>   ip_tunnel: Add sanity checks to ip_tunnel_encap_add_ops()
>   ip_tunnel: Add missing validation of encap type to
>     ip_tunnel_encap_setup()

Both applied, thanks Thomas.

^ permalink raw reply

* Re: [PATCH net-next 2/3] Implementation of RFC 4898 Extended TCP Statistics (Web10G)
From: David Miller @ 2014-12-16 20:18 UTC (permalink / raw)
  To: rapier; +Cc: alexei.starovoitov, netdev
In-Reply-To: <54909278.6090806@psc.edu>

From: rapier <rapier@psc.edu>
Date: Tue, 16 Dec 2014 15:13:44 -0500

> On 12/16/14, 3:03 PM, David Miller wrote:
> 
>> You shouldn't need to export any symbols.
> 
> As a point of clarification - is it acceptable to export symbols for
> use with in tree modules such as tcp_htcp? We are more than willing to
> do the work required to bring this in line with best practices.

I'm saying for data and TCP statistics collection, you shouldn't need
to add any new symbol exports.

Keep this in the main kernel, nothing external should be needed.

Extending tcp_info or similar is the only reasonable way to implement
this stuff.

^ permalink raw reply

* [PATCH netfilter-next] xt_osf: Use continue to reduce indentation
From: Joe Perches @ 2014-12-16 20:17 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Pablo Neira Ayuso, Patrick McHardy, Jozsef Kadlecsik,
	netfilter-devel, netdev, LKML

Invert logic in test to use continue.

This routine already uses continue, use it a bit more to
minimize > 80 column long lines and unnecessary indentation.

No change in compiled object file.

Other miscellanea:

o Remove trailing whitespace
o Realign arguments to multiline statement

Signed-off-by: Joe Perches <joe@perches.com>
---
 net/netfilter/xt_osf.c | 169 +++++++++++++++++++++++++------------------------
 1 file changed, 85 insertions(+), 84 deletions(-)

diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
index c529161..0778855 100644
--- a/net/netfilter/xt_osf.c
+++ b/net/netfilter/xt_osf.c
@@ -225,6 +225,8 @@ xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(kf, &xt_osf_fingers[df], finger_entry) {
+		int foptsize, optnum;
+
 		f = &kf->finger;
 
 		if (!(info->flags & XT_OSF_LOG) && strcmp(info->genre, f->genre))
@@ -233,110 +235,109 @@ xt_osf_match_packet(const struct sk_buff *skb, struct xt_action_param *p)
 		optp = _optp;
 		fmatch = FMATCH_WRONG;
 
-		if (totlen == f->ss && xt_osf_ttl(skb, info, f->ttl)) {
-			int foptsize, optnum;
+		if (totlen != f->ss || !xt_osf_ttl(skb, info, f->ttl))
+			continue;
 
-			/*
-			 * Should not happen if userspace parser was written correctly.
-			 */
-			if (f->wss.wc >= OSF_WSS_MAX)
-				continue;
+		/*
+		 * Should not happen if userspace parser was written correctly.
+		 */
+		if (f->wss.wc >= OSF_WSS_MAX)
+			continue;
 
-			/* Check options */
+		/* Check options */
 
-			foptsize = 0;
-			for (optnum = 0; optnum < f->opt_num; ++optnum)
-				foptsize += f->opt[optnum].length;
+		foptsize = 0;
+		for (optnum = 0; optnum < f->opt_num; ++optnum)
+			foptsize += f->opt[optnum].length;
 
-			if (foptsize > MAX_IPOPTLEN ||
-				optsize > MAX_IPOPTLEN ||
-				optsize != foptsize)
-				continue;
+		if (foptsize > MAX_IPOPTLEN ||
+		    optsize > MAX_IPOPTLEN ||
+		    optsize != foptsize)
+			continue;
 
-			check_WSS = f->wss.wc;
+		check_WSS = f->wss.wc;
 
-			for (optnum = 0; optnum < f->opt_num; ++optnum) {
-				if (f->opt[optnum].kind == (*optp)) {
-					__u32 len = f->opt[optnum].length;
-					const __u8 *optend = optp + len;
-					int loop_cont = 0;
+		for (optnum = 0; optnum < f->opt_num; ++optnum) {
+			if (f->opt[optnum].kind == (*optp)) {
+				__u32 len = f->opt[optnum].length;
+				const __u8 *optend = optp + len;
+				int loop_cont = 0;
 
-					fmatch = FMATCH_OK;
+				fmatch = FMATCH_OK;
 
-					switch (*optp) {
-					case OSFOPT_MSS:
-						mss = optp[3];
-						mss <<= 8;
-						mss |= optp[2];
+				switch (*optp) {
+				case OSFOPT_MSS:
+					mss = optp[3];
+					mss <<= 8;
+					mss |= optp[2];
 
-						mss = ntohs((__force __be16)mss);
-						break;
-					case OSFOPT_TS:
-						loop_cont = 1;
-						break;
-					}
+					mss = ntohs((__force __be16)mss);
+					break;
+				case OSFOPT_TS:
+					loop_cont = 1;
+					break;
+				}
 
-					optp = optend;
-				} else
-					fmatch = FMATCH_OPT_WRONG;
+				optp = optend;
+			} else
+				fmatch = FMATCH_OPT_WRONG;
 
-				if (fmatch != FMATCH_OK)
-					break;
-			}
+			if (fmatch != FMATCH_OK)
+				break;
+		}
 
-			if (fmatch != FMATCH_OPT_WRONG) {
-				fmatch = FMATCH_WRONG;
+		if (fmatch != FMATCH_OPT_WRONG) {
+			fmatch = FMATCH_WRONG;
 
-				switch (check_WSS) {
-				case OSF_WSS_PLAIN:
-					if (f->wss.val == 0 || window == f->wss.val)
-						fmatch = FMATCH_OK;
-					break;
-				case OSF_WSS_MSS:
-					/*
-					 * Some smart modems decrease mangle MSS to 
-					 * SMART_MSS_2, so we check standard, decreased
-					 * and the one provided in the fingerprint MSS
-					 * values.
-					 */
+			switch (check_WSS) {
+			case OSF_WSS_PLAIN:
+				if (f->wss.val == 0 || window == f->wss.val)
+					fmatch = FMATCH_OK;
+				break;
+			case OSF_WSS_MSS:
+				/*
+				 * Some smart modems decrease mangle MSS to
+				 * SMART_MSS_2, so we check standard, decreased
+				 * and the one provided in the fingerprint MSS
+				 * values.
+				 */
 #define SMART_MSS_1	1460
 #define SMART_MSS_2	1448
-					if (window == f->wss.val * mss ||
-					    window == f->wss.val * SMART_MSS_1 ||
-					    window == f->wss.val * SMART_MSS_2)
-						fmatch = FMATCH_OK;
-					break;
-				case OSF_WSS_MTU:
-					if (window == f->wss.val * (mss + 40) ||
-					    window == f->wss.val * (SMART_MSS_1 + 40) ||
-					    window == f->wss.val * (SMART_MSS_2 + 40))
-						fmatch = FMATCH_OK;
-					break;
-				case OSF_WSS_MODULO:
-					if ((window % f->wss.val) == 0)
-						fmatch = FMATCH_OK;
-					break;
-				}
+				if (window == f->wss.val * mss ||
+				    window == f->wss.val * SMART_MSS_1 ||
+				    window == f->wss.val * SMART_MSS_2)
+					fmatch = FMATCH_OK;
+				break;
+			case OSF_WSS_MTU:
+				if (window == f->wss.val * (mss + 40) ||
+				    window == f->wss.val * (SMART_MSS_1 + 40) ||
+				    window == f->wss.val * (SMART_MSS_2 + 40))
+					fmatch = FMATCH_OK;
+				break;
+			case OSF_WSS_MODULO:
+				if ((window % f->wss.val) == 0)
+					fmatch = FMATCH_OK;
+				break;
 			}
+		}
 
-			if (fmatch != FMATCH_OK)
-				continue;
+		if (fmatch != FMATCH_OK)
+			continue;
 
-			fcount++;
+		fcount++;
 
-			if (info->flags & XT_OSF_LOG)
-				nf_log_packet(net, p->family, p->hooknum, skb,
-					p->in, p->out, NULL,
-					"%s [%s:%s] : %pI4:%d -> %pI4:%d hops=%d\n",
-					f->genre, f->version, f->subtype,
-					&ip->saddr, ntohs(tcp->source),
-					&ip->daddr, ntohs(tcp->dest),
-					f->ttl - ip->ttl);
+		if (info->flags & XT_OSF_LOG)
+			nf_log_packet(net, p->family, p->hooknum, skb,
+				      p->in, p->out, NULL,
+				      "%s [%s:%s] : %pI4:%d -> %pI4:%d hops=%d\n",
+				      f->genre, f->version, f->subtype,
+				      &ip->saddr, ntohs(tcp->source),
+				      &ip->daddr, ntohs(tcp->dest),
+				      f->ttl - ip->ttl);
 
-			if ((info->flags & XT_OSF_LOG) &&
-			    info->loglevel == XT_OSF_LOGLEVEL_FIRST)
-				break;
-		}
+		if ((info->flags & XT_OSF_LOG) &&
+		    info->loglevel == XT_OSF_LOGLEVEL_FIRST)
+			break;
 	}
 	rcu_read_unlock();
 

^ permalink raw reply related

* Re: pull request: wireless 2014-12-16
From: David Miller @ 2014-12-16 20:17 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20141216181613.GF19658@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Tue, 16 Dec 2014 13:16:13 -0500

> Please pull this batch of fixes intended for the 3.19 stream!

Pulled, thanks a lot John.

^ permalink raw reply

* Re: FIXED_PHY is broken...
From: David Miller @ 2014-12-16 20:15 UTC (permalink / raw)
  To: netdev; +Cc: f.fainelli
In-Reply-To: <20141216.143027.1243798311589198629.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 16 Dec 2014 14:30:27 -0500 (EST)

> From: David Miller <davem@davemloft.net>
> Date: Tue, 16 Dec 2014 11:25:34 -0500 (EST)
> 
>> I get this now when I run oldconfig:
>> 
>> warning: (NET_DSA_BCM_SF2 && BCMGENET && SYSTEMPORT) selects FIXED_PHY which has unmet direct dependencies (NETDEVICES && PHYLIB=y)
> 
> Here is how I'm going to fix this.
> 
> FIXED_PHY needs to be allowed to be modular, and built even if PHYLIB is
> modular too.
> 
> ====================
> [PATCH] net: Allow FIXED_PHY to be modular.

Ok, it takes a little more work, the problem is that there is already
a module named fixed.ko in the regulator layer, so we have to rename
this to something else.

====================
[PATCH] net: Allow FIXED_PHY to be modular.

Otherwise we get things like:

warning: (NET_DSA_BCM_SF2 && BCMGENET && SYSTEMPORT) selects FIXED_PHY which has unmet direct dependencies (NETDEVICES && PHYLIB=y)

In order to make this work we have to rename fixed.c to fixed_phy.c
because the regulator drivers already have a module named "fixed.o".

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/phy/Kconfig                  | 4 ++--
 drivers/net/phy/Makefile                 | 2 +-
 drivers/net/phy/{fixed.c => fixed_phy.c} | 0
 include/linux/phy_fixed.h                | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
 rename drivers/net/phy/{fixed.c => fixed_phy.c} (100%)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index b4b0f80..a3c251b 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -119,8 +119,8 @@ config MICREL_PHY
 	  Supports the KSZ9021, VSC8201, KS8001 PHYs.
 
 config FIXED_PHY
-	bool "Driver for MDIO Bus/PHY emulation with fixed speed/link PHYs"
-	depends on PHYLIB=y
+	tristate "Driver for MDIO Bus/PHY emulation with fixed speed/link PHYs"
+	depends on PHYLIB
 	---help---
 	  Adds the platform "fixed" MDIO Bus to cover the boards that use
 	  PHYs that are not connected to the real MDIO bus.
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index eb3b18b..501ea76 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_BCM87XX_PHY)	+= bcm87xx.o
 obj-$(CONFIG_ICPLUS_PHY)	+= icplus.o
 obj-$(CONFIG_REALTEK_PHY)	+= realtek.o
 obj-$(CONFIG_LSI_ET1011C_PHY)	+= et1011c.o
-obj-$(CONFIG_FIXED_PHY)		+= fixed.o
+obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
 obj-$(CONFIG_MDIO_BITBANG)	+= mdio-bitbang.o
 obj-$(CONFIG_MDIO_GPIO)		+= mdio-gpio.o
 obj-$(CONFIG_NATIONAL_PHY)	+= national.o
diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed_phy.c
similarity index 100%
rename from drivers/net/phy/fixed.c
rename to drivers/net/phy/fixed_phy.c
diff --git a/include/linux/phy_fixed.h b/include/linux/phy_fixed.h
index f2ca1b4..7e75bfe 100644
--- a/include/linux/phy_fixed.h
+++ b/include/linux/phy_fixed.h
@@ -11,7 +11,7 @@ struct fixed_phy_status {
 
 struct device_node;
 
-#ifdef CONFIG_FIXED_PHY
+#if IS_ENABLED(CONFIG_FIXED_PHY)
 extern int fixed_phy_add(unsigned int irq, int phy_id,
 			 struct fixed_phy_status *status);
 extern struct phy_device *fixed_phy_register(unsigned int irq,
-- 
1.7.11.7

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox