* [PATCH v2 net-next] dsa:mv88e6xxx: allow address 0x1 in smi_init
From: Volodymyr Bendiuga @ 2017-01-03 9:49 UTC
To: andrew, vivien.didelot, f.fainelli, netdev, volodymyr.bendiuga
From: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
Some devices, such as the mv88e6097, have the ADDR[0] pin available
externally, so it is possible to configure the device to use SMI
address 0x1. Remove the restriction, as there are boards using this
address.
Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/chip.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 4da379f..173ea97 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4234,10 +4234,6 @@ static void mv88e6xxx_phy_destroy(struct mv88e6xxx_chip *chip)
static int mv88e6xxx_smi_init(struct mv88e6xxx_chip *chip,
struct mii_bus *bus, int sw_addr)
{
- /* ADDR[0] pin is unavailable externally and considered zero */
- if (sw_addr & 0x1)
- return -EINVAL;
-
if (sw_addr == 0)
chip->smi_ops = &mv88e6xxx_smi_single_chip_ops;
else if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_MULTI_CHIP))
--
2.7.4
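Editorial sketch (not part of the patch): the address handling that remains after this change can be illustrated with a small stand-alone C function. The enum and function names are hypothetical, and the fallback for chips without multi-chip support is an assumption, since the hunk above is cut off before that branch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's SMI access modes. */
enum smi_mode {
	SMI_SINGLE_CHIP,	/* sw_addr == 0: direct register access */
	SMI_MULTI_CHIP,		/* sw_addr != 0: indirect access */
	SMI_UNSUPPORTED,	/* assumed outcome: non-zero address, no multi-chip support */
};

/*
 * Select the access mode from the strapped SMI address. There is
 * deliberately no "sw_addr & 0x1" rejection: an odd address such as 0x1
 * is handled like any other non-zero address.
 */
static enum smi_mode smi_select_mode(int sw_addr, int has_multi_chip)
{
	/* SMI (MDIO clause 22) device addresses are 5 bits wide. */
	assert(sw_addr >= 0 && sw_addr < 32);

	if (sw_addr == 0)
		return SMI_SINGLE_CHIP;
	if (has_multi_chip)
		return SMI_MULTI_CHIP;
	return SMI_UNSUPPORTED;
}
```

With the old check in place, a call such as smi_select_mode(1, 1) would have been refused before the mode was even chosen; in this sketch it simply selects the multi-chip mode.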
* [PATCH net-next] net:mv88e6xxx: use g2 interrupt for 6097 chip
From: Volodymyr Bendiuga @ 2017-01-03 10:13 UTC
To: andrew, vivien.didelot, f.fainelli, netdev, volodymyr.bendiuga
From: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
The 6097 chip needs the MV88E6XXX_FLAG_G2_INT flag.
Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 13c7cc4..c7206d8 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -575,6 +575,7 @@ enum mv88e6xxx_cap {
(MV88E6XXX_FLAG_G1_ATU_FID | \
MV88E6XXX_FLAG_G1_VTU_FID | \
MV88E6XXX_FLAG_GLOBAL2 | \
+ MV88E6XXX_FLAG_G2_INT | \
MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
MV88E6XXX_FLAG_G2_POT | \
--
2.7.4
* Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Richard Cochran @ 2017-01-03 10:20 UTC
To: Harini Katakam
Cc: Nicolas Ferre, Rafal Ozieblo, Andrei Pistirica,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, davem@davemloft.net,
harini.katakam@xilinx.com, punnaia@xilinx.com, michals@xilinx.com,
anirudh@xilinx.com, boris.brezillon@free-electrons.com,
alexandre.belloni@free-electrons.com, tbultel@pixelsurmer.com
In-Reply-To: <CAFcVECK1vt7Hu4tgSZ2+kKpMxoT-wpikMDq65HaxRv-EMgobHA@mail.gmail.com>
On Tue, Jan 03, 2017 at 10:36:11AM +0530, Harini Katakam wrote:
> I understand that it is not accurate - it is an initial version.
No, it is not inaccurate at all, it is WRONG.
This means that time stamps will be randomly associated with PTP
network packets. To the application, the protocol will appear to
work, but the time stamp information (and thus the synchronization)
will be wrong.
To me, this is unacceptable, and I will push back on this driver
getting merged.
[ In contrast, the descriptor based approach would be ok, afaict. ]
Thanks,
Richard
* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Michal Hocko @ 2017-01-03 10:25 UTC
To: Peter Zijlstra, Tejun Heo
Cc: Andy Lutomirski, David Ahern, Alexei Starovoitov, Andy Lutomirski,
Daniel Mack, Mickaël Salaün, Kees Cook, Jann Horn,
David S. Miller, Thomas Graf, Michael Kerrisk, Linux API,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Network Development
In-Reply-To: <20161220091150.GJ3124-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
On Tue 20-12-16 10:11:50, Peter Zijlstra wrote:
> On Mon, Dec 19, 2016 at 05:56:24PM -0800, Andy Lutomirski wrote:
> > >> Huh? My example in the original email attaches a program in a
> > >> sub-hierarchy. Are you saying that 4.11 could make that example stop
> > >> working?
> > >
> > > Are you suggesting sub-cgroups should not be allowed to override the filter of a parent cgroup?
> >
> > Yes, exactly. I think there are two sensible behaviors:
> >
> > a) sub-cgroups cannot have a filter at all if the parent has a filter.
> > (This is the "punt" approach -- it lets different semantics be
> > assigned later without breaking userspace.)
> >
> > b) sub-cgroups can have a filter if a parent does, too. The semantics
> > are that the sub-cgroup filter runs first and all side-effects occur.
> > If that filter says "reject" then ancestor filters are skipped. If
> > that filter says "accept", then the ancestor filter is run and its
> > side-effects happen as well. (And so on, all the way up to the root.)
>
> So from what I understand the proposed cgroup is not in fact
> hierarchical at all.
>
> @TJ, I thought you were enforcing all new cgroups to be properly
> hierarchical, that would very much include this one.
I would be interested in that as well. We made that mistake in
memcg v1, where hierarchy could be disabled for performance reasons, and
that turned out to be a major PITA in the end. Why do we want to repeat
the same mistake here?
--
Michal Hocko
SUSE Labs
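Editorial sketch: option (b) from Andy Lutomirski's quoted proposal can be modelled as a walk from the sub-cgroup up to the root. Everything here (the struct, the names, the run counter) is hypothetical and only illustrates the ordering; it is not the cgroup-bpf implementation.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { REJECT = 0, ACCEPT = 1 };

/* Hypothetical model of one cgroup in the hierarchy. */
struct cgroup_node {
	struct cgroup_node *parent;		/* NULL at the root */
	enum verdict (*filter)(void *pkt);	/* NULL if no filter is attached */
	int runs;				/* how many times this filter ran */
};

/*
 * Run the filter of the innermost cgroup first. A "reject" stops the
 * walk, so ancestor filters (and their side effects) are skipped. An
 * "accept" continues with the parent, all the way up to the root.
 */
static enum verdict run_filters(struct cgroup_node *cg, void *pkt)
{
	assert(cg != NULL);	/* the walk needs a starting cgroup */

	for (; cg; cg = cg->parent) {
		if (!cg->filter)
			continue;
		cg->runs++;
		if (cg->filter(pkt) == REJECT)
			return REJECT;
	}
	return ACCEPT;
}

/* Trivial example filters. */
static enum verdict accept_all(void *pkt) { (void)pkt; return ACCEPT; }
static enum verdict reject_all(void *pkt) { (void)pkt; return REJECT; }
```

In this model a child can only narrow what its ancestors allow: a packet is accepted only if every filter on the path to the root accepts it, which is the hierarchical property being asked about in the thread.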
* Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Richard Cochran @ 2017-01-03 10:29 UTC
To: Harini Katakam
Cc: Nicolas Ferre, Rafal Ozieblo, Andrei Pistirica,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, davem@davemloft.net,
harini.katakam@xilinx.com, punnaia@xilinx.com, michals@xilinx.com,
anirudh@xilinx.com, boris.brezillon@free-electrons.com,
alexandre.belloni@free-electrons.com, tbultel@pixelsurmer.com
In-Reply-To: <CAFcVECK1vt7Hu4tgSZ2+kKpMxoT-wpikMDq65HaxRv-EMgobHA@mail.gmail.com>
On Tue, Jan 03, 2017 at 10:36:11AM +0530, Harini Katakam wrote:
> I understand that it is not accurate - it is an initial version.
Why do you say, "it is an initial version?"
The Atmel device has this IP core burned in. The core is hopelessly
broken, and it cannot be fixed in SW either, so what is your point?
Thanks,
Richard
* RE: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Rafal Ozieblo @ 2017-01-03 10:47 UTC
To: Harini Katakam, Richard Cochran
Cc: Nicolas Ferre, Andrei Pistirica, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, davem@davemloft.net,
harini.katakam@xilinx.com, punnaia@xilinx.com, michals@xilinx.com,
anirudh@xilinx.com, boris.brezillon@free-electrons.com,
alexandre.belloni@free-electrons.com, tbultel@pixelsurmer.com
In-Reply-To: <CAFcVECK1vt7Hu4tgSZ2+kKpMxoT-wpikMDq65HaxRv-EMgobHA@mail.gmail.com>
>From: Harini Katakam [mailto:harinikatakamlinux@gmail.com]
>Sent: 3 January 2017 06:06
>Subject: Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
>
>Hi Richard,
>
>On Mon, Jan 2, 2017 at 9:43 PM, Richard Cochran <richardcochran@gmail.com> wrote:
>> On Mon, Jan 02, 2017 at 03:47:07PM +0100, Nicolas Ferre wrote:
>>> On 02/01/2017 at 12:31, Richard Cochran wrote:
>>> > This Cadence IP core is a complete disaster.
>>>
>>> Well, it evolved and proposes several options to different SoC
>>> integrators. This is not something unusual...
>>> I suspect as well that some other network adapters have the same
>>> weakness concerning PTP timestamp in single register as the early
>>> revisions of this IP.
>>
>> It appears that this core cannot latch the time on read or write,
>> nor even latch time stamps. I have worked with many different PTP HW
>> implementations, even early ones like on the ixp4xx, and it is no
>> exaggeration to say that this one is uniquely broken.
>>
>>> I suspect that Rafal tends to jump too quickly to the latest IP
>>> revisions and add more options to this series: let's not try to pour
>>> too many things into this code right now.
>>
>> Why can't you check the IP version in the driver?
>
>There is an IP revision register, but it would probably be better to rely on "caps" from the compatibility strings, to cover SoC-specific implementations. Also, when this extended BD is added (with timestamp), additional words will need to be added statically, which will be consistent with Andrei's CONFIG_ checks.
We can distinguish IP cores with and without PTP support by reading the Design Configuration Register. But to distinguish IP cores that have timestamps in buffer descriptors from those that support only event registers, we can only read the revision ID register and decide based on the IP version.
I agree with Harini, compatibility strings could be better. But we might end up with many different configurations in the future.
We could use only the descriptor approach, but there are many Atmel cores on the market that support only event registers.
* Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Harini Katakam @ 2017-01-03 10:48 UTC
To: Richard Cochran
Cc: tbultel@pixelsurmer.com, Rafal Ozieblo,
boris.brezillon@free-electrons.com, netdev@vger.kernel.org,
alexandre.belloni@free-electrons.com, Nicolas Ferre,
linux-kernel@vger.kernel.org, Andrei Pistirica,
michals@xilinx.com, anirudh@xilinx.com, punnaia@xilinx.com,
harini.katakam@xilinx.com, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20170103102958.GC24780@localhost.localdomain>
Hi Richard,
On Tue, Jan 3, 2017 at 3:59 PM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Tue, Jan 03, 2017 at 10:36:11AM +0530, Harini Katakam wrote:
>> I understand that it is not accurate - it is an initial version.
>
> Why do you say, "it is an initial version?"
>
> The Atmel device has this IP core burned in. The core is hopelessly
> broken, and it cannot be fixed in SW either, so what is your point?
>
I'm sorry - I just meant that this was before many necessary
enhancements and fixes.
Newer SoCs including ZynqMP (for which the original series was sent)
have the descriptor based approach which is reliable.
Regards,
Harini
* [PATCH net-next] ipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entries
From: Nikolay Aleksandrov @ 2017-01-03 11:13 UTC
To: netdev; +Cc: roopa, davem, Nikolay Aleksandrov
While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
include/uapi/linux/rtnetlink.h | 1 +
net/ipv4/ipmr.c | 4 +++-
net/ipv6/ip6mr.c | 4 +++-
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index e14377f2ec27..8c93ad1ef9ab 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -350,6 +350,7 @@ struct rtnexthop {
#define RTNH_F_ONLINK 4 /* Gateway is forced on link */
#define RTNH_F_OFFLOAD 8 /* offloaded route */
#define RTNH_F_LINKDOWN 16 /* carrier-down on nexthop */
+#define RTNH_F_UNRESOLVED 32 /* The entry is unresolved (ipmr) */
#define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN | RTNH_F_OFFLOAD)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index efc1e76d4977..b35dda57586b 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2091,8 +2091,10 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
int ct;
/* If cache is unresolved, don't try to parse IIF and OIF */
- if (c->mfc_parent >= MAXVIFS)
+ if (c->mfc_parent >= MAXVIFS) {
+ rtm->rtm_flags |= RTNH_F_UNRESOLVED;
return -ENOENT;
+ }
if (VIF_EXISTS(mrt, c->mfc_parent) &&
nla_put_u32(skb, RTA_IIF, mrt->vif_table[c->mfc_parent].dev->ifindex) < 0)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 604d8953c775..e275077e8af2 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2243,8 +2243,10 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
int ct;
/* If cache is unresolved, don't try to parse IIF and OIF */
- if (c->mf6c_parent >= MAXMIFS)
+ if (c->mf6c_parent >= MAXMIFS) {
+ rtm->rtm_flags |= RTNH_F_UNRESOLVED;
return -ENOENT;
+ }
if (MIF_EXISTS(mrt, c->mf6c_parent) &&
nla_put_u32(skb, RTA_IIF, mrt->vif6_table[c->mf6c_parent].dev->ifindex) < 0)
--
2.1.4
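Editorial sketch: with the flag in place, a user-space consumer of mroute dumps or notifications could tell the two cases apart roughly as follows. The helper and enum names are hypothetical; the flag value is the one proposed in the patch, and the rule that a missing RTA_IIF without the flag means the input interface has disappeared is an interpretation of the hunks above.

```c
#include <assert.h>
#include <stdbool.h>

/* Value proposed by the patch; defined here so the sketch stands alone. */
#ifndef RTNH_F_UNRESOLVED
#define RTNH_F_UNRESOLVED 32
#endif

/* Hypothetical classification of a multicast cache entry. */
enum mfc_state {
	MFC_RESOLVED,		/* RTA_IIF present, flag clear */
	MFC_UNRESOLVED,		/* RTNH_F_UNRESOLVED set by the kernel */
	MFC_IIF_MISSING,	/* flag clear but no RTA_IIF: interface is gone */
};

/*
 * rtm_flags is the field of the same name from the rtmsg header;
 * has_iif says whether an RTA_IIF attribute was found in the message.
 */
static enum mfc_state classify_mfc(unsigned int rtm_flags, bool has_iif)
{
	bool unresolved = rtm_flags & RTNH_F_UNRESOLVED;

	/* The fill function returns before RTA_IIF is added for unresolved entries. */
	assert(!(unresolved && has_iif));

	if (unresolved)
		return MFC_UNRESOLVED;
	return has_iif ? MFC_RESOLVED : MFC_IIF_MISSING;
}
```

Before the patch, the last two states were indistinguishable from user space, since both arrived without RTA_IIF and without any distinguishing flag.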
* Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Richard Cochran @ 2017-01-03 11:14 UTC
To: Rafal Ozieblo
Cc: Harini Katakam, Nicolas Ferre, Andrei Pistirica,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, davem@davemloft.net,
harini.katakam@xilinx.com, punnaia@xilinx.com, michals@xilinx.com,
anirudh@xilinx.com, boris.brezillon@free-electrons.com,
alexandre.belloni@free-electrons.com, tbultel@pixelsurmer.com
In-Reply-To: <BN3PR07MB25168573DD82F1681CDC8437C96E0@BN3PR07MB2516.namprd07.prod.outlook.com>
On Tue, Jan 03, 2017 at 10:47:56AM +0000, Rafal Ozieblo wrote:
> We could use only the descriptor approach, but there are many Atmel cores on the market that support only event registers.
As I said in my other reply in this thread, the Atmel cores cannot
possibly be made to work correctly.
Sad, but true.
Thanks,
Richard
* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: Jamal Hadi Salim @ 2017-01-03 11:44 UTC
To: John Fastabend, Paul Blakey, David S. Miller, netdev
Cc: Jiri Pirko, Hadar Hen Zion, Or Gerlitz, Roi Dayan, Roman Mashak,
Simon Horman
In-Reply-To: <586B29A4.2060408@gmail.com>
On 17-01-02 11:33 PM, John Fastabend wrote:
> On 17-01-02 05:22 PM, Jamal Hadi Salim wrote:
[..]
>> Like all cookie semantics it is for storing state. The receiver (kernel)
>> just stores it and does not interpret it. Reading it back
>> simplifies what the user has to do for their processing.
>>
>>>
>>> The tuple <ifindex:qdisc:prio:handle> really should be unique why
>>> not use this for system wide mappings?
>>>
>>
>> I think on a single machine should be enough, however:
>> typically the user wants to define the value in a manner that
>> in a distributed system it is unique. It would be trickier to
>> do so with well defined values such as above.
>>
>
> Just extend the tuple <hostname:ifindex:qdisc:prio:handle> that
> should be unique in the domain of hostnames, or use some other domain
> wide machine identifier.
>
May work for the case of filter identification. The nice thing about
allowing cookies is that you can let the user define their
own scheme.
> Although actions can be shared so the cookie can be shared across
> filters. Maybe it's useful but it doesn't uniquely identify a filter
> in the shared case but the user would have to specify that case
> so maybe it's not important.
>
Note: the action cookies and filter cookies are unrelated/orthogonal.
Their basic concept of stashing something in the cookie to help improve
what user space does (in our case millions of actions of which some are
used for accounting) is similar.
I have no objections to the flow cookies; my main concern was it should
be applicable to all classifiers not just flower. And the arbitrary size
of the cookie that you pointed out is questionable.
cheers,
jamal
* Re: [PATCH iproute2 net-next] tc: flower: support matching flags
From: Paul Blakey @ 2017-01-03 11:54 UTC
To: Jiri Benc
Cc: paulb, netdev, Stephen Hemminger, David S. Miller, Hadar Hen Zion,
Or Gerlitz, Roi Dayan
In-Reply-To: <20170102195522.7488179b@griffin>
On 02/01/2017 20:55, Jiri Benc wrote:
> On Wed, 28 Dec 2016 15:06:49 +0200, Paul Blakey wrote:
>> Enhance flower to support matching on flags.
>>
>> The 1st flag allows to match on whether the packet is
>> an IP fragment.
>>
>> Example:
>>
>> # add a flower filter that will drop fragmented packets
>> # (bit 0 of control flags)
>> tc filter add dev ens4f0 protocol ip parent ffff: \
>> flower \
>> src_mac e4:1d:2d:fd:8b:01 \
>> dst_mac e4:1d:2d:fd:8b:02 \
>> indev ens4f0 \
>> matching_flags 0x1/0x1 \
>> action drop
> This is a very poor API. First, how is the user supposed to know what
> those magic values in "matching_flags" mean? At the very least, it
> should be documented in the man page.
>
> Second, why "matching_flags"? That name suggests that those modify the
> way the matching is done (to illustrate my point, I'd expect things
> like "if the packet is too short, match this rule anyway" to be a
> "matching flag"). But this is not the case. What's wrong with plain
> "flags"? Or, if you want to be more specific, perhaps packet_flags?
>
> Third, all of this looks very wrong anyway. There should be separate
> keywords for individual flags. In this case, there should be an
> "ip_fragment" flag. The tc tool should be responsible for putting the
> flags together and creating the appropriate mask. The example would
> then be:
>
> tc filter add dev ens4f0 protocol ip parent ffff: \
> flower \
> src_mac e4:1d:2d:fd:8b:01 \
> dst_mac e4:1d:2d:fd:8b:02 \
> indev ens4f0 \
> ip_fragment yes \
> action drop
>
> I don't care whether it's "ip_fragment yes/no", "ip_fragment 1/0",
> "ip_fragment/noip_fragment" or similar. The important thing is it's a
> boolean flag; if specified, it's set to 0/1 and unmasked, if not
> specified, it's wildcarded.
>
> Stephen, I understand that you already applied this patch but given how
> horrible the proposed API is and that it's even undocumented in this
> patch, please reconsider this. If this is released, the API is set in
> stone and, frankly, it's very user unfriendly this way.
>
> Paul, could you please prepare a patch that would introduce a more sane
> API? I'd strongly prefer what I described under "third" but should you
> strongly disagree, at least implement "second" and document the
> currently known flag values.
>
> Thanks,
>
> Jiri
The name "matching_flags" came from the idea that what we are doing is matching.
Regarding documentation and flag names, I didn't want the tc tool to need
an update each time a new flag is introduced.
But I guess I can add two options, as with ip_proto, where you can
specify known flags by name but can also give a value.
What do you think about that?
flags <FLAGS> / <HEX'/'HEX>
FLAGS => frag/no_frag/tcp_syn/no_tcp_syn ['|'<FLAGS>]*
e.g.: flags frag|no_tcp_syn or flags 0x01/0x15
and the mask will have bits set only for the flags that were specified.
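Editorial sketch: the named-flag syntax proposed above could be turned into a value/mask pair along these lines. Only "frag" as bit 0 comes from the thread; the bit chosen for tcp_syn, the table, and the function name are assumptions for illustration, and this is not iproute2 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct flag_name {
	const char *name;
	uint32_t bit;
};

/* Hypothetical table: "frag" is bit 0 per the thread, tcp_syn is an assumed position. */
static const struct flag_name known_flags[] = {
	{ "frag",    1u << 0 },
	{ "tcp_syn", 1u << 1 },
};

/*
 * Parse a string such as "frag|no_tcp_syn" into a value/mask pair.
 * Every named flag sets its mask bit; a "no_" prefix leaves the value
 * bit clear. Flags that are not named stay wildcarded (mask bit 0).
 * Returns 0 on success and -1 on an unknown or empty flag name.
 */
static int parse_flags(const char *str, uint32_t *value, uint32_t *mask)
{
	const size_t n = sizeof(known_flags) / sizeof(known_flags[0]);

	assert(str && value && mask);
	*value = 0;
	*mask = 0;

	while (*str) {
		size_t len = strcspn(str, "|");	/* length of the current token */
		int negate = len > 3 && strncmp(str, "no_", 3) == 0;
		const char *name = negate ? str + 3 : str;
		size_t name_len = negate ? len - 3 : len;
		size_t i;

		for (i = 0; i < n; i++)
			if (strlen(known_flags[i].name) == name_len &&
			    strncmp(name, known_flags[i].name, name_len) == 0)
				break;
		if (i == n)
			return -1;

		*mask |= known_flags[i].bit;
		if (negate)
			*value &= ~known_flags[i].bit;
		else
			*value |= known_flags[i].bit;

		str += len;
		if (*str == '|')
			str++;
	}
	return 0;
}
```

Under these assumed bit positions, "frag|no_tcp_syn" yields value 0x1 with mask 0x3, while a plain "frag" yields 0x1/0x1, matching the raw form used in the original example.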
* Re: [PATCH net-next] bridge: multicast to unicast
From: Nikolay Aleksandrov via Bridge @ 2017-01-03 11:58 UTC
To: Linus Lüssing, netdev
Cc: bridge, linux-wireless, linux-kernel, David S . Miller,
Felix Fietkau
In-Reply-To: <20170102193214.31723-1-linus.luessing@c0d3.blue>
On 02/01/17 20:32, Linus Lüssing wrote:
> Implements an optional, per bridge port flag and feature to deliver
> multicast packets to any host on the according port via unicast
> individually. This is done by copying the packet per host and
> changing the multicast destination MAC to a unicast one accordingly.
>
> multicast-to-unicast works on top of the multicast snooping feature of
> the bridge. Which means unicast copies are only delivered to hosts which
> are interested in it and signaled this via IGMP/MLD reports
> previously.
>
> This feature is intended for interface types which have a more reliable
> and/or efficient way to deliver unicast packets than broadcast ones
> (e.g. wifi).
>
> However, it should only be enabled on interfaces where no IGMPv2/MLDv1
> report suppression takes place. This feature is disabled by default.
>
> The initial patch and idea is from Felix Fietkau.
>
> Cc: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
>
> ---
>
Hi Linus,
A few comments below, in general I have 2 concerns: the new mcast fast-path
tests and cache line ref, and adding netlink support for the new flag.
> This feature is used and enabled by default in OpenWRT and LEDE for AP
> interfaces for more than a year now to allow both a more robust multicast
> delivery and multicast at higher rates (e.g. multicast streaming).
>
> In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by
> the network daemon enabling AP isolation and by that separating all STAs.
> Delivery of STA-to-STA IP multicast is made possible again by
> enabling and utilizing the bridge hairpin mode, which considers the
> incoming port as a potential outgoing port, too.
>
> Hairpin-mode is performed after multicast snooping, so reports are
> only delivered to STAs running a multicast router.
> ---
> include/linux/if_bridge.h | 1 +
> net/bridge/br_forward.c | 44 +++++++++++++++++++++--
> net/bridge/br_mdb.c | 2 +-
> net/bridge/br_multicast.c | 92 ++++++++++++++++++++++++++++++++++-------------
> net/bridge/br_private.h | 4 ++-
> net/bridge/br_sysfs_if.c | 2 ++
> 6 files changed, 115 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index c6587c0..f1b0d78 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -46,6 +46,7 @@ struct br_ip_list {
> #define BR_LEARNING_SYNC BIT(9)
> #define BR_PROXYARP_WIFI BIT(10)
> #define BR_MCAST_FLOOD BIT(11)
> +#define BR_MULTICAST_TO_UCAST BIT(12)
>
> #define BR_DEFAULT_AGEING_TIME (300 * HZ)
>
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 7cb41ae..49d742d 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -174,6 +174,33 @@ static struct net_bridge_port *maybe_deliver(
> return p;
> }
>
> +static struct net_bridge_port *maybe_deliver_addr(
> + struct net_bridge_port *prev, struct net_bridge_port *p,
> + struct sk_buff *skb, const unsigned char *addr,
> + bool local_orig)
> +{
> + struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
> + const unsigned char *src = eth_hdr(skb)->h_source;
> +
> + if (!should_deliver(p, skb))
> + return prev;
> +
> + /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */
> + if (skb->dev == p->dev && ether_addr_equal(src, addr))
> + return prev;
> +
> + skb = skb_copy(skb, GFP_ATOMIC);
> + if (!skb) {
> + dev->stats.tx_dropped++;
> + return prev;
> + }
> +
> + memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN);
> + __br_forward(p, skb, local_orig);
> +
> + return prev;
> +}
> +
> /* called under rcu_read_lock */
> void br_flood(struct net_bridge *br, struct sk_buff *skb,
> enum br_pkt_type pkt_type, bool local_rcv, bool local_orig)
> @@ -231,6 +258,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
> struct net_bridge_port *prev = NULL;
> struct net_bridge_port_group *p;
> struct hlist_node *rp;
> + const unsigned char *addr;
nit: please arrange these in reverse christmas tree order
>
> rp = rcu_dereference(hlist_first_rcu(&br->router_list));
> p = mdst ? rcu_dereference(mdst->ports) : NULL;
> @@ -241,10 +269,20 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
> rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
> NULL;
>
> - port = (unsigned long)lport > (unsigned long)rport ?
> - lport : rport;
> + if ((unsigned long)lport > (unsigned long)rport) {
> + port = lport;
> + addr = p->unicast ? p->eth_addr : NULL;
> + } else {
> + port = rport;
> + addr = NULL;
> + }
> +
> + if (addr)
> + prev = maybe_deliver_addr(prev, port, skb, addr,
> + local_orig);
> + else
> + prev = maybe_deliver(prev, port, skb, local_orig);
This hunk adds 2 new tests and an additional cache line ref to all mcast forwarding,
regardless of whether the new (special case) flag is set.
Also, are you intentionally sending the original skb through the last port?
>
> - prev = maybe_deliver(prev, port, skb, local_orig);
> if (IS_ERR(prev))
> goto out;
> if (prev == port)
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index 7dbc80d..056e6ac 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -531,7 +531,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
> break;
> }
>
> - p = br_multicast_new_port_group(port, group, *pp, state);
> + p = br_multicast_new_port_group(port, group, *pp, state, NULL);
> if (unlikely(!p))
> return -ENOMEM;
> rcu_assign_pointer(*pp, p);
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index b30e77e..470a2409 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -43,12 +43,14 @@ static void br_multicast_add_router(struct net_bridge *br,
> static void br_ip4_multicast_leave_group(struct net_bridge *br,
> struct net_bridge_port *port,
> __be32 group,
> - __u16 vid);
> + __u16 vid,
> + const unsigned char *src);
> +
> #if IS_ENABLED(CONFIG_IPV6)
> static void br_ip6_multicast_leave_group(struct net_bridge *br,
> struct net_bridge_port *port,
> const struct in6_addr *group,
> - __u16 vid);
> + __u16 vid, const unsigned char *src);
> #endif
> unsigned int br_mdb_rehash_seq;
>
> @@ -711,7 +713,8 @@ struct net_bridge_port_group *br_multicast_new_port_group(
> struct net_bridge_port *port,
> struct br_ip *group,
> struct net_bridge_port_group __rcu *next,
> - unsigned char flags)
> + unsigned char flags,
> + const unsigned char *src)
> {
> struct net_bridge_port_group *p;
>
> @@ -726,12 +729,35 @@ struct net_bridge_port_group *br_multicast_new_port_group(
> hlist_add_head(&p->mglist, &port->mglist);
> setup_timer(&p->timer, br_multicast_port_group_expired,
> (unsigned long)p);
> +
> + if ((port->flags & BR_MULTICAST_TO_UCAST) && src) {
> + memcpy(p->eth_addr, src, ETH_ALEN);
> + p->unicast = true;
> + }
> +
> return p;
> }
>
> +static bool br_port_group_equal(struct net_bridge_port_group *p,
> + struct net_bridge_port *port,
> + const unsigned char *src)
> +{
> + if (p->port != port)
> + return false;
> +
> + if (!p->unicast)
> + return true;
> +
> + if (!src)
> + return false;
> +
> + return ether_addr_equal(src, p->eth_addr);
> +}
> +
> static int br_multicast_add_group(struct net_bridge *br,
> struct net_bridge_port *port,
> - struct br_ip *group)
> + struct br_ip *group,
> + const unsigned char *src)
> {
> struct net_bridge_port_group __rcu **pp;
> struct net_bridge_port_group *p;
> @@ -758,13 +784,13 @@ static int br_multicast_add_group(struct net_bridge *br,
> for (pp = &mp->ports;
> (p = mlock_dereference(*pp, br)) != NULL;
> pp = &p->next) {
> - if (p->port == port)
> + if (br_port_group_equal(p, port, src))
> goto found;
> if ((unsigned long)p->port < (unsigned long)port)
> break;
> }
>
> - p = br_multicast_new_port_group(port, group, *pp, 0);
> + p = br_multicast_new_port_group(port, group, *pp, 0, src);
> if (unlikely(!p))
> goto err;
> rcu_assign_pointer(*pp, p);
> @@ -783,7 +809,8 @@ static int br_multicast_add_group(struct net_bridge *br,
> static int br_ip4_multicast_add_group(struct net_bridge *br,
> struct net_bridge_port *port,
> __be32 group,
> - __u16 vid)
> + __u16 vid,
> + const unsigned char *src)
> {
> struct br_ip br_group;
>
> @@ -794,14 +821,15 @@ static int br_ip4_multicast_add_group(struct net_bridge *br,
> br_group.proto = htons(ETH_P_IP);
> br_group.vid = vid;
>
> - return br_multicast_add_group(br, port, &br_group);
> + return br_multicast_add_group(br, port, &br_group, src);
> }
>
> #if IS_ENABLED(CONFIG_IPV6)
> static int br_ip6_multicast_add_group(struct net_bridge *br,
> struct net_bridge_port *port,
> const struct in6_addr *group,
> - __u16 vid)
> + __u16 vid,
> + const unsigned char *src)
> {
> struct br_ip br_group;
>
> @@ -812,7 +840,7 @@ static int br_ip6_multicast_add_group(struct net_bridge *br,
> br_group.proto = htons(ETH_P_IPV6);
> br_group.vid = vid;
>
> - return br_multicast_add_group(br, port, &br_group);
> + return br_multicast_add_group(br, port, &br_group, src);
> }
> #endif
>
> @@ -1081,6 +1109,7 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
> struct sk_buff *skb,
> u16 vid)
> {
> + const unsigned char *src;
> struct igmpv3_report *ih;
> struct igmpv3_grec *grec;
> int i;
> @@ -1121,12 +1150,14 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
> continue;
> }
>
> + src = eth_hdr(skb)->h_source;
> if ((type == IGMPV3_CHANGE_TO_INCLUDE ||
> type == IGMPV3_MODE_IS_INCLUDE) &&
> ntohs(grec->grec_nsrcs) == 0) {
> - br_ip4_multicast_leave_group(br, port, group, vid);
> + br_ip4_multicast_leave_group(br, port, group, vid, src);
> } else {
> - err = br_ip4_multicast_add_group(br, port, group, vid);
> + err = br_ip4_multicast_add_group(br, port, group, vid,
> + src);
> if (err)
> break;
> }
> @@ -1141,6 +1172,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
> struct sk_buff *skb,
> u16 vid)
> {
> + const unsigned char *src = eth_hdr(skb)->h_source;
> struct icmp6hdr *icmp6h;
> struct mld2_grec *grec;
> int i;
> @@ -1192,10 +1224,11 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
> grec->grec_type == MLD2_MODE_IS_INCLUDE) &&
> ntohs(*nsrcs) == 0) {
> br_ip6_multicast_leave_group(br, port, &grec->grec_mca,
> - vid);
> + vid, src);
> } else {
> err = br_ip6_multicast_add_group(br, port,
> - &grec->grec_mca, vid);
> + &grec->grec_mca, vid,
> + src);
> if (err)
> break;
> }
> @@ -1511,7 +1544,8 @@ br_multicast_leave_group(struct net_bridge *br,
> struct net_bridge_port *port,
> struct br_ip *group,
> struct bridge_mcast_other_query *other_query,
> - struct bridge_mcast_own_query *own_query)
> + struct bridge_mcast_own_query *own_query,
> + const unsigned char *src)
> {
> struct net_bridge_mdb_htable *mdb;
> struct net_bridge_mdb_entry *mp;
> @@ -1535,7 +1569,7 @@ br_multicast_leave_group(struct net_bridge *br,
> for (pp = &mp->ports;
> (p = mlock_dereference(*pp, br)) != NULL;
> pp = &p->next) {
> - if (p->port != port)
> + if (!br_port_group_equal(p, port, src))
> continue;
>
> rcu_assign_pointer(*pp, p->next);
> @@ -1566,7 +1600,7 @@ br_multicast_leave_group(struct net_bridge *br,
> for (p = mlock_dereference(mp->ports, br);
> p != NULL;
> p = mlock_dereference(p->next, br)) {
> - if (p->port != port)
> + if (!br_port_group_equal(p, port, src))
> continue;
>
> if (!hlist_unhashed(&p->mglist) &&
> @@ -1617,7 +1651,8 @@ br_multicast_leave_group(struct net_bridge *br,
> static void br_ip4_multicast_leave_group(struct net_bridge *br,
> struct net_bridge_port *port,
> __be32 group,
> - __u16 vid)
> + __u16 vid,
> + const unsigned char *src)
> {
> struct br_ip br_group;
> struct bridge_mcast_own_query *own_query;
> @@ -1632,14 +1667,15 @@ static void br_ip4_multicast_leave_group(struct net_bridge *br,
> br_group.vid = vid;
>
> br_multicast_leave_group(br, port, &br_group, &br->ip4_other_query,
> - own_query);
> + own_query, src);
> }
>
> #if IS_ENABLED(CONFIG_IPV6)
> static void br_ip6_multicast_leave_group(struct net_bridge *br,
> struct net_bridge_port *port,
> const struct in6_addr *group,
> - __u16 vid)
> + __u16 vid,
> + const unsigned char *src)
> {
> struct br_ip br_group;
> struct bridge_mcast_own_query *own_query;
> @@ -1654,7 +1690,7 @@ static void br_ip6_multicast_leave_group(struct net_bridge *br,
> br_group.vid = vid;
>
> br_multicast_leave_group(br, port, &br_group, &br->ip6_other_query,
> - own_query);
> + own_query, src);
> }
> #endif
>
> @@ -1711,6 +1747,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
> struct sk_buff *skb,
> u16 vid)
> {
> + const unsigned char *src;
nit: please arrange these in reverse christmas tree order
> struct sk_buff *skb_trimmed = NULL;
> struct igmphdr *ih;
> int err;
> @@ -1731,13 +1768,14 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
> }
>
> ih = igmp_hdr(skb);
> + src = eth_hdr(skb)->h_source;
> BR_INPUT_SKB_CB(skb)->igmp = ih->type;
>
> switch (ih->type) {
> case IGMP_HOST_MEMBERSHIP_REPORT:
> case IGMPV2_HOST_MEMBERSHIP_REPORT:
> BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
> - err = br_ip4_multicast_add_group(br, port, ih->group, vid);
> + err = br_ip4_multicast_add_group(br, port, ih->group, vid, src);
> break;
> case IGMPV3_HOST_MEMBERSHIP_REPORT:
> err = br_ip4_multicast_igmp3_report(br, port, skb_trimmed, vid);
> @@ -1746,7 +1784,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
> err = br_ip4_multicast_query(br, port, skb_trimmed, vid);
> break;
> case IGMP_HOST_LEAVE_MESSAGE:
> - br_ip4_multicast_leave_group(br, port, ih->group, vid);
> + br_ip4_multicast_leave_group(br, port, ih->group, vid, src);
> break;
> }
>
> @@ -1765,6 +1803,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
> struct sk_buff *skb,
> u16 vid)
> {
> + const unsigned char *src;
nit: same about arrangement
> struct sk_buff *skb_trimmed = NULL;
> struct mld_msg *mld;
> int err;
> @@ -1785,8 +1824,10 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
>
> switch (mld->mld_type) {
> case ICMPV6_MGM_REPORT:
> + src = eth_hdr(skb)->h_source;
> BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
> - err = br_ip6_multicast_add_group(br, port, &mld->mld_mca, vid);
> + err = br_ip6_multicast_add_group(br, port, &mld->mld_mca, vid,
> + src);
> break;
> case ICMPV6_MLD2_REPORT:
> err = br_ip6_multicast_mld2_report(br, port, skb_trimmed, vid);
> @@ -1795,7 +1836,8 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
> err = br_ip6_multicast_query(br, port, skb_trimmed, vid);
> break;
> case ICMPV6_MGM_REDUCTION:
> - br_ip6_multicast_leave_group(br, port, &mld->mld_mca, vid);
> + src = eth_hdr(skb)->h_source;
> + br_ip6_multicast_leave_group(br, port, &mld->mld_mca, vid, src);
> break;
> }
>
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 8ce621e..cc55100 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -177,6 +177,8 @@ struct net_bridge_port_group {
> struct timer_list timer;
> struct br_ip addr;
> unsigned char flags;
> + unsigned char eth_addr[ETH_ALEN];
> + bool unicast;
I think you can remove the boolean unicast here and either use the "flags" or
the eth_addr itself.
This structure needs a serious re-arrangement.
> };
>
> struct net_bridge_mdb_entry
> @@ -599,7 +601,7 @@ void br_multicast_free_pg(struct rcu_head *head);
> struct net_bridge_port_group *
> br_multicast_new_port_group(struct net_bridge_port *port, struct br_ip *group,
> struct net_bridge_port_group __rcu *next,
> - unsigned char flags);
> + unsigned char flags, const unsigned char *src);
> void br_mdb_init(void);
> void br_mdb_uninit(void);
> void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index 8bd5696..1730278 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -188,6 +188,7 @@ static BRPORT_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
> store_multicast_router);
>
> BRPORT_ATTR_FLAG(multicast_fast_leave, BR_MULTICAST_FAST_LEAVE);
> +BRPORT_ATTR_FLAG(multicast_to_unicast, BR_MULTICAST_TO_UCAST);
> #endif
>
> static const struct brport_attribute *brport_attrs[] = {
> @@ -214,6 +215,7 @@ static const struct brport_attribute *brport_attrs[] = {
> #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> &brport_attr_multicast_router,
> &brport_attr_multicast_fast_leave,
> + &brport_attr_multicast_to_unicast,
> #endif
> &brport_attr_proxyarp,
> &brport_attr_proxyarp_wifi,
>
Please also add netlink support; we've been working hard at adding support for all bridge
options via netlink.
Thanks,
Nik
^ permalink raw reply
* Re: [PATCH iproute2 net-next] tc: flower: support matching flags
From: Jiri Benc @ 2017-01-03 12:05 UTC (permalink / raw)
To: Paul Blakey
Cc: netdev, Stephen Hemminger, David S. Miller, Hadar Hen Zion,
Or Gerlitz, Roi Dayan
In-Reply-To: <1ec4f4ca-08e0-84fc-34c6-b3868d756050@mellanox.com>
On Tue, 3 Jan 2017 13:54:34 +0200, Paul Blakey wrote:
> Matching name was from the idea that we are doing is matching.
But we don't have matching_src_mac etc., either, although we're
matching on those fields.
> And regarding documentation/flag names I didn't want tc tool to be need
> of a update each time a new flag is introduced,
It will be needed anyway because the whole thing would be useless
without proper documentation. So each time a new flag is added, a new
patch to the tc tool will be needed, at least with an addition to its
man page.
Please, let's focus on the *user*. The tc tool is hard to grasp for
users as it is. It's crystal clear to you but you know the kernel
internals. I'm very sure that except for the few kernel developers, no
one would understand what the "flags" field does. And even among the
kernel developers, very few would remember what the magic numeric
values mean.
If we want wider adoption of flower, we should make it as easy to use
as possible. Even when it means a bit more work for us.
> But I guess I can add two options like with ip_proto where you can
> specify known flags by name but can also give a value.
> What do you think about that?
>
> flags <FLAGS> / <HEX'/'HEX>
> FLAGS => frag/no_frag/tcp_syn/no_tcp_syn ['|'<FLAGS>]*
> e.g: flags frag|no_tcp_syn or flags 0x01/0x15
> and the mask will have a on bits corresponds only to those flags specified.
This works for me, too.
Thanks!
Jiri
^ permalink raw reply
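Note: the "flags <FLAGS> / <HEX'/'HEX>" syntax agreed on in the message
above can be sketched in plain C as follows. This is only an illustration
under stated assumptions: the bit values for frag and tcp_syn, the EX_/ex_
names and the helper ex_parse_flags() are hypothetical and are not taken
from the tc or kernel sources. The sketch shows how naming a flag (with or
without the no_ prefix) sets its mask bit, while only the non-negated form
sets the value bit.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit assignments, chosen only for illustration. */
#define EX_FLAG_FRAG    0x01u
#define EX_FLAG_TCP_SYN 0x02u

struct ex_flag_name {
	const char *name;
	uint32_t bit;
};

static const struct ex_flag_name ex_flag_names[] = {
	{ "frag",    EX_FLAG_FRAG },
	{ "tcp_syn", EX_FLAG_TCP_SYN },
};

/* Parse one token such as "frag" or "no_tcp_syn" of the given length. */
static int ex_parse_one(const char *tok, size_t len,
			uint32_t *val, uint32_t *mask)
{
	int negate = 0;
	size_t i;

	if (len > 3 && strncmp(tok, "no_", 3) == 0) {
		negate = 1;
		tok += 3;
		len -= 3;
	}
	for (i = 0; i < sizeof(ex_flag_names) / sizeof(ex_flag_names[0]); i++) {
		const struct ex_flag_name *f = &ex_flag_names[i];

		if (strlen(f->name) == len && strncmp(tok, f->name, len) == 0) {
			*mask |= f->bit;	/* the flag was mentioned */
			if (negate)
				*val &= ~f->bit;
			else
				*val |= f->bit;
			return 0;
		}
	}
	return -1;	/* unknown flag name */
}

/* Accepts "NAME[|NAME]*" or "HEX/HEX"; returns 0 on success, -1 on error. */
int ex_parse_flags(const char *str, uint32_t *val, uint32_t *mask)
{
	const char *slash = strchr(str, '/');

	*val = 0;
	*mask = 0;
	if (*str == '\0')
		return -1;

	if (slash) {
		char *end;
		unsigned long v = strtoul(str, &end, 16);
		unsigned long m;

		if (end == str || end != slash)
			return -1;
		m = strtoul(slash + 1, &end, 16);
		if (end == slash + 1 || *end != '\0')
			return -1;
		*val = (uint32_t)v;
		*mask = (uint32_t)m;
		return 0;
	}

	while (*str) {
		const char *bar = strchr(str, '|');
		size_t len = bar ? (size_t)(bar - str) : strlen(str);

		if (len == 0 || ex_parse_one(str, len, val, mask))
			return -1;
		str += len;
		if (*str == '|' && *++str == '\0')
			return -1;	/* trailing '|' */
	}
	return 0;
}
```

With these assumed bit values, "frag|no_tcp_syn" yields value 0x01 and
mask 0x03, while "0x01/0x15" is passed through as given.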
* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: Paul Blakey @ 2017-01-03 12:22 UTC (permalink / raw)
To: Jamal Hadi Salim, John Fastabend, David S. Miller, netdev
Cc: paulb, Jiri Pirko, Hadar Hen Zion, Or Gerlitz, Roi Dayan,
Roman Mashak, Simon Horman
In-Reply-To: <92224e21-cd3c-26b0-d8a0-31a07268e553@mojatatu.com>
On 03/01/2017 13:44, Jamal Hadi Salim wrote:
> On 17-01-02 11:33 PM, John Fastabend wrote:
>> On 17-01-02 05:22 PM, Jamal Hadi Salim wrote:
>
> [..]
>>> Like all cookie semantics it is for storing state. The receiver
>>> (kernel)
>>> is not just store it and not intepret it. The user when reading it back
>>> simplifies what they have to do for their processing.
>>>
>>>>
>>>> The tuple <ifindex:qdisc:prio:handle> really should be unique why
>>>> not use this for system wide mappings?
>>>>
>>>
>>> I think on a single machine should be enough, however:
>>> typically the user wants to define the value in a manner that
>>> in a distributed system it is unique. It would be trickier to
>>> do so with well defined values such as above.
>>>
>>
>> Just extend the tuple <hostname:ifindex:qdisc:prio:handle> that
>> should be unique in the domain of hostname's, or use some other domain
>> wide machine identifier.
>>
>
> May work for the case of filter identification. The nice thing for
> allowing cookies is you can let the user define it define their
> own scheme.
>
>> Although actions can be shared so the cookie can be shared across
>> filters. Maybe its useful but it doesn't uniquely identify a filter
>> in the shared case but the user would have to specify that case
>> so maybe its not important.
>>
>
> Note: the action cookies and filter cookies are unrelated/orthogonal.
> Their basic concept of stashing something in the cookie to help improve
> what user space does (in our case millions of actions of which some are
> used for accounting) is similar.
> I have no objections to the flow cookies; my main concern was it should
> be applicable to all classifiers not just flower. And the arbitrary size
> of the cookie that you pointed out is questionable.
>
> cheers,
> jamal
Hi all,
Our use case is replacing OVS rules with TC filters for HW offload, and
you are right: the cookie would have saved us the mapping from the OVS
rule ufid to the tc filter handle/prio that was generated for it.
It was also going to be used to store other info, such as which OVS
output port corresponds to the ifindex, so we need 128+32 for now.
It helps us with dumping the flows back when we lose data on a crash
or when restarting the user space daemon.
HW hints are another thing that might be helpful.
It's a binary blob because it is user/app specific and its usage might
change in the future, and that's why there is some headroom in the size
as well.
I don't mind having it at the TC level, but I didn't want to interfere
with all filters/TC.
^ permalink raw reply
* [PATCH] scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))
From: yuan linyu @ 2017-01-03 12:42 UTC (permalink / raw)
To: netdev; +Cc: David S . Miller, yuan linyu
From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) are already aligned.
Remove the use of CMSG_ALIGN(sizeof(struct cmsghdr)) and
CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) to keep the code consistent.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
include/linux/socket.h | 6 +++---
net/compat.c | 14 ++++++--------
net/core/scm.c | 2 +-
net/ipv4/ip_sockglue.c | 2 +-
net/rxrpc/sendmsg.c | 2 +-
5 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..c064380 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -92,9 +92,9 @@ struct cmsghdr {
#define CMSG_ALIGN(len) ( ((len)+sizeof(long)-1) & ~(sizeof(long)-1) )
-#define CMSG_DATA(cmsg) ((void *)((char *)(cmsg) + CMSG_ALIGN(sizeof(struct cmsghdr))))
-#define CMSG_SPACE(len) (CMSG_ALIGN(sizeof(struct cmsghdr)) + CMSG_ALIGN(len))
-#define CMSG_LEN(len) (CMSG_ALIGN(sizeof(struct cmsghdr)) + (len))
+#define CMSG_DATA(cmsg) ((void *)((char *)(cmsg) + sizeof(struct cmsghdr)))
+#define CMSG_SPACE(len) (sizeof(struct cmsghdr) + CMSG_ALIGN(len))
+#define CMSG_LEN(len) (sizeof(struct cmsghdr) + (len))
#define __CMSG_FIRSTHDR(ctl,len) ((len) >= sizeof(struct cmsghdr) ? \
(struct cmsghdr *)(ctl) : \
diff --git a/net/compat.c b/net/compat.c
index 96c544b..4e27dd1 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -90,11 +90,11 @@ int get_compat_msghdr(struct msghdr *kmsg,
#define CMSG_COMPAT_ALIGN(len) ALIGN((len), sizeof(s32))
#define CMSG_COMPAT_DATA(cmsg) \
- ((void __user *)((char __user *)(cmsg) + CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))))
+ ((void __user *)((char __user *)(cmsg) + sizeof(struct compat_cmsghdr)))
#define CMSG_COMPAT_SPACE(len) \
- (CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) + CMSG_COMPAT_ALIGN(len))
+ (sizeof(struct compat_cmsghdr) + CMSG_COMPAT_ALIGN(len))
#define CMSG_COMPAT_LEN(len) \
- (CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) + (len))
+ (sizeof(struct compat_cmsghdr) + (len))
#define CMSG_COMPAT_FIRSTHDR(msg) \
(((msg)->msg_controllen) >= sizeof(struct compat_cmsghdr) ? \
@@ -141,8 +141,7 @@ int cmsghdr_from_user_compat_to_kern(struct msghdr *kmsg, struct sock *sk,
if (!CMSG_COMPAT_OK(ucmlen, ucmsg, kmsg))
return -EINVAL;
- tmp = ((ucmlen - CMSG_COMPAT_ALIGN(sizeof(*ucmsg))) +
- CMSG_ALIGN(sizeof(struct cmsghdr)));
+ tmp = ((ucmlen - sizeof(*ucmsg)) + sizeof(struct cmsghdr));
tmp = CMSG_ALIGN(tmp);
kcmlen += tmp;
ucmsg = cmsg_compat_nxthdr(kmsg, ucmsg, ucmlen);
@@ -168,8 +167,7 @@ int cmsghdr_from_user_compat_to_kern(struct msghdr *kmsg, struct sock *sk,
goto Efault;
if (!CMSG_COMPAT_OK(ucmlen, ucmsg, kmsg))
goto Einval;
- tmp = ((ucmlen - CMSG_COMPAT_ALIGN(sizeof(*ucmsg))) +
- CMSG_ALIGN(sizeof(struct cmsghdr)));
+ tmp = ((ucmlen - sizeof(*ucmsg)) + sizeof(struct cmsghdr));
if ((char *)kcmsg_base + kcmlen - (char *)kcmsg < CMSG_ALIGN(tmp))
goto Einval;
kcmsg->cmsg_len = tmp;
@@ -178,7 +176,7 @@ int cmsghdr_from_user_compat_to_kern(struct msghdr *kmsg, struct sock *sk,
__get_user(kcmsg->cmsg_type, &ucmsg->cmsg_type) ||
copy_from_user(CMSG_DATA(kcmsg),
CMSG_COMPAT_DATA(ucmsg),
- (ucmlen - CMSG_COMPAT_ALIGN(sizeof(*ucmsg)))))
+ (ucmlen - sizeof(*ucmsg))))
goto Efault;
/* Advance. */
diff --git a/net/core/scm.c b/net/core/scm.c
index d882043..b6d8368 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -71,7 +71,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
struct file **fpp;
int i, num;
- num = (cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr)))/sizeof(int);
+ num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
if (num <= 0)
return 0;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 53ae0c6..c77e65e 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -272,7 +272,7 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
continue;
switch (cmsg->cmsg_type) {
case IP_RETOPTS:
- err = cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr));
+ err = cmsg->cmsg_len - sizeof(struct cmsghdr);
/* Our caller is responsible for freeing ipc->opt */
err = ip_options_get(net, &ipc->opt, CMSG_DATA(cmsg),
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index b214a4d..0a6ef21 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -376,7 +376,7 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
- len = cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr));
+ len = cmsg->cmsg_len - sizeof(struct cmsghdr);
_debug("CMSG %d, %d, %d",
cmsg->cmsg_level, cmsg->cmsg_type, len);
--
2.7.4
^ permalink raw reply related
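Note: the claim in the patch above (that the header size is already a
multiple of the alignment, so CMSG_ALIGN() on it is a no-op) can be checked
from user space with a small sketch like the one below. Assumptions:
EX_CMSG_ALIGN is a local copy of the kernel macro quoted in the diff, the
ex_ helpers are hypothetical, and struct cmsghdr here is the libc
definition from <sys/socket.h>, which is not guaranteed to match the
kernel's layout on every ABI.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Local copy of the rounding used by the kernel's CMSG_ALIGN(). */
#define EX_CMSG_ALIGN(len) \
	(((len) + sizeof(long) - 1) & ~(sizeof(long) - 1))

/* Returns 1 when rounding the header size up changes nothing. */
int ex_cmsghdr_is_aligned(void)
{
	return EX_CMSG_ALIGN(sizeof(struct cmsghdr)) == sizeof(struct cmsghdr);
}

/* Payload length as computed before the patch (header size rounded up). */
size_t ex_payload_len_aligned(size_t cmsg_len)
{
	return cmsg_len - EX_CMSG_ALIGN(sizeof(struct cmsghdr));
}

/* Payload length as computed after the patch (plain sizeof). */
size_t ex_payload_len_plain(size_t cmsg_len)
{
	return cmsg_len - sizeof(struct cmsghdr);
}
```

If ex_cmsghdr_is_aligned() returns 1 on a given ABI, both payload helpers
agree for every cmsg_len, which is the property the patch relies on.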
* Re: [PATCH net-next V2 1/3] vhost: better detection of available buffers
From: Stefan Hajnoczi @ 2017-01-03 13:08 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1482912571-3157-2-git-send-email-jasowang@redhat.com>
[-- Attachment #1.1: Type: text/plain, Size: 719 bytes --]
On Wed, Dec 28, 2016 at 04:09:29PM +0800, Jason Wang wrote:
> This patch tries to do several tweaks on vhost_vq_avail_empty() for a
> better performance:
>
> - check cached avail index first which could avoid userspace memory access.
> - using unlikely() for the failure of userspace access
> - check vq->last_avail_idx instead of cached avail index as the last
> step.
>
> This patch is need for batching supports which needs to peek whether
> or not there's still available buffers in the ring.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/vhost.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: Neil Horman @ 2017-01-03 13:09 UTC (permalink / raw)
To: Reiter Wolfgang; +Cc: davem, netdev, linux-kernel
In-Reply-To: <20170103003910.8984-1-wr0112358@gmail.com>
On Tue, Jan 03, 2017 at 01:39:10AM +0100, Reiter Wolfgang wrote:
> Final nlmsg_len field update must reflect inserted net_dm_drop_point
> data.
>
> This patch depends on previous patch:
> "drop_monitor: add missing call to genlmsg_end"
>
> Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
> ---
> net/core/drop_monitor.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
> index f465bad..fb55327 100644
> --- a/net/core/drop_monitor.c
> +++ b/net/core/drop_monitor.c
> @@ -102,7 +102,6 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
> }
> msg = nla_data(nla);
> memset(msg, 0, al);
> - genlmsg_end(skb, msg_header);
> goto out;
>
> err:
> @@ -112,6 +111,13 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
> swap(data->skb, skb);
> spin_unlock_irqrestore(&data->lock, flags);
>
> + if (skb) {
> + struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
> + struct genlmsghdr *gnlh = (struct genlmsghdr *)nlmsg_data(nlh);
> +
> + genlmsg_end(skb, genlmsg_data(gnlh));
> + }
> +
> return skb;
> }
>
> --
> 2.9.3
>
>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
^ permalink raw reply
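Note: the ordering issue addressed by the drop_monitor patch above can be
modelled in user space with the sketch below. This is not the netlink API:
struct ex_msg, ex_msg_append() and ex_msg_end() are hypothetical stand-ins,
assumed only to illustrate that a length field finalized before the payload
is inserted stays stale until it is finalized again.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header with a total-length field, loosely like nlmsg_len. */
struct ex_msg_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
};

struct ex_msg {
	struct ex_msg_hdr hdr;
	unsigned char payload[240];
	size_t used;		/* payload bytes appended so far */
};

void ex_msg_init(struct ex_msg *m, uint16_t type)
{
	memset(m, 0, sizeof(*m));
	m->hdr.type = type;
	m->hdr.len = sizeof(m->hdr);
}

/* Append payload; deliberately does not touch hdr.len. */
int ex_msg_append(struct ex_msg *m, const void *data, size_t n)
{
	if (n > sizeof(m->payload) - m->used)
		return -1;
	memcpy(m->payload + m->used, data, n);
	m->used += n;
	return 0;
}

/* Finalize: the length must be computed after the last append. */
void ex_msg_end(struct ex_msg *m)
{
	m->hdr.len = (uint32_t)(sizeof(m->hdr) + m->used);
}
```

Calling ex_msg_end() before ex_msg_append() leaves hdr.len at the header
size only; calling it afterwards accounts for the inserted data, which is
the ordering the patch moves to.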
* Re: [PATCH net-next] bridge: multicast to unicast
From: Felix Fietkau @ 2017-01-03 13:15 UTC (permalink / raw)
To: Linus Lüssing, netdev
Cc: David S . Miller, Stephen Hemminger, bridge, linux-kernel,
linux-wireless
In-Reply-To: <20170102193214.31723-1-linus.luessing@c0d3.blue>
On 2017-01-02 20:32, Linus Lüssing wrote:
> Implements an optional, per bridge port flag and feature to deliver
> multicast packets to any host on the according port via unicast
> individually. This is done by copying the packet per host and
> changing the multicast destination MAC to a unicast one accordingly.
>
> multicast-to-unicast works on top of the multicast snooping feature of
> the bridge. Which means unicast copies are only delivered to hosts which
> are interested in it and signalized this via IGMP/MLD reports
> previously.
>
> This feature is intended for interface types which have a more reliable
> and/or efficient way to deliver unicast packets than broadcast ones
> (e.g. wifi).
>
> However, it should only be enabled on interfaces where no IGMPv2/MLDv1
> report suppression takes place. This feature is disabled by default.
>
> The initial patch and idea is from Felix Fietkau.
>
> Cc: Felix Fietkau <nbd@nbd.name>
> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Please add Signed-off-by: Felix Fietkau <nbd@nbd.name>
in the next version, and maybe also From:
Thanks,
- Felix
^ permalink raw reply
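Note: the per-host copy described in the quoted commit message above
("copying the packet per host and changing the multicast destination MAC to
a unicast one") can be sketched on a raw Ethernet frame buffer as below.
This is a user-space illustration only; struct ex_mac and
ex_mcast_to_ucast() are hypothetical names and do not correspond to the
bridge code, which operates on sk_buffs and port groups.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_ETH_ALEN 6	/* length of an Ethernet MAC address */

struct ex_mac {
	uint8_t addr[EX_ETH_ALEN];
};

/*
 * Write one copy of the frame per host into out, replacing the destination
 * MAC (the first EX_ETH_ALEN bytes) with that host's unicast MAC.
 * out must hold nhosts * frame_len bytes.
 * Returns the number of copies written, or 0 if the frame is too short to
 * carry destination and source addresses.
 */
size_t ex_mcast_to_ucast(const uint8_t *frame, size_t frame_len,
			 const struct ex_mac *hosts, size_t nhosts,
			 uint8_t *out)
{
	size_t i;

	if (frame_len < 2 * EX_ETH_ALEN)
		return 0;

	for (i = 0; i < nhosts; i++) {
		uint8_t *copy = out + i * frame_len;

		memcpy(copy, frame, frame_len);
		memcpy(copy, hosts[i].addr, EX_ETH_ALEN);
	}
	return nhosts;
}
```

Everything after the destination address (source MAC, EtherType, payload)
is left untouched in each copy; only the hosts that reported interest via
IGMP/MLD would appear in the hosts array.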
* Re: [PATCH net-next V2 2/3] vhost_net: tx batching
From: Stefan Hajnoczi @ 2017-01-03 13:16 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1482912571-3157-3-git-send-email-jasowang@redhat.com>
[-- Attachment #1.1: Type: text/plain, Size: 519 bytes --]
On Wed, Dec 28, 2016 at 04:09:30PM +0800, Jason Wang wrote:
> This patch tries to utilize tuntap rx batching by peeking the tx
> virtqueue during transmission, if there's more available buffers in
> the virtqueue, set MSG_MORE flag for a hint for backend (e.g tuntap)
> to batch the packets.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
^ permalink raw reply
* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Mark Lord @ 2017-01-03 13:19 UTC (permalink / raw)
To: Ansis Atteka, Hayes Wang
Cc: David Miller, greg@kroah.com, romieu@fr.zoreil.com,
netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org,
linux-usb@vger.kernel.org, Ansis Atteka
In-Reply-To: <CAA=3Oqn1i0cFDw1yi=vWJuU=w-N+8T7i3+xx9qQtrQCatAi+8Q@mail.gmail.com>
On 17-01-02 07:40 PM, Ansis Atteka wrote:
..
> I think that I am getting closer to the root cause of this bug. Also,
> I have a workaround that at least makes r8152 functionally stable in
> my Dell TB15 dock. Mark, would you mind giving a chance to the patch
> that I have in the bottom of this email to see if it helps your issue
> too (you might have to tweak those settings slightly differently if
> you use something else than USB 3.0)
/* USB_RX_EARLY_TIMEOUT */
-#define COALESCE_SUPER 85000U
-#define COALESCE_HIGH 250000U
-#define COALESCE_SLOW 524280U
+#define COALESCE_SUPER 8500U
+#define COALESCE_HIGH 25000U
+#define COALESCE_SLOW 52428U
The RTL_VER_02 chip that I was using does not support interrupt coalescing
in the driver [see the rtl8152_set_coalesce() function]. So that workaround
would not help here.
--
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com
^ permalink raw reply
* Re: [PATCH v3] net: ethernet: faraday: To support device tree usage.
From: Arnd Bergmann @ 2017-01-03 13:24 UTC (permalink / raw)
To: Greentime Hu
Cc: Florian Fainelli, netdev, devicetree, Andrew Lunn, linux-kernel,
Jiri Pirko
In-Reply-To: <CAEbi=3cJtHr-G+CHzAMgsfoscj6Eb=YUJeXB7=AmmT5DrHOqXg@mail.gmail.com>
On Tuesday, January 3, 2017 2:05:47 PM CET Greentime Hu wrote:
> I am not sure if atmac and moxa-art are exactly hardware compatible though
> they are based on faraday ftmac.
> It may be better if we use 2 different device tree binding documents to
> describe for these 2 different drivers to use.
They are probably slightly different, but close enough to have the same
binding document, as there is no technical reason to have two separate
drivers for them. The binding should be about the hardware type, not the
way that Linux currently implements the drivers.
Arnd
^ permalink raw reply
* Re: [PATCH net-next V2 3/3] tun: rx batching
From: Stefan Hajnoczi @ 2017-01-03 13:33 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1482912571-3157-4-git-send-email-jasowang@redhat.com>
[-- Attachment #1.1: Type: text/plain, Size: 492 bytes --]
On Wed, Dec 28, 2016 at 04:09:31PM +0800, Jason Wang wrote:
> +static int tun_rx_batched(struct tun_file *tfile, struct sk_buff *skb,
> + int more)
> +{
> + struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
> + struct sk_buff_head process_queue;
> + int qlen;
> + bool rcv = false;
> +
> + spin_lock(&queue->lock);
Should this be spin_lock_bh()? Below and in tun_get_user() there are
explicit local_bh_disable() calls so I guess BHs can interrupt us here
and this would deadlock.
^ permalink raw reply
* Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
From: Nicolas Ferre @ 2017-01-03 14:22 UTC (permalink / raw)
To: Rafal Ozieblo, Harini Katakam, Richard Cochran
Cc: tbultel@pixelsurmer.com, boris.brezillon@free-electrons.com,
netdev@vger.kernel.org, alexandre.belloni@free-electrons.com,
linux-kernel@vger.kernel.org, Andrei Pistirica,
michals@xilinx.com, anirudh@xilinx.com, punnaia@xilinx.com,
harini.katakam@xilinx.com, davem@davemloft.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <BN3PR07MB25168573DD82F1681CDC8437C96E0@BN3PR07MB2516.namprd07.prod.outlook.com>
Le 03/01/2017 à 11:47, Rafal Ozieblo a écrit :
>> From: Harini Katakam [mailto:harinikatakamlinux@gmail.com]
>> Sent: 3 stycznia 2017 06:06
>> Subject: Re: [RFC PATCH net-next v4 1/2] macb: Add 1588 support in Cadence GEM.
>>
>> Hi Richard,
>>
>> On Mon, Jan 2, 2017 at 9:43 PM, Richard Cochran <richardcochran@gmail.com> wrote:
>>> On Mon, Jan 02, 2017 at 03:47:07PM +0100, Nicolas Ferre wrote:
>>>> Le 02/01/2017 à 12:31, Richard Cochran a écrit :
>>>>> This Cadence IP core is a complete disaster.
>>>>
>>>> Well, it evolved and propose several options to different SoC
>>>> integrators. This is not something unusual...
>>>> I suspect as well that some other network adapters have the same
>>>> weakness concerning PTP timestamp in single register as the early
>>>> revisions of this IP.
>>>
>>> It appears that this core can neither latch the time on read or write,
>>> or even latch time stamps. I have worked with many different PTP HW
>>> implementations, even early ones like on the ixp4xx, and it is no
>>> exaggeration to say that this one is uniquely broken.
>>>
>>>> I suspect that Rafal tend to jump too quickly to the latest IP
>>>> revisions and add more options to this series: let's not try to pour
>>>> too much things into this code right now.
>>>
>>> Why can't you check the IP version in the driver?
>>
>> There is an IP revision register but it would be probably be
>> better to rely on "caps" from the compatibility strings - to cover SoC specific
>> implementations. Also, when this extended BD is added (with timestamp),
>> additional words will need to be added statically which will be
>> consistent with Andrei's CONFIG_ checks.
> We can distinguish IP cores with and without PTP support by reading
> Design Configuration Register. But to distinguish IP cores with
> timestamps in buffer descriptors and which support only event
> registers, we can only check IP version by reading the revision ID
> register and base on that.
> I agree with Harini, compatibility strings could be better. But we
> might end up with many different configuration in the future.
Compatibility strings and associated configurations are cheap. It's not
a problem to have many different configurations, and it is clearer for
this particular "composite" feature.
> We could use only descriptor approach but there are many Atmel's
> cores on the market which support only event registers.
Yes and once in silicon, it's hard to modify ;-)
Regards,
--
Nicolas Ferre
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support
From: Sowmini Varadhan @ 2017-01-03 14:31 UTC (permalink / raw)
To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem
This patch series allows an application to use a single PF_PACKET
descriptor and leverage the best implementations of TX_RING
and RX_RING that exist today.
Patch 1 adds the kernel/Documentation changes for TX_RING
support and patch 2 adds the associated test case in selftests.
Changes since v2: additional sanity checks for setsockopt
input for TX_RING/TPACKET_V3. Refactored psock_tpacket.c
test code to avoid code duplication from V2.
Sowmini Varadhan (2):
af_packet: TX_RING support for TPACKET_V3
tools: test case for TPACKET_V3/TX_RING support
Documentation/networking/packet_mmap.txt | 9 ++-
net/packet/af_packet.c | 39 +++++++++---
tools/testing/selftests/net/psock_tpacket.c | 91 ++++++++++++++++++++++-----
3 files changed, 111 insertions(+), 28 deletions(-)
^ permalink raw reply
* [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
From: Sowmini Varadhan @ 2017-01-03 14:31 UTC (permalink / raw)
To: netdev, sowmini.varadhan; +Cc: daniel, willemb, davem
In-Reply-To: <cover.1483452545.git.sowmini.varadhan@oracle.com>
Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: Added test case.
v3: refactored code to have a single walk_tx() function that handles all three
TPACKET versions.
tools/testing/selftests/net/psock_tpacket.c | 91 ++++++++++++++++++++++-----
1 files changed, 74 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 24adf70..4a1bc64 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -311,20 +311,33 @@ static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr)
__sync_synchronize();
}
-static inline int __v1_v2_tx_kernel_ready(void *base, int version)
+static inline int __v3_tx_kernel_ready(struct tpacket3_hdr *hdr)
+{
+ return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
+}
+
+static inline void __v3_tx_user_ready(struct tpacket3_hdr *hdr)
+{
+ hdr->tp_status = TP_STATUS_SEND_REQUEST;
+ __sync_synchronize();
+}
+
+static inline int __tx_kernel_ready(void *base, int version)
{
switch (version) {
case TPACKET_V1:
return __v1_tx_kernel_ready(base);
case TPACKET_V2:
return __v2_tx_kernel_ready(base);
+ case TPACKET_V3:
+ return __v3_tx_kernel_ready(base);
default:
bug_on(1);
return 0;
}
}
-static inline void __v1_v2_tx_user_ready(void *base, int version)
+static inline void __tx_user_ready(void *base, int version)
{
switch (version) {
case TPACKET_V1:
@@ -333,6 +346,9 @@ static inline void __v1_v2_tx_user_ready(void *base, int version)
case TPACKET_V2:
__v2_tx_user_ready(base);
break;
+ case TPACKET_V3:
+ __v3_tx_user_ready(base);
+ break;
}
}
@@ -348,7 +364,22 @@ static void __v1_v2_set_packet_loss_discard(int sock)
}
}
-static void walk_v1_v2_tx(int sock, struct ring *ring)
+static inline void *get_next_frame(struct ring *ring, int n)
+{
+ uint8_t *f0 = ring->rd[0].iov_base;
+
+ switch (ring->version) {
+ case TPACKET_V1:
+ case TPACKET_V2:
+ return ring->rd[n].iov_base;
+ case TPACKET_V3:
+ return f0 + (n * ring->req3.tp_frame_size);
+ default:
+ bug_on(1);
+ }
+}
+
+static void walk_tx(int sock, struct ring *ring)
{
struct pollfd pfd;
int rcv_sock, ret;
@@ -360,9 +391,19 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
.sll_family = PF_PACKET,
.sll_halen = ETH_ALEN,
};
+ int nframes;
+
+ /* TPACKET_V{1,2} sets up the ring->rd* related variables based
+ * on frames (e.g., rd_num is tp_frame_nr) whereas V3 sets these
+ * up based on blocks (e.g, rd_num is tp_block_nr)
+ */
+ if (ring->version <= TPACKET_V2)
+ nframes = ring->rd_num;
+ else
+ nframes = ring->req3.tp_frame_nr;
bug_on(ring->type != PACKET_TX_RING);
- bug_on(ring->rd_num < NUM_PACKETS);
+ bug_on(nframes < NUM_PACKETS);
rcv_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (rcv_sock == -1) {
@@ -388,10 +429,11 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
create_payload(packet, &packet_len);
while (total_packets > 0) {
- while (__v1_v2_tx_kernel_ready(ring->rd[frame_num].iov_base,
- ring->version) &&
+ void *next = get_next_frame(ring, frame_num);
+
+ while (__tx_kernel_ready(next, ring->version) &&
total_packets > 0) {
- ppd.raw = ring->rd[frame_num].iov_base;
+ ppd.raw = next;
switch (ring->version) {
case TPACKET_V1:
@@ -413,14 +455,27 @@ static void walk_v1_v2_tx(int sock, struct ring *ring)
packet_len);
total_bytes += ppd.v2->tp_h.tp_snaplen;
break;
+ case TPACKET_V3: {
+ struct tpacket3_hdr *tx = next;
+
+ tx->tp_snaplen = packet_len;
+ tx->tp_len = packet_len;
+ tx->tp_next_offset = 0;
+
+ memcpy((uint8_t *)tx + TPACKET3_HDRLEN -
+ sizeof(struct sockaddr_ll), packet,
+ packet_len);
+ total_bytes += tx->tp_snaplen;
+ break;
+ }
}
status_bar_update();
total_packets--;
- __v1_v2_tx_user_ready(ppd.raw, ring->version);
+ __tx_user_ready(next, ring->version);
- frame_num = (frame_num + 1) % ring->rd_num;
+ frame_num = (frame_num + 1) % nframes;
}
poll(&pfd, 1, 1);
@@ -460,7 +515,7 @@ static void walk_v1_v2(int sock, struct ring *ring)
if (ring->type == PACKET_RX_RING)
walk_v1_v2_rx(sock, ring);
else
- walk_v1_v2_tx(sock, ring);
+ walk_tx(sock, ring);
}
static uint64_t __v3_prev_block_seq_num = 0;
@@ -583,7 +638,7 @@ static void walk_v3(int sock, struct ring *ring)
if (ring->type == PACKET_RX_RING)
walk_v3_rx(sock, ring);
else
- bug_on(1);
+ walk_tx(sock, ring);
}
static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
@@ -602,12 +657,13 @@ static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
ring->flen = ring->req.tp_frame_size;
}
-static void __v3_fill(struct ring *ring, unsigned int blocks)
+static void __v3_fill(struct ring *ring, unsigned int blocks, int type)
{
- ring->req3.tp_retire_blk_tov = 64;
- ring->req3.tp_sizeof_priv = 0;
- ring->req3.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
-
+ if (type == PACKET_RX_RING) {
+ ring->req3.tp_retire_blk_tov = 64;
+ ring->req3.tp_sizeof_priv = 0;
+ ring->req3.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
+ }
ring->req3.tp_block_size = getpagesize() << 2;
ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
ring->req3.tp_block_nr = blocks;
@@ -641,7 +697,7 @@ static void setup_ring(int sock, struct ring *ring, int version, int type)
break;
case TPACKET_V3:
- __v3_fill(ring, blocks);
+ __v3_fill(ring, blocks, type);
ret = setsockopt(sock, SOL_PACKET, type, &ring->req3,
sizeof(ring->req3));
break;
@@ -796,6 +852,7 @@ int main(void)
ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
+ ret |= test_tpacket(TPACKET_V3, PACKET_TX_RING);
if (ret)
return 1;
--
1.7.1
^ permalink raw reply related
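Note: the frame addressing and status handshake used by the TPACKET_V3
TX path in the test above can be isolated into a few helpers, sketched
below. The ex_ function names are hypothetical; struct tpacket3_hdr and the
TP_STATUS_* constants come from <linux/if_packet.h>. The sketch assumes, as
get_next_frame() in the patch does, that TX frames are laid out back to
back at a fixed tp_frame_size starting from the ring base. It does not open
a socket or map a ring.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Address of TX frame n in a ring whose frames are laid out contiguously. */
void *ex_v3_tx_frame(void *base, unsigned int n, unsigned int frame_size)
{
	return (uint8_t *)base + (size_t)n * frame_size;
}

/*
 * A frame may be filled by user space only when the kernel neither has a
 * pending send request for it nor is in the middle of sending it.
 */
int ex_v3_tx_kernel_ready(const struct tpacket3_hdr *hdr)
{
	return !(hdr->tp_status &
		 (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
}

/* Hand a filled frame over to the kernel. */
void ex_v3_tx_user_ready(struct tpacket3_hdr *hdr)
{
	hdr->tp_status = TP_STATUS_SEND_REQUEST;
	__sync_synchronize();	/* publish the status after the payload */
}
```

A sender would walk frames 0..tp_frame_nr-1 with ex_v3_tx_frame(), fill
each one that ex_v3_tx_kernel_ready() reports as free, mark it with
ex_v3_tx_user_ready(), and then trigger transmission on the socket.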