* [PATCH net-next 0/8] sctp: refactor sctp_outq_flush
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC
To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
Currently sctp_outq_flush does many different and arguably unrelated
things, such as transport selection and outq dequeueing.
This patchset refactors it into smaller and more dedicated functions.
The end behavior should be the same.
The next patchset will rework the function parameters.
Marcelo Ricardo Leitner (8):
sctp: add sctp_packet_singleton
sctp: factor out sctp_outq_select_transport
sctp: move the flush of ctrl chunks into its own function
sctp: move outq data rtx code out of sctp_outq_flush
sctp: move flushing of data chunks out of sctp_outq_flush
sctp: move transport flush code out of sctp_outq_flush
sctp: make use of gfp on retransmissions
sctp: rework switch cases in sctp_outq_flush_data
net/sctp/outqueue.c | 593 +++++++++++++++++++++++++++-------------------------
1 file changed, 311 insertions(+), 282 deletions(-)
* [PATCH net] net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule
From: Florian Fainelli @ 2018-05-11 23:24 UTC
To: netdev
Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, David S. Miller,
open list
When we let the kernel pick a rule location with RX_CLS_LOC_ANY, we
could overwrite the last rules because of a number of issues:
- the IPv4 code path did not check that rule_index was within bounds,
and the IPv6 code path only checked the second index, not the first one
- find_first_zero_bit() needs to operate on the full bitmap size
(priv->num_cfp_rules); otherwise the results it returns are off by one
and the checks against bcm_sf2_cfp_rule_size() are ineffective
Fixes: 3306145866b6 ("net: dsa: bcm_sf2: Move IPv4 CFP processing to specific functions")
Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/dsa/bcm_sf2_cfp.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 23b45da784cb..ade5fa3d747d 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -354,10 +354,13 @@ static int bcm_sf2_cfp_ipv4_rule_set(struct bcm_sf2_priv *priv, int port,
/* Locate the first rule available */
if (fs->location == RX_CLS_LOC_ANY)
rule_index = find_first_zero_bit(priv->cfp.used,
- bcm_sf2_cfp_rule_size(priv));
+ priv->num_cfp_rules);
else
rule_index = fs->location;
+ if (rule_index > bcm_sf2_cfp_rule_size(priv))
+ return -ENOSPC;
+
layout = &udf_tcpip4_layout;
/* We only use one UDF slice for now */
slice_num = bcm_sf2_get_slice_number(layout, 0);
@@ -563,9 +566,11 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
*/
if (fs->location == RX_CLS_LOC_ANY)
rule_index[0] = find_first_zero_bit(priv->cfp.used,
- bcm_sf2_cfp_rule_size(priv));
+ priv->num_cfp_rules);
else
rule_index[0] = fs->location;
+ if (rule_index[0] > bcm_sf2_cfp_rule_size(priv))
+ return -ENOSPC;
/* Flag it as used (cleared on error path) such that we can immediately
* obtain a second one to chain from.
@@ -573,7 +578,7 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
set_bit(rule_index[0], priv->cfp.used);
rule_index[1] = find_first_zero_bit(priv->cfp.used,
- bcm_sf2_cfp_rule_size(priv));
+ priv->num_cfp_rules);
if (rule_index[1] > bcm_sf2_cfp_rule_size(priv)) {
ret = -ENOSPC;
goto out_err;
--
2.14.1
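Why both changes are needed can be seen in a small userspace model. This is a sketch, not driver code: `NUM_CFP_RULES`, `cfp_rule_size()` and `pick_rule()` are hypothetical, a `bool` array stands in for the `unsigned long` bitmap, `cfp_rule_size()` is assumed to return `NUM_CFP_RULES - 1` (the highest valid index), and `find_first_zero_bit()` is reimplemented with the kernel convention of returning `size` when no bit is clear.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_CFP_RULES 8 /* hypothetical table size */

/* Assumption: highest valid rule index, i.e. NUM_CFP_RULES - 1. */
static unsigned int cfp_rule_size(void)
{
	return NUM_CFP_RULES - 1;
}

/* Userspace stand-in with kernel semantics: returns size if no bit is clear. */
static unsigned int find_first_zero_bit(const bool *used, unsigned int size)
{
	unsigned int i;

	for (i = 0; i < size; i++)
		if (!used[i])
			return i;
	return size;
}

/*
 * Picks a free rule index, or -ENOSPC.
 * search_size is the bitmap size handed to find_first_zero_bit();
 * check_bounds toggles the rule_index bounds check added by the patch.
 */
static int pick_rule(const bool *used, unsigned int search_size,
		     bool check_bounds)
{
	unsigned int idx = find_first_zero_bit(used, search_size);

	if (check_bounds && idx > cfp_rule_size())
		return -ENOSPC;
	return (int)idx;
}
```

With every slot in use, searching only `cfp_rule_size()` bits never looks at slot 7, so it returns 7 and the bounds check alone does not reject it; searching the full bitmap returns 8, which the check turns into -ENOSPC.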
* Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: Willem de Bruijn @ 2018-05-11 23:16 UTC
To: Eric Dumazet
Cc: Sean Tranchetti, Willem de Bruijn, David Miller,
Network Development, Subash Abhinov Kasiviswanathan
In-Reply-To: <ffd8b916-a060-55b5-35c6-13cf81902301@gmail.com>
On Thu, May 10, 2018 at 8:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/10/2018 05:38 PM, Sean Tranchetti wrote:
>> Using GSO in the UDP path on a device with
>> scatter-gather netdevice feature disabled will result in a kernel
>> panic with the following call stack:
>>
>> This panic is the result of allocating SKBs with small size
>> for the newly segmented SKB. If the scatter-gather feature is
>> disabled, the code attempts to call skb_put() on the small SKB
>> with an argument of nearly the entire unsegmented SKB length.
>>
>> After this patch, attempting to use GSO with scatter-gather
>> disabled will result in -EINVAL being returned.
>>
>> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
>> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
>> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
>> ---
>> net/ipv4/ip_output.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index b5e21eb..0d63690 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk,
>> copy = length;
>>
>> if (!(rt->dst.dev->features&NETIF_F_SG)) {
>> + struct sk_buff *tmp;
>> unsigned int off;
>>
>> + if (paged) {
>> + err = -EINVAL;
>> + while ((tmp = __skb_dequeue(queue)) != NULL)
>> + kfree(tmp);
>> + goto error;
>> + }
>> +
>> off = skb->len;
>> if (getfrag(from, skb_put(skb, copy),
>> offset, copy, off, skb) < 0) {
>>
>
>
> Hmm, no, we absolutely need to fix GSO instead.
>
> Think of a bonding device (or any virtual device): your patch won't avoid the crash.
Thanks for reporting the issue.
Paged skbuffs are an optimization for gso, but the feature should indeed
continue to work even if gso skbs are linear (albeit at the cost
of copying during skb_segment).
We need to make paged contingent on scatter-gather. Rough
patch below. It is for ipv4 only; the same will be needed for ipv6.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b5e21eb198d8..b38731d8a44f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
exthdrlen = !skb ? rt->dst.header_len : 0;
mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
- paged = !!cork->gso_size;
+ paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
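The one-line change above boils down to a predicate on the GSO size and the device features. A minimal userspace sketch of it, assuming a hypothetical `FEAT_SG` bit in place of `NETIF_F_SG` (the real flag value is not reproduced here):

```c
#include <assert.h>
#include <stdbool.h>

#define FEAT_SG (1u << 0) /* hypothetical stand-in for NETIF_F_SG */

/*
 * Models: paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
 * Without scatter-gather, GSO still proceeds but with linear skbs.
 */
static bool use_paged(unsigned int gso_size, unsigned int dev_features)
{
	return gso_size != 0 && (dev_features & FEAT_SG) != 0;
}
```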
* safe skb resetting after decapsulation and encapsulation
From: Jason A. Donenfeld @ 2018-05-11 22:56 UTC
To: Netdev
Hey Netdev,
A UDP skb comes in via the encap_rcv interface. I do a lot of wild
things to the bytes in the skb -- change where the head starts, modify
a few fragments, decrypt some stuff, trim off some things at the end,
etc. In other words, I'm decapsulating the skb in a pretty intense
way. I benefit from reusing the same skb, performance wise, but after
I'm done processing it, it's really a totally new skb. Eventually it's
time to pass off my skb to netif_receive_skb/netif_rx, but before I do
that, I need to "reinitialize" the skb. (The same goes for when
sending out an skb -- I get it from userspace via ndo_start_xmit, do
crazy things to it, and eventually pass it off to the udp_tunnel send
functions, but first "reinitializing" it.)
At the moment I'm using a function that looks like this:
static void jasons_wild_and_crazy_skb_reset(struct sk_buff *skb)
{
	skb_scrub_packet(skb, true); //1
	memset(&skb->headers_start, 0, offsetof(struct sk_buff, headers_end) -
	       offsetof(struct sk_buff, headers_start)); //2
	skb->queue_mapping = 0; //3
	skb->nohdr = 0; //4
	skb->peeked = 0; //5
	skb->mac_len = 0; //6
	skb->dev = NULL; //7
#ifdef CONFIG_NET_SCHED
	skb->tc_index = 0; //8
	skb_reset_tc(skb); //9
#endif
	skb->hdr_len = skb_headroom(skb); //10
	skb_reset_mac_header(skb); //11
	skb_reset_network_header(skb); //12
	skb_probe_transport_header(skb, 0); //13
	skb_reset_inner_headers(skb); //14
}
I'm sure that some of this is wrong. Most of it is based on part of an
Octeon ethernet driver I read a few years ago. I numbered each
statement above, hoping to go through it with you all in detail here,
and see what can be cut away and what should stay.
1. Obviously correct and required.
2. This is probably wrong. At least it causes crashes when receiving
packets from RHEL 7.5's latest i40e driver in their vendor
frankenkernel, because those flags there have some critical bits
related to allocation. But there are a lot of flags in there that I might
consider going through one by one and zeroing out.
3-5. Fields that should be zero, I assume, after
decapsulating/decrypting (and encapsulating/encrypting).
6. WireGuard is layer 3, so there's no mac.
7. We're later going to change the dev this came in on.
8-9: Same flaky rationale as 2 and 3-5.
10: Since the headroom has changed during the various modifications, I
need to update hdr_len accordingly.
11-14: The beginning of the headers has changed, and so resetting and
probing is necessary for this to work at all.
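The hazard in item 2 can be reproduced with a userspace struct that uses the same marker-field trick as `headers_start`/`headers_end`. Everything here is hypothetical: `struct fake_skb` and its fields are not the real `struct sk_buff` layout, `pfmemalloc` merely plays the role of "a flag related to allocation", and the save/restore variant is one possible pattern, not an existing kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: only the fields between the markers get cleared. */
struct fake_skb {
	unsigned int len;         /* outside the region: preserved */
	char headers_start;       /* marker: first byte of the region */
	unsigned int mark;
	unsigned int hash;
	unsigned char pfmemalloc; /* hypothetical allocation-related flag */
	char headers_end;         /* marker: first byte after the region */
	unsigned int truesize;    /* outside the region: preserved */
};

/* Blanket reset, as in statement 2: clears everything between the markers. */
static void reset_region(struct fake_skb *skb)
{
	/* Address the region from the struct base to avoid a per-member bound. */
	memset((char *)skb + offsetof(struct fake_skb, headers_start), 0,
	       offsetof(struct fake_skb, headers_end) -
	       offsetof(struct fake_skb, headers_start));
}

/* Variant that saves a field it must not lose and restores it afterwards. */
static void reset_keep_alloc_bits(struct fake_skb *skb)
{
	unsigned char pfmemalloc = skb->pfmemalloc;

	reset_region(skb);
	skb->pfmemalloc = pfmemalloc;
}
```

The blanket version silently clears the allocation flag along with the header state; the variant only works if every field that must survive is known up front, which is the crux of the question.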
So I'm wondering - how much of this is necessary? How much am I
unnecessarily reinventing things that exist elsewhere? I'm pretty sure
in most cases the driver would work with only 1,10-14, but I worry
that bad things would happen in more unusual configurations. I've
tried to systematically go through the entire stack and see where
these might be used or not used, but it seems really inconsistent.
So I'm writing to ask whether somebody has an easy simplification or
rule for handling this kind of intense decapsulation/decryption
operation (and encapsulation/encryption in the other direction). I'd
like to make sure I get this down solid.
Thanks,
Jason
* Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
From: Paul Burton @ 2018-05-11 22:22 UTC
To: Andrew Lunn; +Cc: Darren Hart, netdev, linux-mips, David S . Miller
In-Reply-To: <20180511192446.GD12738@lunn.ch>
Hi Andrew,
On Fri, May 11, 2018 at 09:24:46PM +0200, Andrew Lunn wrote:
> > I could reorder the probe function a little to initialize the PHY before
> > performing the MAC reset, drop this patch and the AR803X hibernation
> > stuff from patch 2 if you like. But again, I can't actually test the
> > result on the affected hardware.
>
> Hi Paul
>
> I don't like a MAC driver poking around in PHY registers.
>
> So if you can rearrange the code, that would be great.
>
> Thanks
> Andrew
Sure, I'll give it a shot.
After digging into it I see 2 ways to go here:
1) We could just always reset the PHY before we reset the MAC. That
would give us a window of however long the PHY takes to enter its
low power state & stop providing the RX clock during which we'd
need the MAC reset to complete. In the case of the AR8031 that's
"about 10 seconds" according to its data sheet. In this particular
case that feels like plenty, but it does also feel a bit icky to
rely on the timing chosen by the PHY manufacturer to line up with
that of the MAC reset.
2) We could introduce a couple of new phy_* functions to disable &
enable low power states like the AR8031's hibernation feature, by
calling new function pointers in struct phy_driver. Then pch_gbe &
other MACs could call those to have the PHY driver disable
hibernation at times where we know we'll need the RX clock and
re-enable it afterwards.
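Option 2 amounts to an optional hook in the PHY driver's ops plus a generic wrapper that MAC drivers call around their reset. A userspace sketch of that shape follows; `struct sketch_phy_driver`, `set_low_power` and `phy_sketch_set_low_power()` are hypothetical names, not existing phylib API, and returning -EOPNOTSUPP for PHYs without the hook is a design choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_phy;

/* Hypothetical ops table with an optional low-power hook. */
struct sketch_phy_driver {
	int (*set_low_power)(struct sketch_phy *phy, bool enable);
};

struct sketch_phy {
	const struct sketch_phy_driver *drv;
	bool hibernation_enabled;
};

/* Generic wrapper a MAC driver would call; PHYs without the hook opt out. */
static int phy_sketch_set_low_power(struct sketch_phy *phy, bool enable)
{
	if (!phy->drv || !phy->drv->set_low_power)
		return -EOPNOTSUPP;
	return phy->drv->set_low_power(phy, enable);
}

/* Hypothetical PHY-driver hook; real code would write a PHY register. */
static int demo_phy_set_low_power(struct sketch_phy *phy, bool enable)
{
	phy->hibernation_enabled = enable;
	return 0;
}

static const struct sketch_phy_driver demo_phy_driver = {
	.set_low_power = demo_phy_set_low_power,
};
```

A MAC driver would disable low power, perform its reset while the RX clock is guaranteed, then re-enable it, treating -EOPNOTSUPP as "nothing to do".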
I'm currently leaning towards option 2. How does that sound to you? Or
can you see another way to handle this?
Thanks,
Paul
* Re: [PATCH ghak81 RFC V1 1/5] audit: normalize loginuid read access
From: Richard Guy Briggs @ 2018-05-11 22:17 UTC
To: Paul Moore
Cc: Linux NetDev Upstream Mailing List, LKML, David Howells,
Linux Security Module list, Linux-Audit Mailing List,
Netfilter Devel List, SElinux list,
Integrity Measurement Architecture, Ingo Molnar
In-Reply-To: <20180510212114.pefeyw5cuqlmjewp@madcap2.tricolour.ca>
On 2018-05-10 17:21, Richard Guy Briggs wrote:
> On 2018-05-09 11:13, Paul Moore wrote:
> > On Fri, May 4, 2018 at 4:54 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > > Recognizing that the loginuid is an internal audit value, use an access
> > > function to retrieve the audit loginuid value for the task rather than
> > > reaching directly into the task struct to get it.
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > ---
> > > kernel/auditsc.c | 16 ++++++++--------
> > > 1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index 479c031..f3817d0 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -374,7 +374,7 @@ static int audit_field_compare(struct task_struct *tsk,
> > > case AUDIT_COMPARE_EGID_TO_OBJ_GID:
> > > return audit_compare_gid(cred->egid, name, f, ctx);
> > > case AUDIT_COMPARE_AUID_TO_OBJ_UID:
> > > - return audit_compare_uid(tsk->loginuid, name, f, ctx);
> > > + return audit_compare_uid(audit_get_loginuid(tsk), name, f, ctx);
> > > case AUDIT_COMPARE_SUID_TO_OBJ_UID:
> > > return audit_compare_uid(cred->suid, name, f, ctx);
> > > case AUDIT_COMPARE_SGID_TO_OBJ_GID:
> > > @@ -385,7 +385,7 @@ static int audit_field_compare(struct task_struct *tsk,
> > > return audit_compare_gid(cred->fsgid, name, f, ctx);
> > > /* uid comparisons */
> > > case AUDIT_COMPARE_UID_TO_AUID:
> > > - return audit_uid_comparator(cred->uid, f->op, tsk->loginuid);
> > > + return audit_uid_comparator(cred->uid, f->op, audit_get_loginuid(tsk));
> > > case AUDIT_COMPARE_UID_TO_EUID:
> > > return audit_uid_comparator(cred->uid, f->op, cred->euid);
> > > case AUDIT_COMPARE_UID_TO_SUID:
> > > @@ -394,11 +394,11 @@ static int audit_field_compare(struct task_struct *tsk,
> > > return audit_uid_comparator(cred->uid, f->op, cred->fsuid);
> > > /* auid comparisons */
> > > case AUDIT_COMPARE_AUID_TO_EUID:
> > > - return audit_uid_comparator(tsk->loginuid, f->op, cred->euid);
> > > + return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->euid);
> > > case AUDIT_COMPARE_AUID_TO_SUID:
> > > - return audit_uid_comparator(tsk->loginuid, f->op, cred->suid);
> > > + return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->suid);
> > > case AUDIT_COMPARE_AUID_TO_FSUID:
> > > - return audit_uid_comparator(tsk->loginuid, f->op, cred->fsuid);
> > > + return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->fsuid);
> > > /* euid comparisons */
> > > case AUDIT_COMPARE_EUID_TO_SUID:
> > > return audit_uid_comparator(cred->euid, f->op, cred->suid);
> > > @@ -611,7 +611,7 @@ static int audit_filter_rules(struct task_struct *tsk,
> > > result = match_tree_refs(ctx, rule->tree);
> > > break;
> > > case AUDIT_LOGINUID:
> > > - result = audit_uid_comparator(tsk->loginuid, f->op, f->uid);
> > > + result = audit_uid_comparator(audit_get_loginuid(tsk), f->op, f->uid);
> > > break;
> > > case AUDIT_LOGINUID_SET:
> > > result = audit_comparator(audit_loginuid_set(tsk), f->op, f->val);
> > > @@ -2287,8 +2287,8 @@ int audit_signal_info(int sig, struct task_struct *t)
> > > (sig == SIGTERM || sig == SIGHUP ||
> > > sig == SIGUSR1 || sig == SIGUSR2)) {
> > > audit_sig_pid = task_tgid_nr(tsk);
> > > - if (uid_valid(tsk->loginuid))
> > > - audit_sig_uid = tsk->loginuid;
> > > + if (uid_valid(audit_get_loginuid(tsk)))
> > > + audit_sig_uid = audit_get_loginuid(tsk);
> >
> > I realize this comment is a little silly given the nature of loginuid,
> > but if we are going to abstract away loginuid accesses (which I think
> > is good), we should probably access it once, store it in a local
> > variable, perform the validity check on the local variable, then
> > commit the local variable to audit_sig_uid. I realize a TOCTOU
> > problem is unlikely here, but with this new layer of abstraction it
> > seems that some additional safety might be a good thing.
>
> Ok, I'll just assign it to where it is going and check it there, holding
> the audit_ctl_lock the whole time, since it should have been done
> anyway for all of audit_sig_{pid,uid,sid} to get a consistent
> view from the AUDIT_SIGNAL_INFO fetch.
Hmmm, holding audit_ctl_lock won't work because it could sleep trying to
get the lock and the signal info is set in a context where sleeping
isn't permitted. I'll just use a local var...
> > > else
> > > audit_sig_uid = uid;
> > > security_task_getsecid(tsk, &audit_sig_sid);
>
> > paul moore
>
> - RGB
- RGB
--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
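The read-once pattern Paul suggests, and the one RGB settles on, can be illustrated with an accessor whose value changes between calls. This is a userspace sketch: `uid_sketch_t` stands in for `kuid_t`, and `flaky_get_loginuid()` is a hypothetical accessor that returns a valid uid on its first call and an invalid one afterwards, simulating a concurrent update.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for kuid_t; -1 plays the role of INVALID_UID. */
typedef struct { unsigned int val; } uid_sketch_t;
#define UID_SKETCH_INVALID ((uid_sketch_t){ (unsigned int)-1 })

static bool uid_sketch_valid(uid_sketch_t u)
{
	return u.val != (unsigned int)-1;
}

/* Hypothetical accessor: valid on the first read, invalid afterwards. */
static int loginuid_reads;

static uid_sketch_t flaky_get_loginuid(void)
{
	return loginuid_reads++ == 0 ? (uid_sketch_t){ 1000 } : UID_SKETCH_INVALID;
}

static uid_sketch_t sig_uid;

/* Pattern from the patch: check one read, assign from a second read. */
static void record_sig_uid_two_reads(uid_sketch_t fallback)
{
	if (uid_sketch_valid(flaky_get_loginuid()))
		sig_uid = flaky_get_loginuid();
	else
		sig_uid = fallback;
}

/* Suggested pattern: read once into a local, then check and commit that copy. */
static void record_sig_uid_one_read(uid_sketch_t fallback)
{
	uid_sketch_t auid = flaky_get_loginuid();

	sig_uid = uid_sketch_valid(auid) ? auid : fallback;
}
```

With two reads the validity check and the stored value can disagree; with a local copy they cannot.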
* Re: [PATCH net-next 4/4] bonding: allow carrier and link status to determine link state
From: Jay Vosburgh @ 2018-05-11 22:04 UTC
To: Debabrata Banerjee
Cc: David S . Miller, netdev, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180511192548.8119-5-dbanerje@akamai.com>
Debabrata Banerjee <dbanerje@akamai.com> wrote:
>In a mixed environment it may be difficult to tell if your hardware
>supports carrier; if it does not, it can always report true. With a new
>use_carrier option of 2, we can check both carrier and link status
>sequentially, instead of one or the other.
What do you mean by "mixed environment," and under what
circumstances are you seeing an actual benefit from doing the MII /
ethtool test in addition to the standard netif_carrier_ok test?
The use_carrier option was meant for backwards compatibility
with old-in-2005 device drivers, so this seems counterintuitive to me. I
don't recall seeing any devices lacking netif_carrier support for some
time. At this point, I would tend to argue that a new device driver
that does not implement netif_carrier support should be fixed, and not
have another hack added to bonding to work around it.
-J
>Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>---
> Documentation/networking/bonding.txt | 4 ++--
> drivers/net/bonding/bond_main.c | 12 ++++++++----
> drivers/net/bonding/bond_options.c | 7 ++++---
> 3 files changed, 14 insertions(+), 9 deletions(-)
>
>diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>index 9ba04c0bab8d..f063730e7e73 100644
>--- a/Documentation/networking/bonding.txt
>+++ b/Documentation/networking/bonding.txt
>@@ -828,8 +828,8 @@ use_carrier
> MII / ETHTOOL ioctl method to determine the link state.
>
> A value of 1 enables the use of netif_carrier_ok(), a value of
>- 0 will use the deprecated MII / ETHTOOL ioctls. The default
>- value is 1.
>+ 0 will use the deprecated MII / ETHTOOL ioctls. A value of 2
>+ will check both. The default value is 1.
>
> xmit_hash_policy
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f7f8a49cb32b..7e9652c4b35c 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -132,7 +132,7 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link down, "
> "in milliseconds");
> module_param(use_carrier, int, 0);
> MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; "
>- "0 for off, 1 for on (default)");
>+ "0 for off, 1 for on (default), 2 for carrier then legacy checks");
> module_param(mode, charp, 0);
> MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
> "1 for active-backup, 2 for balance-xor, "
>@@ -434,12 +434,16 @@ static int bond_check_dev_link(struct bonding *bond,
> int (*ioctl)(struct net_device *, struct ifreq *, int);
> struct ifreq ifr;
> struct mii_ioctl_data *mii;
>+ bool carrier = true;
>
> if (!reporting && !netif_running(slave_dev))
> return 0;
>
> if (bond->params.use_carrier)
>- return netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
>+ carrier = netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
>+
>+ if (!carrier)
>+ return carrier;
>
> /* Try to get link status using Ethtool first. */
> if (slave_dev->ethtool_ops->get_link)
>@@ -4399,8 +4403,8 @@ static int bond_check_params(struct bond_params *params)
> downdelay = 0;
> }
>
>- if ((use_carrier != 0) && (use_carrier != 1)) {
>- pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0/1), so it was set to 1\n",
>+ if (use_carrier < 0 || use_carrier > 2) {
>+ pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0-2), so it was set to 1\n",
> use_carrier);
> use_carrier = 1;
> }
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 8a945c9341d6..dba6cef05134 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -164,9 +164,10 @@ static const struct bond_opt_value bond_primary_reselect_tbl[] = {
> };
>
> static const struct bond_opt_value bond_use_carrier_tbl[] = {
>- { "off", 0, 0},
>- { "on", 1, BOND_VALFLAG_DEFAULT},
>- { NULL, -1, 0}
>+ { "off", 0, 0},
>+ { "on", 1, BOND_VALFLAG_DEFAULT},
>+ { "both", 2, 0},
>+ { NULL, -1, 0}
> };
>
> static const struct bond_opt_value bond_all_slaves_active_tbl[] = {
>--
>2.17.0
>
---
-Jay Vosburgh, jay.vosburgh@canonical.com
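For reference, the three `use_carrier` values as the proposed documentation describes them can be modelled as a small decision function. This is a sketch with hypothetical inputs: `carrier_ok` stands for `netif_carrier_ok()` and `legacy_status` for the result of the ethtool/MII path. Written this way, value 1 keeps its early return and never reaches the legacy checks.

```c
#include <assert.h>
#include <stdbool.h>

#define BMSR_LSTATUS 0x0004 /* link-status bit, value as in <linux/mii.h> */

/* legacy_status stands in for the ethtool/MII result (BMSR_LSTATUS or 0). */
static int sketch_check_link(int use_carrier, bool carrier_ok, int legacy_status)
{
	switch (use_carrier) {
	case 1: /* carrier only: the legacy path is never consulted */
		return carrier_ok ? BMSR_LSTATUS : 0;
	case 2: /* carrier first, then the legacy checks */
		if (!carrier_ok)
			return 0;
		return legacy_status;
	default: /* 0: legacy ethtool/MII ioctls only */
		return legacy_status;
	}
}
```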
* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Michael Schmitz @ 2018-05-11 22:02 UTC
To: Finn Thain
Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <alpine.LNX.2.21.1805112003250.8@nippy.intranet>
Hi Finn,
Am 11.05.2018 um 22:06 schrieb Finn Thain:
>> You would have to be careful not to overwrite a pdev->dev.dma_mask and
>> pdev->dev.dma_coherent_mask that might have been set in a platform
>> device passed via platform_device_register here. Coldfire is the only
>> m68k platform currently using that, but there might be others in future.
>>
>
> That Coldfire patch could be reverted if this is a better solution.
True, but there might be other uses for deviating from a platform
default (I'm thinking of Atari SCSI and floppy drivers here). But we
could choose the correct mask to set in arch_setup_pdev_archdata()
instead, as it's a platform property not a driver property in that case.
>> ... But I don't think there are smaller DMA masks used by m68k drivers
>> that use the platform device mechanism at present. I've only looked at
>> arch/m68k though.
>
> So we're back at the same problem that Geert's suggestion also raised: how
> to identify potentially affected platform devices and drivers?
>
> Maybe we can take a leaf out of Christoph's book, and leave a noisy
> WARNING splat in the log.
>
> void arch_setup_pdev_archdata(struct platform_device *pdev)
> {
> WARN_ON_ONCE(pdev->dev.coherent_dma_mask != DMA_MASK_NONE ||
> pdev->dev.dma_mask != NULL);
I'd suggest using WARN_ON() so we catch all uses on a particular platform.
I initially thought it necessary to warn on unset mask here, but I see
that would throw up a lot of redundant false positives.
Cheers,
Michael
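Combining the caution above (do not overwrite masks a platform_device_register caller already set) with the warning Finn proposes gives roughly the following shape. It is a userspace sketch: `struct sketch_dev` and `sketch_setup_pdev_archdata()` are hypothetical, a counter stands in for `WARN_ON()`, 0 plays the role of `DMA_MASK_NONE`, and the 32-bit default is an assumed platform choice. `DMA_BIT_MASK()` mirrors the kernel macro, and pointing `dma_mask` at `coherent_dma_mask` follows a common kernel idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the relevant struct device fields. */
struct sketch_dev {
	uint64_t coherent_dma_mask;
	uint64_t *dma_mask;
};

static unsigned int preset_mask_warnings; /* counts what WARN_ON() would flag */

/* Fill in a platform default only when the caller left the masks unset. */
static void sketch_setup_pdev_archdata(struct sketch_dev *dev, unsigned int bits)
{
	if (dev->coherent_dma_mask != 0 || dev->dma_mask != NULL) {
		preset_mask_warnings++; /* keep the preset masks, but report them */
		return;
	}
	dev->coherent_dma_mask = DMA_BIT_MASK(bits);
	dev->dma_mask = &dev->coherent_dma_mask;
}
```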
* Re: [PATCH net-next 3/4] bonding: allow use of tx hashing in balance-alb
From: Jay Vosburgh @ 2018-05-11 21:49 UTC
To: Debabrata Banerjee
Cc: David S . Miller, netdev, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180511192548.8119-4-dbanerje@akamai.com>
Debabrata Banerjee <dbanerje@akamai.com> wrote:
>The rx load balancing provided by balance-alb is not mutually
>exclusive with using hashing for tx selection, and should provide a decent
>speed increase because this eliminates spinlocks and cache contention.
>
>Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>---
> drivers/net/bonding/bond_alb.c | 20 ++++++++++++++++++--
> drivers/net/bonding/bond_main.c | 25 +++++++++++++++----------
> drivers/net/bonding/bond_options.c | 2 +-
> include/net/bonding.h | 10 +++++++++-
> 4 files changed, 43 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 180e50f7806f..6228635880d5 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1478,8 +1478,24 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> }
>
> if (do_tx_balance) {
>- hash_index = _simple_hash(hash_start, hash_size);
>- tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
>+ if (bond->params.tlb_dynamic_lb) {
>+ hash_index = _simple_hash(hash_start, hash_size);
>+ tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
>+ } else {
>+ /*
>+ * do_tx_balance means we are free to select the tx_slave
>+ * So we do exactly what tlb would do for hash selection
>+ */
>+
>+ struct bond_up_slave *slaves;
>+ unsigned int count;
>+
>+ slaves = rcu_dereference(bond->slave_arr);
>+ count = slaves ? READ_ONCE(slaves->count) : 0;
>+ if (likely(count))
>+ tx_slave = slaves->arr[bond_xmit_hash(bond, skb) %
>+ count];
>+ }
> }
>
> return bond_do_alb_xmit(skb, bond, tx_slave);
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 1f1e97b26f95..f7f8a49cb32b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -159,7 +159,7 @@ module_param(min_links, int, 0);
> MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on carrier");
>
> module_param(xmit_hash_policy, charp, 0);
>-MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
>+MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
> "0 for layer 2 (default), 1 for layer 3+4, "
> "2 for layer 2+3, 3 for encap layer 2+3, "
> "4 for encap layer 3+4");
>@@ -1735,7 +1735,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> unblock_netpoll_tx();
> }
>
>- if (bond_mode_uses_xmit_hash(bond))
>+ if (bond_mode_can_use_xmit_hash(bond))
> bond_update_slave_arr(bond, NULL);
>
> bond->nest_level = dev_get_nest_level(bond_dev);
>@@ -1870,7 +1870,7 @@ static int __bond_release_one(struct net_device *bond_dev,
> if (BOND_MODE(bond) == BOND_MODE_8023AD)
> bond_3ad_unbind_slave(slave);
>
>- if (bond_mode_uses_xmit_hash(bond))
>+ if (bond_mode_can_use_xmit_hash(bond))
> bond_update_slave_arr(bond, slave);
>
> netdev_info(bond_dev, "Releasing %s interface %s\n",
>@@ -3102,7 +3102,7 @@ static int bond_slave_netdev_event(unsigned long event,
> * events. If these (miimon/arpmon) parameters are configured
> * then array gets refreshed twice and that should be fine!
> */
>- if (bond_mode_uses_xmit_hash(bond))
>+ if (bond_mode_can_use_xmit_hash(bond))
> bond_update_slave_arr(bond, NULL);
> break;
> case NETDEV_CHANGEMTU:
>@@ -3322,7 +3322,7 @@ static int bond_open(struct net_device *bond_dev)
> */
> if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB)))
> return -ENOMEM;
>- if (bond->params.tlb_dynamic_lb)
>+ if (bond->params.tlb_dynamic_lb || BOND_MODE(bond) == BOND_MODE_ALB)
> queue_delayed_work(bond->wq, &bond->alb_work, 0);
> }
>
>@@ -3341,7 +3341,7 @@ static int bond_open(struct net_device *bond_dev)
> bond_3ad_initiate_agg_selection(bond, 1);
> }
>
>- if (bond_mode_uses_xmit_hash(bond))
>+ if (bond_mode_can_use_xmit_hash(bond))
> bond_update_slave_arr(bond, NULL);
>
> return 0;
>@@ -3892,7 +3892,7 @@ static void bond_slave_arr_handler(struct work_struct *work)
> * to determine the slave interface -
> * (a) BOND_MODE_8023AD
> * (b) BOND_MODE_XOR
>- * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0
>+ * (c) (BOND_MODE_TLB || BOND_MODE_ALB) && tlb_dynamic_lb == 0
> *
> * The caller is expected to hold RTNL only and NO other lock!
> */
>@@ -3945,6 +3945,11 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> continue;
> if (skipslave == slave)
> continue;
>+
>+ netdev_dbg(bond->dev,
>+ "Adding slave dev %s to tx hash array[%d]\n",
>+ slave->dev->name, new_arr->count);
>+
> new_arr->arr[new_arr->count++] = slave;
> }
>
>@@ -4320,9 +4325,9 @@ static int bond_check_params(struct bond_params *params)
> }
>
> if (xmit_hash_policy) {
>- if ((bond_mode != BOND_MODE_XOR) &&
>- (bond_mode != BOND_MODE_8023AD) &&
>- (bond_mode != BOND_MODE_TLB)) {
>+ if (bond_mode == BOND_MODE_ROUNDROBIN ||
>+ bond_mode == BOND_MODE_ACTIVEBACKUP ||
>+ bond_mode == BOND_MODE_BROADCAST) {
> pr_info("xmit_hash_policy param is irrelevant in mode %s\n",
> bond_mode_name(bond_mode));
> } else {
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 58c705f24f96..8a945c9341d6 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -395,7 +395,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> .id = BOND_OPT_TLB_DYNAMIC_LB,
> .name = "tlb_dynamic_lb",
> .desc = "Enable dynamic flow shuffling",
>- .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB)),
>+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB) | BIT(BOND_MODE_ALB)),
> .values = bond_tlb_dynamic_lb_tbl,
> .flags = BOND_OPTFLAG_IFDOWN,
> .set = bond_option_tlb_dynamic_lb_set,
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index b52235158836..9a41a50b0bd2 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -285,10 +285,18 @@ static inline bool bond_needs_speed_duplex(const struct bonding *bond)
>
> static inline bool bond_is_nondyn_tlb(const struct bonding *bond)
> {
>- return (BOND_MODE(bond) == BOND_MODE_TLB) &&
>+ return (BOND_MODE(bond) == BOND_MODE_TLB || BOND_MODE(bond) == BOND_MODE_ALB) &&
I believe this could use bond_is_lb(bond) instead.
-J
> (bond->params.tlb_dynamic_lb == 0);
> }
>
>+static inline bool bond_mode_can_use_xmit_hash(const struct bonding *bond)
>+{
>+ return (BOND_MODE(bond) == BOND_MODE_8023AD ||
>+ BOND_MODE(bond) == BOND_MODE_XOR ||
>+ BOND_MODE(bond) == BOND_MODE_TLB ||
>+ BOND_MODE(bond) == BOND_MODE_ALB);
>+}
>+
> static inline bool bond_mode_uses_xmit_hash(const struct bonding *bond)
> {
> return (BOND_MODE(bond) == BOND_MODE_8023AD ||
>--
>2.17.0
>
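The non-dynamic path added to bond_alb_xmit() reduces to picking `arr[hash % count]` from the slave array, guarding against an empty or missing array. A userspace sketch follows; the types are hypothetical mirrors of `struct bond_up_slave`, the hash is taken as an input rather than computed by `bond_xmit_hash()`, and RCU and `READ_ONCE()` are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_slave {
	const char *name;
};

/* Hypothetical mirror of struct bond_up_slave: a count plus slave pointers. */
struct sketch_slave_arr {
	unsigned int count;
	struct sketch_slave *arr[4];
};

/* Returns NULL when the array is missing or empty. */
static struct sketch_slave *
sketch_select_tx_slave(const struct sketch_slave_arr *slaves, uint32_t hash)
{
	unsigned int count = slaves ? slaves->count : 0;

	if (count == 0)
		return NULL;
	return slaves->arr[hash % count];
}
```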
* Re: [PATCH net v2] rps: Correct wrong skb_flow_limit check when enable RPS
From: Willem de Bruijn @ 2018-05-11 21:47 UTC
To: Gao Feng
Cc: David Miller, Daniel Borkmann, Eric Dumazet, Willem de Bruijn,
jakub.kicinski, ktkhai, Alexei Starovoitov, Rasmus Villemoes,
John Fastabend, Jesper Dangaard Brouer, David Ahern,
Network Development
In-Reply-To: <1525990182-12042-1-git-send-email-gfree.wind@vip.163.com>
On Thu, May 10, 2018 at 6:09 PM, <gfree.wind@vip.163.com> wrote:
> From: Gao Feng <gfree.wind@vip.163.com>
>
> The skb flow limit is implemented for each CPU independently. In the
> current code, the function skb_flow_limit gets the softnet_data by
> this_cpu_ptr. But the target CPU of enqueue_to_backlog may not be
> the current CPU when RPS is enabled. As a result, skb_flow_limit checks
> the stats of the current CPU, while the skb is going to be appended to
> the queue of another CPU. This isn't the expected behavior.
>
> Now pass the softnet_data as a param to make them consistent.
>
> Fixes: 99bbc7074190 ("rps: selective flow shedding during softnet overflow")
> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
See also the discussion in the v1 of this patch.
The merits of moving flow_limit state from irq to rps cpu can
be argued, but the existing behavior is intentional and correct,
so this should not be applied to net or backported to stable
branches.
My bad for reviving the discussion in the v1 thread while v2 was
already pending, sorry.
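The disagreement is about which CPU's state the limit is checked against. A deliberately simplified model: each CPU has a single queue-length counter (the real skb_flow_limit() keeps per-flow history, which is not modelled here), and the threshold, array size and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 2
#define LIMIT_SKETCH 4 /* hypothetical threshold */

/* Hypothetical per-CPU state standing in for struct softnet_data. */
struct sketch_softnet {
	unsigned int qlen;
};

static struct sketch_softnet softnet[NR_CPUS_SKETCH];

static bool over_limit(const struct sketch_softnet *sd)
{
	return sd->qlen >= LIMIT_SKETCH;
}

/* Existing behavior: state of the CPU running the check (this_cpu_ptr). */
static bool check_current_cpu(int current_cpu, int target_cpu)
{
	(void)target_cpu;
	return over_limit(&softnet[current_cpu]);
}

/* Proposed behavior: state of the CPU whose backlog receives the skb. */
static bool check_target_cpu(int current_cpu, int target_cpu)
{
	(void)current_cpu;
	return over_limit(&softnet[target_cpu]);
}
```

When RPS steers a packet from a busy CPU 0 to an idle CPU 1, the two policies reach opposite decisions.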
* Re: [PATCH bpf-next 3/7] samples: bpf: compile and link against full libbpf
From: Jakub Kicinski @ 2018-05-11 21:47 UTC
To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev
In-Reply-To: <20180510172443.17238-4-jakub.kicinski@netronome.com>
On Thu, 10 May 2018 10:24:39 -0700, Jakub Kicinski wrote:
> samples/bpf currently cherry-picks object files from tools/lib/bpf
> to link against. Just compile the full library and link statically
> against it.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Looks like this breaks some build configs :( Fix is forthcoming, sorry!
* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
From: Martin KaFai Lau @ 2018-05-11 21:41 UTC
To: Joe Stringer; +Cc: daniel, netdev, ast, john fastabend
In-Reply-To: <CAOftzPg-2JdMOgvwTtubKijaF8mMO+s5w7CdYmFDuBDK3gAiog@mail.gmail.com>
On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> >> programs to find out if there is a socket listening on this host, and
> >> returns a socket pointer which the BPF program can then access to
> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
> >> takes a reference on the socket, so when a BPF program makes use of this
> >> function, it must subsequently pass the returned pointer into the newly
> >> added sk_release() to return the reference.
> >>
> >> By way of example, the following pseudocode would filter inbound
> >> connections at XDP if there is no corresponding service listening for
> >> the traffic:
> >>
> >> struct bpf_sock_tuple tuple;
> >> struct bpf_sock_ops *sk;
> >>
> >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
> >> if (!sk) {
> >> // Couldn't find a socket listening for this traffic. Drop.
> >> return TC_ACT_SHOT;
> >> }
> >> bpf_sk_release(sk, 0);
> >> return TC_ACT_OK;
> >>
> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> >> ---
>
> ...
>
> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> >> };
> >> #endif
> >>
> >> +struct sock *
> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> > Would it be possible to have another version that
> > returns a sk without taking its refcnt?
> > It may have performance benefit.
>
> Not really. The sockets are not RCU-protected, and established sockets
> may be torn down without notice. If we don't take a reference, there's
> no guarantee that the socket will continue to exist for the duration
> of running the BPF program.
>
> From what I follow, the comment below has a hidden implication which
> is that sockets without SOCK_RCU_FREE, e.g. established sockets, may be
> directly freed regardless of RCU.
Right, SOCK_RCU_FREE sockets are the ones I am concerned about.
For example, a TCP_LISTEN socket does not require taking a refcnt
now. Doing a bpf_sk_lookup() may have a rather big
impact on handling a TCP SYN flood. Or is the usual intention
to redirect instead of passing it up to the stack?
>
> /* Sockets having SOCK_RCU_FREE will call this function after one RCU
> * grace period. This is the case for UDP sockets and TCP listeners.
> */
> static void __sk_destruct(struct rcu_head *head)
> ...
>
> Therefore without the refcount, it won't be safe.
* Re: [PATCH net-next 2/4] bonding: use common mac addr checks
From: Jay Vosburgh @ 2018-05-11 21:29 UTC
To: Banerjee, Debabrata
Cc: David S . Miller, netdev@vger.kernel.org, Veaceslav Falico,
Andy Gospodarek
In-Reply-To: <9b51c882f54244e5972da43d7955c959@usma1ex-dag1mb2.msg.corp.akamai.com>
Banerjee, Debabrata <dbanerje@akamai.com> wrote:
>> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>> Debabrata Banerjee <dbanerje@akamai.com> wrote:
>
>> >- if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
>> >- mac_bcast) &&
>> >- !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
>> >+ if (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
>>
>> This change and the similar ones below will now fail non-broadcast
>> multicast Ethernet addresses, where the prior code would not. Is this an
>> intentional change?
>
>Yes, I don't see how it makes sense to use multicast addresses at all, but I may be missing something. It's also illegal according to RFC 1812, section 3.3.2, but obviously this balancing mode is trying to be very clever. We probably shouldn't violate the RFC anyway.
Fair enough, but I think it would be good to call this out in
the change log just in case it does somehow cause a regression.
-J
---
-Jay Vosburgh, jay.vosburgh@canonical.com
* Re: [GIT] Networking
From: Linus Torvalds @ 2018-05-11 21:25 UTC
To: David Miller
Cc: Andrew Morton, Network Development, Linux Kernel Mailing List
In-Reply-To: <20180511.170018.656888133931954275.davem@davemloft.net>
David, is there something you want to tell us?
Drugs are bad, m'kay..
Linus
On Fri, May 11, 2018 at 2:00 PM David Miller <davem@davemloft.net> wrote:
> "from Kevin Easton", "Thanks to Bhadram Varka", "courtesy of Gustavo A.
> R. Silva", "To Eric Dumazet we are most grateful for this fix", "This
> fix from YU Bo, we do appreciate", "Once again we are blessed by the
> honorable Eric Dumazet with this fix", "This fix is bestowed upon us by
Andre Tomt", "another great gift from Eric Dumazet", "to Hangbin Liu we
give thanks for this", "Paolo Abeni, he gave us this", "thank you Moshe
Shemesh", "from our good brother David Howells", "Daniel Jurgens,
you're the best!", "Debabrata Banerjee saved us!", "The ship is now
> water tight, thanks to Andrey Ignatov", "from Colin Ian King, man we've
> got holes everywhere!", "Jiri Pirko what would we do without you!
* RE: [PATCH net-next 2/4] bonding: use common mac addr checks
From: Banerjee, Debabrata @ 2018-05-11 21:25 UTC
To: 'Jay Vosburgh'
Cc: David S . Miller, netdev@vger.kernel.org, Veaceslav Falico,
Andy Gospodarek
In-Reply-To: <4921.1526072038@famine>
> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
> Debabrata Banerjee <dbanerje@akamai.com> wrote:
> >- if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
> >- mac_bcast) &&
> >- !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
> >+ if (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
>
> This change and the similar ones below will now fail non-broadcast
> multicast Ethernet addresses, where the prior code would not. Is this an
> intentional change?
Yes, I don't see how it makes sense to use multicast addresses at all, but I may be missing something. It's also illegal according to RFC 1812, section 3.3.2, but obviously this balancing mode is trying to be very clever. We probably shouldn't violate the RFC anyway.
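The behavioral change Jay points out can be checked with user-space re-implementations of the relevant helpers. The bodies below follow the definitions in the kernel's include/linux/etherdevice.h as understood here (is_valid_ether_addr() means "not multicast and not zero"); old_rlb_check() is a hypothetical name for the pre-patch condition. A group address such as 01:00:5e:00:00:01 passes the old condition but fails the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

static bool is_zero_ether_addr(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return memcmp(a, zero, ETH_ALEN) == 0;
}

/* The I/G bit (LSB of the first octet) marks group addresses;
 * broadcast is a special case of multicast. */
static bool is_multicast_ether_addr(const uint8_t *a)
{
	return a[0] & 0x01;
}

static bool is_broadcast_ether_addr(const uint8_t *a)
{
	for (int i = 0; i < ETH_ALEN; i++) {
		if (a[i] != 0xff)
			return false;
	}
	return true;
}

/* New check: rejects zero and every group address. */
static bool is_valid_ether_addr(const uint8_t *a)
{
	return !is_multicast_ether_addr(a) && !is_zero_ether_addr(a);
}

/* Pre-patch condition: rejects only broadcast and zero. */
static bool old_rlb_check(const uint8_t *a)
{
	return !is_broadcast_ether_addr(a) && !is_zero_ether_addr(a);
}
```

Unicast, broadcast and all-zero addresses get the same verdict from both checks; only non-broadcast multicast addresses change from accepted to rejected, which is the case worth calling out in the changelog.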
* [PATCH net-next 3/3] net: dsa: mv88e6xxx: add a stats setup function
From: Vivien Didelot @ 2018-05-11 21:16 UTC
To: netdev; +Cc: linux-kernel, kernel, Vivien Didelot, davem, andrew, f.fainelli
In-Reply-To: <20180511211636.25995-1-vivien.didelot@savoirfairelinux.com>
Now that the Global 1 specific setup function only sets up the statistics
unit, kill it in favor of a mv88e6xxx_stats_setup function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
drivers/net/dsa/mv88e6xxx/chip.c | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index df92fed44674..a4efc6544c0d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -995,14 +995,6 @@ static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port,
}
-static int mv88e6xxx_stats_set_histogram(struct mv88e6xxx_chip *chip)
-{
- if (chip->info->ops->stats_set_histogram)
- return chip->info->ops->stats_set_histogram(chip);
-
- return 0;
-}
-
static int mv88e6xxx_get_regs_len(struct dsa_switch *ds, int port)
{
return 32 * sizeof(u16);
@@ -2267,14 +2259,16 @@ static int mv88e6xxx_set_ageing_time(struct dsa_switch *ds,
return err;
}
-static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
+static int mv88e6xxx_stats_setup(struct mv88e6xxx_chip *chip)
{
int err;
/* Initialize the statistics unit */
- err = mv88e6xxx_stats_set_histogram(chip);
- if (err)
- return err;
+ if (chip->info->ops->stats_set_histogram) {
+ err = chip->info->ops->stats_set_histogram(chip);
+ if (err)
+ return err;
+ }
return mv88e6xxx_g1_stats_clear(chip);
}
@@ -2300,11 +2294,6 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
goto unlock;
}
- /* Setup Switch Global 1 Registers */
- err = mv88e6xxx_g1_setup(chip);
- if (err)
- goto unlock;
-
err = mv88e6xxx_irl_setup(chip);
if (err)
goto unlock;
@@ -2368,6 +2357,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
goto unlock;
}
+ err = mv88e6xxx_stats_setup(chip);
+ if (err)
+ goto unlock;
+
unlock:
mutex_unlock(&chip->reg_lock);
--
2.17.0
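The setup function in this patch follows the driver's usual optional-op pattern: a NULL member in the ops table means the chip family lacks that feature. Below is a stripped-down user-space sketch of the same control flow; struct chip, struct chip_ops, and the fake_*/failing_* callbacks are hypothetical stand-ins, not mv88e6xxx types.

```c
#include <assert.h>
#include <stddef.h>

struct chip;

/* Hypothetical, trimmed-down ops table: a NULL member means the
 * family has no such hook. */
struct chip_ops {
	int (*stats_set_histogram)(struct chip *chip);
};

struct chip {
	const struct chip_ops *ops;
	int histogram_calls;
	int clear_calls;
};

static int fake_set_histogram(struct chip *chip)
{
	chip->histogram_calls++;
	return 0;
}

static int failing_set_histogram(struct chip *chip)
{
	(void)chip;
	return -5;	/* stands in for an -EIO style error */
}

static int fake_stats_clear(struct chip *chip)
{
	chip->clear_calls++;
	return 0;
}

/* Same shape as mv88e6xxx_stats_setup(): call the optional hook if
 * present, propagate its error, then always clear the counters. */
static int stats_setup(struct chip *chip)
{
	int err;

	assert(chip && chip->ops);
	if (chip->ops->stats_set_histogram) {
		err = chip->ops->stats_set_histogram(chip);
		if (err)
			return err;
	}
	return fake_stats_clear(chip);
}

static const struct chip_ops with_hist = {
	.stats_set_histogram = fake_set_histogram,
};
static const struct chip_ops without_hist = {
	.stats_set_histogram = NULL,
};
static const struct chip_ops broken_hist = {
	.stats_set_histogram = failing_set_histogram,
};
```

Folding the former mv88e6xxx_stats_set_histogram() wrapper into the caller keeps the NULL check and the error propagation next to the one place that uses them.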
* [PATCH net-next 2/3] net: dsa: mv88e6xxx: add IEEE and IP mapping ops
From: Vivien Didelot @ 2018-05-11 21:16 UTC
To: netdev; +Cc: linux-kernel, kernel, Vivien Didelot, davem, andrew, f.fainelli
In-Reply-To: <20180511211636.25995-1-vivien.didelot@savoirfairelinux.com>
All Marvell switch families except 88E6390 have direct registers in
Global 1 for IEEE and IP priority override mapping. The 88E6390 uses
indirect tables instead.
Add .ieee_pri_map and .ip_pri_map ops to distinguish them and call them
from a mv88e6xxx_pri_setup helper. Only non-6390 chips are concerned for now.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
drivers/net/dsa/mv88e6xxx/chip.c | 94 +++++++++++++++++++----------
drivers/net/dsa/mv88e6xxx/chip.h | 3 +
drivers/net/dsa/mv88e6xxx/global1.c | 58 ++++++++++++++++++
drivers/net/dsa/mv88e6xxx/global1.h | 3 +
4 files changed, 127 insertions(+), 31 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 1cebde80b101..df92fed44674 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1104,6 +1104,25 @@ static void mv88e6xxx_port_stp_state_set(struct dsa_switch *ds, int port,
dev_err(ds->dev, "p%d: failed to update state\n", port);
}
+static int mv88e6xxx_pri_setup(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ if (chip->info->ops->ieee_pri_map) {
+ err = chip->info->ops->ieee_pri_map(chip);
+ if (err)
+ return err;
+ }
+
+ if (chip->info->ops->ip_pri_map) {
+ err = chip->info->ops->ip_pri_map(chip);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int mv88e6xxx_devmap_setup(struct mv88e6xxx_chip *chip)
{
int target, port;
@@ -2252,37 +2271,6 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
{
int err;
- /* Configure the IP ToS mapping registers. */
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_0, 0x0000);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_1, 0x0000);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_2, 0x5555);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_3, 0x5555);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_4, 0xaaaa);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_5, 0xaaaa);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_6, 0xffff);
- if (err)
- return err;
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_7, 0xffff);
- if (err)
- return err;
-
- /* Configure the IEEE 802.1p priority mapping register. */
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IEEE_PRI, 0xfa41);
- if (err)
- return err;
-
/* Initialize the statistics unit */
err = mv88e6xxx_stats_set_histogram(chip);
if (err)
@@ -2365,6 +2353,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;
+ err = mv88e6xxx_pri_setup(chip);
+ if (err)
+ goto unlock;
+
/* Setup PTP Hardware Clock and timestamping */
if (chip->info->ptp_support) {
err = mv88e6xxx_ptp_setup(chip);
@@ -2592,6 +2584,8 @@ static int mv88e6xxx_set_eeprom(struct dsa_switch *ds,
static const struct mv88e6xxx_ops mv88e6085_ops = {
/* MV88E6XXX_FAMILY_6097 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
@@ -2628,6 +2622,8 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
static const struct mv88e6xxx_ops mv88e6095_ops = {
/* MV88E6XXX_FAMILY_6095 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
.phy_write = mv88e6185_phy_ppu_write,
@@ -2652,6 +2648,8 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
static const struct mv88e6xxx_ops mv88e6097_ops = {
/* MV88E6XXX_FAMILY_6097 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2686,6 +2684,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
static const struct mv88e6xxx_ops mv88e6123_ops = {
/* MV88E6XXX_FAMILY_6165 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2714,6 +2714,8 @@ static const struct mv88e6xxx_ops mv88e6123_ops = {
static const struct mv88e6xxx_ops mv88e6131_ops = {
/* MV88E6XXX_FAMILY_6185 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
.phy_write = mv88e6185_phy_ppu_write,
@@ -2747,6 +2749,8 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
static const struct mv88e6xxx_ops mv88e6141_ops = {
/* MV88E6XXX_FAMILY_6341 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom8,
.set_eeprom = mv88e6xxx_g2_set_eeprom8,
@@ -2784,6 +2788,8 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
static const struct mv88e6xxx_ops mv88e6161_ops = {
/* MV88E6XXX_FAMILY_6165 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2817,6 +2823,8 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
static const struct mv88e6xxx_ops mv88e6165_ops = {
/* MV88E6XXX_FAMILY_6165 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6165_phy_read,
@@ -2843,6 +2851,8 @@ static const struct mv88e6xxx_ops mv88e6165_ops = {
static const struct mv88e6xxx_ops mv88e6171_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2877,6 +2887,8 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
static const struct mv88e6xxx_ops mv88e6172_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -2916,6 +2928,8 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
static const struct mv88e6xxx_ops mv88e6175_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -2951,6 +2965,8 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
static const struct mv88e6xxx_ops mv88e6176_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -2990,6 +3006,8 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
static const struct mv88e6xxx_ops mv88e6185_ops = {
/* MV88E6XXX_FAMILY_6185 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
.phy_read = mv88e6185_phy_ppu_read,
.phy_write = mv88e6185_phy_ppu_write,
@@ -3129,6 +3147,8 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
static const struct mv88e6xxx_ops mv88e6240_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3208,6 +3228,8 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
static const struct mv88e6xxx_ops mv88e6320_ops = {
/* MV88E6XXX_FAMILY_6320 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3244,6 +3266,8 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
static const struct mv88e6xxx_ops mv88e6321_ops = {
/* MV88E6XXX_FAMILY_6320 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
@@ -3278,6 +3302,8 @@ static const struct mv88e6xxx_ops mv88e6321_ops = {
static const struct mv88e6xxx_ops mv88e6341_ops = {
/* MV88E6XXX_FAMILY_6341 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom8,
.set_eeprom = mv88e6xxx_g2_set_eeprom8,
@@ -3316,6 +3342,8 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
static const struct mv88e6xxx_ops mv88e6350_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -3350,6 +3378,8 @@ static const struct mv88e6xxx_ops mv88e6350_ops = {
static const struct mv88e6xxx_ops mv88e6351_ops = {
/* MV88E6XXX_FAMILY_6351 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_g2_smi_phy_read,
@@ -3385,6 +3415,8 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
static const struct mv88e6xxx_ops mv88e6352_ops = {
/* MV88E6XXX_FAMILY_6352 */
+ .ieee_pri_map = mv88e6085_g1_ieee_pri_map,
+ .ip_pri_map = mv88e6085_g1_ip_pri_map,
.irl_init_all = mv88e6352_g2_irl_init_all,
.get_eeprom = mv88e6xxx_g2_get_eeprom16,
.set_eeprom = mv88e6xxx_g2_set_eeprom16,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index a1bedb0a888b..83d6a8531eaa 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -293,6 +293,9 @@ struct mv88e6xxx_mdio_bus {
};
struct mv88e6xxx_ops {
+ int (*ieee_pri_map)(struct mv88e6xxx_chip *chip);
+ int (*ip_pri_map)(struct mv88e6xxx_chip *chip);
+
/* Ingress Rate Limit unit (IRL) operations */
int (*irl_init_all)(struct mv88e6xxx_chip *chip, int port);
diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c
index 0f2b05342c18..d721ccf7d8be 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.c
+++ b/drivers/net/dsa/mv88e6xxx/global1.c
@@ -241,6 +241,64 @@ int mv88e6185_g1_ppu_disable(struct mv88e6xxx_chip *chip)
return mv88e6185_g1_wait_ppu_disabled(chip);
}
+/* Offset 0x10: IP-PRI Mapping Register 0
+ * Offset 0x11: IP-PRI Mapping Register 1
+ * Offset 0x12: IP-PRI Mapping Register 2
+ * Offset 0x13: IP-PRI Mapping Register 3
+ * Offset 0x14: IP-PRI Mapping Register 4
+ * Offset 0x15: IP-PRI Mapping Register 5
+ * Offset 0x16: IP-PRI Mapping Register 6
+ * Offset 0x17: IP-PRI Mapping Register 7
+ */
+
+int mv88e6085_g1_ip_pri_map(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ /* Reset the IP TOS/DiffServ/Traffic priorities to defaults */
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_0, 0x0000);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_1, 0x0000);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_2, 0x5555);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_3, 0x5555);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_4, 0xaaaa);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_5, 0xaaaa);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_6, 0xffff);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_7, 0xffff);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+/* Offset 0x18: IEEE-PRI Register */
+
+int mv88e6085_g1_ieee_pri_map(struct mv88e6xxx_chip *chip)
+{
+ /* Reset the IEEE Tag priorities to defaults */
+ return mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IEEE_PRI, 0xfa41);
+}
+
/* Offset 0x1a: Monitor Control */
/* Offset 0x1a: Monitor & MGMT Control on some devices */
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index c357b3ca9a09..7c791c1da4b9 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -277,6 +277,9 @@ int mv88e6095_g1_set_cpu_port(struct mv88e6xxx_chip *chip, int port);
int mv88e6390_g1_set_cpu_port(struct mv88e6xxx_chip *chip, int port);
int mv88e6390_g1_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip);
+int mv88e6085_g1_ip_pri_map(struct mv88e6xxx_chip *chip);
+int mv88e6085_g1_ieee_pri_map(struct mv88e6xxx_chip *chip);
+
int mv88e6185_g1_set_cascade_port(struct mv88e6xxx_chip *chip, int port);
int mv88e6085_g1_rmu_disable(struct mv88e6xxx_chip *chip);
--
2.17.0
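As a worked example, the default values written by mv88e6085_g1_ieee_pri_map() and mv88e6085_g1_ip_pri_map() can be decoded, assuming each 16-bit register packs eight 2-bit priority fields with entry i in bits [2i+1:2i], and that IP-PRI register r covers DSCP values 8r to 8r+7. That layout is an assumption based on the register widths and count, not something the patch states; because each IP-PRI default is uniform (0x0000, 0x5555, 0xaaaa, 0xffff), the field order within a register does not affect the IP results. pri_field() and dscp_to_pri() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 2-bit priority for entry `i` (0..7) of a packed register. */
static unsigned int pri_field(uint16_t reg, unsigned int i)
{
	assert(i < 8);
	return (reg >> (2 * i)) & 0x3;
}

/* IP-PRI defaults written by mv88e6085_g1_ip_pri_map(), registers 0..7. */
static const uint16_t ip_pri_defaults[8] = {
	0x0000, 0x0000, 0x5555, 0x5555, 0xaaaa, 0xaaaa, 0xffff, 0xffff,
};

/* Priority for a 6-bit DSCP value under the assumed layout. */
static unsigned int dscp_to_pri(unsigned int dscp)
{
	assert(dscp < 64);
	return pri_field(ip_pri_defaults[dscp / 8], dscp % 8);
}
```

Under that layout, 0xfa41 (binary 11 11 10 10 01 00 00 01, fields 7 to 0) maps 802.1p tags 1 and 2 to priority 0, tags 0 and 3 to priority 1, tags 4 and 5 to priority 2, and tags 6 and 7 to priority 3, while DSCP 0-15, 16-31, 32-47 and 48-63 map to priorities 0, 1, 2 and 3.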
* [PATCH net-next 1/3] net: dsa: mv88e6xxx: use helper for 6390 histogram
From: Vivien Didelot @ 2018-05-11 21:16 UTC
To: netdev; +Cc: linux-kernel, kernel, Vivien Didelot, davem, andrew, f.fainelli
In-Reply-To: <20180511211636.25995-1-vivien.didelot@savoirfairelinux.com>
The Marvell 88E6390 model has its histogram mode bits moved to the
Global 1 Control 2 register. Use the previously introduced
mv88e6xxx_g1_ctl2_mask helper to set them.
At the same time, complete the documentation of that register.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
drivers/net/dsa/mv88e6xxx/global1.c | 15 +++------------
drivers/net/dsa/mv88e6xxx/global1.h | 12 +++++++++---
2 files changed, 12 insertions(+), 15 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c
index 244ee1ff9edc..0f2b05342c18 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.c
+++ b/drivers/net/dsa/mv88e6xxx/global1.c
@@ -393,18 +393,9 @@ int mv88e6390_g1_rmu_disable(struct mv88e6xxx_chip *chip)
int mv88e6390_g1_stats_set_histogram(struct mv88e6xxx_chip *chip)
{
- u16 val;
- int err;
-
- err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL2, &val);
- if (err)
- return err;
-
- val |= MV88E6XXX_G1_CTL2_HIST_RX_TX;
-
- err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL2, val);
-
- return err;
+ return mv88e6xxx_g1_ctl2_mask(chip, MV88E6390_G1_CTL2_HIST_MODE_MASK,
+ MV88E6390_G1_CTL2_HIST_MODE_RX |
+ MV88E6390_G1_CTL2_HIST_MODE_TX);
}
int mv88e6xxx_g1_set_device_number(struct mv88e6xxx_chip *chip, int index)
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index e186a026e1b1..c357b3ca9a09 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -201,12 +201,13 @@
/* Offset 0x1C: Global Control 2 */
#define MV88E6XXX_G1_CTL2 0x1c
-#define MV88E6XXX_G1_CTL2_HIST_RX 0x0040
-#define MV88E6XXX_G1_CTL2_HIST_TX 0x0080
-#define MV88E6XXX_G1_CTL2_HIST_RX_TX 0x00c0
#define MV88E6185_G1_CTL2_CASCADE_PORT_MASK 0xf000
#define MV88E6185_G1_CTL2_CASCADE_PORT_NONE 0xe000
#define MV88E6185_G1_CTL2_CASCADE_PORT_MULTI 0xf000
+#define MV88E6352_G1_CTL2_HEADER_TYPE_MASK 0xc000
+#define MV88E6352_G1_CTL2_HEADER_TYPE_ORIG 0x0000
+#define MV88E6352_G1_CTL2_HEADER_TYPE_MGMT 0x4000
+#define MV88E6390_G1_CTL2_HEADER_TYPE_LAG 0x8000
#define MV88E6352_G1_CTL2_RMU_MODE_MASK 0x3000
#define MV88E6352_G1_CTL2_RMU_MODE_DISABLED 0x0000
#define MV88E6352_G1_CTL2_RMU_MODE_PORT_4 0x1000
@@ -223,6 +224,11 @@
#define MV88E6390_G1_CTL2_RMU_MODE_PORT_10 0x0300
#define MV88E6390_G1_CTL2_RMU_MODE_ALL_DSA 0x0600
#define MV88E6390_G1_CTL2_RMU_MODE_DISABLED 0x0700
+#define MV88E6390_G1_CTL2_HIST_MODE_MASK 0x00c0
+#define MV88E6390_G1_CTL2_HIST_MODE_RX 0x0040
+#define MV88E6390_G1_CTL2_HIST_MODE_TX 0x0080
+#define MV88E6352_G1_CTL2_CTR_MODE_MASK 0x0060
+#define MV88E6390_G1_CTL2_CTR_MODE 0x0020
#define MV88E6XXX_G1_CTL2_DEVICE_NUMBER_MASK 0x001f
/* Offset 0x1D: Stats Operation Register */
--
2.17.0
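The change replaces an OR-only update with a masked read-modify-write. Below is a sketch of that idea, assuming mv88e6xxx_g1_ctl2_mask() clears the bits in its mask argument and then sets the requested ones; the register here is a plain variable, and ctl2_mask() and fake_ctl2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define G1_CTL2_HIST_MODE_MASK	0x00c0
#define G1_CTL2_HIST_MODE_RX	0x0040
#define G1_CTL2_HIST_MODE_TX	0x0080

/* Stands in for the Global 1 Control 2 register. */
static uint16_t fake_ctl2;

/* Masked read-modify-write: only bits in `mask` change, and they take
 * the corresponding bits of `val`; all other fields are preserved. */
static int ctl2_mask(uint16_t mask, uint16_t val)
{
	uint16_t reg = fake_ctl2;	/* "read" */

	reg &= (uint16_t)~mask;		/* clear the field */
	reg |= val & mask;		/* set the requested bits */
	fake_ctl2 = reg;		/* "write" */
	return 0;
}
```

For RX | TX, as used in the patch, this gives the same register value as the removed `val |= MV88E6XXX_G1_CTL2_HIST_RX_TX` code; the difference only shows when a narrower mode is requested, because a plain OR can never clear a previously set bit.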
* [PATCH net-next 0/3] net: dsa: mv88e6xxx: remove Global 1 setup
From: Vivien Didelot @ 2018-05-11 21:16 UTC
To: netdev; +Cc: linux-kernel, kernel, Vivien Didelot, davem, andrew, f.fainelli
The mv88e6xxx driver is still writing arbitrary registers at setup time,
e.g. priority override bits. Add ops for them and provide specific setup
functions for priority and stats before getting rid of the erroneous
mv88e6xxx_g1_setup code, as previously done with Global 2.
Vivien Didelot (3):
net: dsa: mv88e6xxx: use helper for 6390 histogram
net: dsa: mv88e6xxx: add IEEE and IP mapping ops
net: dsa: mv88e6xxx: add a stats setup function
drivers/net/dsa/mv88e6xxx/chip.c | 121 +++++++++++++++++-----------
drivers/net/dsa/mv88e6xxx/chip.h | 3 +
drivers/net/dsa/mv88e6xxx/global1.c | 73 ++++++++++++++---
drivers/net/dsa/mv88e6xxx/global1.h | 15 +++-
4 files changed, 149 insertions(+), 63 deletions(-)
--
2.17.0
* Re: [bpf-next V2 PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
From: Jesper Dangaard Brouer @ 2018-05-11 21:10 UTC
To: netdev, Daniel Borkmann, Alexei Starovoitov,
Jesper Dangaard Brouer
Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson
In-Reply-To: <152606233283.30376.3367467095674418599.stgit@firesoul>
On Fri, 11 May 2018 20:12:12 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 03ed492c4e14..debdb6286170 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1185,9 +1185,13 @@ struct dev_ifalias {
> * This function is used to set or query state related to XDP on the
> * netdevice and manage BPF offload. See definition of
> * enum bpf_netdev_command for details.
> - * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp);
> - * This function is used to submit a XDP packet for transmit on a
> - * netdevice.
> + * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame **xdp);
> + * This function is used to submit @n XDP packets for transmit on a
> + * netdevice. Returns number of frames successfully transmitted, frames
> + * that got dropped are freed/returned via xdp_return_frame().
> + * Returns negative number, means general error invoking ndo, meaning
> + * no frames were xmit'ed and core-caller will free all frames.
> + * TODO: Consider add flag to allow sending flush operation.
Another reason for adding a flag to ndo_xdp_xmit is to allow calling
it from other contexts, such as the AF_XDP TX code path, which in
sendmsg is not protected by NAPI.
> * void (*ndo_xdp_flush)(struct net_device *dev);
> * This function is used to inform the driver to flush a particular
> * xdp tx queue. Must be called on same CPU as xdp_xmit.
> @@ -1375,8 +1379,8 @@ struct net_device_ops {
> int needed_headroom);
> int (*ndo_bpf)(struct net_device *dev,
> struct netdev_bpf *bpf);
> - int (*ndo_xdp_xmit)(struct net_device *dev,
> - struct xdp_frame *xdp);
> + int (*ndo_xdp_xmit)(struct net_device *dev, int n,
> + struct xdp_frame **xdp);
> void (*ndo_xdp_flush)(struct net_device *dev);
> };
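The return-value contract documented in the quoted hunk can be sketched from both sides. This is a user-space model under stated assumptions: struct fake_dev, struct frame, frame_free() (standing in for xdp_return_frame()), fake_xdp_xmit() and bulk_send() are all hypothetical, and frames that were queued are simply left alone, as a real driver would release them on TX completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct frame {
	int id;
	bool freed;
};

struct fake_dev {
	bool up;
	int free_slots;	/* room left in the TX ring */
	int queued;
};

/* Stand-in for xdp_return_frame(). */
static void frame_free(struct frame *f)
{
	assert(!f->freed);	/* catch double frees in the model */
	f->freed = true;
}

/* Driver side: returns the number of frames queued and frees the ones
 * it had to drop; a negative value means nothing was touched. */
static int fake_xdp_xmit(struct fake_dev *dev, int n, struct frame **frames)
{
	int sent = 0;

	if (!dev->up)
		return -1;	/* general error: no frame touched */

	for (int i = 0; i < n; i++) {
		if (dev->free_slots > 0) {
			dev->free_slots--;
			dev->queued++;
			sent++;
		} else {
			frame_free(frames[i]);	/* dropped: driver frees it */
		}
	}
	return sent;
}

/* Caller side: on a negative return the core frees every frame. */
static int bulk_send(struct fake_dev *dev, int n, struct frame **frames)
{
	int ret = fake_xdp_xmit(dev, n, frames);

	if (ret < 0) {
		for (int i = 0; i < n; i++)
			frame_free(frames[i]);
		return 0;
	}
	return ret;
}
```

The point of the split is that every frame is released exactly once: either by the driver (dropped frames), by TX completion (queued frames), or by the caller (general error).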
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
From: Joe Stringer @ 2018-05-11 21:08 UTC
To: Martin KaFai Lau; +Cc: Joe Stringer, daniel, netdev, ast, john fastabend
In-Reply-To: <20180511045722.p7r4tbog66omohs6@kafai-mbp.dhcp.thefacebook.com>
On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
>> This patch adds a new BPF helper function, sk_lookup() which allows BPF
>> programs to find out if there is a socket listening on this host, and
>> returns a socket pointer which the BPF program can then access to
>> determine, for instance, whether to forward or drop traffic. sk_lookup()
>> takes a reference on the socket, so when a BPF program makes use of this
>> function, it must subsequently pass the returned pointer into the newly
>> added sk_release() to return the reference.
>>
>> By way of example, the following pseudocode would filter inbound
>> connections at XDP if there is no corresponding service listening for
>> the traffic:
>>
>> struct bpf_sock_tuple tuple;
>> struct bpf_sock_ops *sk;
>>
>> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>> if (!sk) {
>> // Couldn't find a socket listening for this traffic. Drop.
>> return TC_ACT_SHOT;
>> }
>> bpf_sk_release(sk, 0);
>> return TC_ACT_OK;
>>
>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> ---
...
>> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>> };
>> #endif
>>
>> +struct sock *
>> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> Would it be possible to have another version that
> returns a sk without taking its refcnt?
> It may have performance benefit.
Not really. The sockets are not RCU-protected, and established sockets
may be torn down without notice. If we don't take a reference, there's
no guarantee that the socket will continue to exist for the duration
of running the BPF program.
From what I follow, the comment below has a hidden implication, which
is that sockets without SOCK_RCU_FREE, e.g. established sockets, may be
directly freed regardless of RCU.
/* Sockets having SOCK_RCU_FREE will call this function after one RCU
* grace period. This is the case for UDP sockets and TCP listeners.
*/
static void __sk_destruct(struct rcu_head *head)
...
Therefore without the refcount, it won't be safe.
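The lifetime argument can be modelled in user space as a sketch. Everything here is hypothetical (struct fake_sock, lookup(), sock_put(), table_remove()); it only illustrates why a reference taken at lookup keeps an object alive across a concurrent teardown. A real lookup must also cope with racing against the final put (for example with an increment-if-not-zero on the refcount), which this single-threaded model skips.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fake_sock {
	atomic_int refcnt;
	int port;
};

static int frees;			/* counts destructor runs */
static struct fake_sock *table_slot;	/* single-entry "hash table" */

/* Drop one reference; the last one frees the socket. */
static void sock_put(struct fake_sock *sk)
{
	/* atomic_fetch_sub() returns the value before the subtraction. */
	if (atomic_fetch_sub(&sk->refcnt, 1) == 1) {
		free(sk);
		frees++;
	}
}

static void table_insert(int port)
{
	struct fake_sock *sk = malloc(sizeof(*sk));

	assert(sk);
	atomic_init(&sk->refcnt, 1);	/* reference owned by the table */
	sk->port = port;
	table_slot = sk;
}

/* Returns a referenced socket or NULL; the caller must sock_put() it. */
static struct fake_sock *lookup(int port)
{
	struct fake_sock *sk = table_slot;

	if (!sk || sk->port != port)
		return NULL;
	atomic_fetch_add(&sk->refcnt, 1);
	return sk;
}

/* Connection teardown: unhash and drop the table's reference. */
static void table_remove(void)
{
	struct fake_sock *sk = table_slot;

	table_slot = NULL;
	if (sk)
		sock_put(sk);
}
```

If lookup() returned the pointer without incrementing refcnt, table_remove() would free the object immediately and any later read of sk->port by the "program" would be a use-after-free.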
* [GIT] Networking
From: David Miller @ 2018-05-11 21:00 UTC
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) Verify lengths of keys provided by the user in AF_KEY, from
Kevin Easton.
2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.
3) Add Spectre guards to some ATM code, courtesy of Gustavo
A. R. Silva.
4) Fix infinite loop in NSH protocol code. To Eric Dumazet
we are most grateful for this fix.
5) Line up /proc/net/netlink headers properly. This fix from YU Bo,
we do appreciate.
6) Use after free in TLS code. Once again we are blessed by the
honorable Eric Dumazet with this fix.
7) Fix regression in TLS code causing stalls on partial TLS records.
This fix is bestowed upon us by Andre Tomt.
8) Deal with too small MTUs properly in LLC code, another great gift
from Eric Dumazet.
9) Handle cached route flushing properly wrt. MTU locking in ipv4,
to Hangbin Liu we give thanks for this.
10) Fix regression in SO_BINDTODEVICE handling wrt. UDP socket demux.
Paolo Abeni, he gave us this.
11) Range check coalescing parameters in mlx4 driver, thank you
Moshe Shemesh.
12) Some ipv6 ICMP error handling fixes in rxrpc, from our good
brother David Howells.
13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel
Juergens, you're the best!
14) Don't send bonding RLB updates to invalid MAC addresses.
Debabrata Banerjee saved us!
15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The
ship is now water tight, thanks to Andrey Ignatov.
16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got
holes everywhere!
17) Fix error path in tcf_proto_create, Jiri Pirko what would we
do without you!
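Item 1 concerns key lengths in PF_KEY messages. Per RFC 2367, sadb_key_len counts the whole extension in 64-bit words and sadb_key_bits gives the key size in bits, so a consistent extension must be large enough for the 8-byte header plus the key bytes. A minimal sketch of such a check follows; struct sadb_key_hdr, sadb_key_words_needed() and verify_key_len() are illustrative names, and the actual af_key.c change may differ in detail.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the RFC 2367 sadb_key header (8 bytes); key bytes follow. */
struct sadb_key_hdr {
	uint16_t sadb_key_len;		/* whole extension length, in 64-bit words */
	uint16_t sadb_key_exttype;
	uint16_t sadb_key_bits;		/* key length, in bits */
	uint16_t sadb_key_reserved;
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* 64-bit words needed to hold the header plus sadb_key_bits of key data. */
static size_t sadb_key_words_needed(const struct sadb_key_hdr *key)
{
	size_t key_bytes = DIV_ROUND_UP((size_t)key->sadb_key_bits, 8);

	return DIV_ROUND_UP(sizeof(*key) + key_bytes, sizeof(uint64_t));
}

/* Reject a key whose claimed bit length does not fit in the extension. */
static int verify_key_len(const struct sadb_key_hdr *key)
{
	if (sadb_key_words_needed(key) > key->sadb_key_len)
		return -EINVAL;
	return 0;
}
```

For a 128-bit key, the header (8 bytes) plus 16 key bytes needs 3 words, so an extension claiming sadb_key_len = 1 is rejected instead of letting the parser read past it.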
Please pull, thanks a lot!
The following changes since commit 1504269814263c9676b4605a6a91e14dc6ceac21:
Merge tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest (2018-05-03 19:26:51 -1000)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
for you to fetch changes up to a52956dfc503f8cc5cfe6454959b7049fddb4413:
net sched actions: fix refcnt leak in skbmod (2018-05-11 16:37:03 -0400)
----------------------------------------------------------------
Adi Nissim (1):
net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
Alexander Aring (1):
net: ieee802154: 6lowpan: fix frag reassembly
Anders Roxell (1):
selftests: net: use TEST_PROGS_EXTENDED
Andre Tomt (1):
net/tls: Fix connection stall on partial tls record
Andrew Lunn (1):
net: dsa: mv88e6xxx: Fix PHY interrupts by parameterising PHY base address
Andrey Ignatov (1):
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
Antoine Tenart (1):
net: phy: sfp: fix the BR,min computation
Bhadram Varka (1):
net: phy: broadcom: add support for BCM89610 PHY
Christophe JAILLET (2):
net/mlx4_en: Fix an error handling path in 'mlx4_en_init_netdev()'
mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
Colin Ian King (5):
firestream: fix spelling mistake: "reseverd" -> "reserved"
sctp: fix spelling mistake: "max_retans" -> "max_retrans"
net/9p: fix spelling mistake: "suspsend" -> "suspend"
qed: fix spelling mistake: "taskelt" -> "tasklet"
ixgbe: fix memory leak on ipsec allocation
Daniel Borkmann (1):
bpf: use array_index_nospec in find_prog_type
Daniel Jurgens (1):
net/mlx5: Free IRQs in shutdown path
David Howells (5):
rxrpc: Fix missing start of call timeout
rxrpc: Fix error reception on AF_INET6 sockets
rxrpc: Fix the min security level for kernel calls
rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
rxrpc: Trace UDP transmission failure
David S. Miller (13):
Merge git://git.kernel.org/.../bpf/bpf
Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
Merge branch 'Aquantia-various-patches-2018-05'
Merge branch 'ieee802154-for-davem-2018-05-08' of git://git.kernel.org/.../sschmidt/wpan
Merge tag 'linux-can-fixes-for-4.17-20180508' of ssh://gitolite.kernel.org/.../mkl/linux-can
Merge branch 'qed-rdma-fixes'
Merge tag 'mac80211-for-davem-2018-05-09' of git://git.kernel.org/.../jberg/mac80211
Merge tag 'linux-can-fixes-for-4.17-20180510' of ssh://gitolite.kernel.org/.../mkl/linux-can
Merge branch 'bonding-bug-fixes-and-regressions'
Merge tag 'mlx5-fixes-2018-05-10' of git://git.kernel.org/.../saeed/linux
Merge tag 'rxrpc-fixes-20180510' of git://git.kernel.org/.../dhowells/linux-fs
Merge branch '10GbE' of git://git.kernel.org/.../jkirsher/net-queue
Davide Caratti (1):
tc-testing: fix tdc tests for 'bpf' action
Debabrata Banerjee (2):
bonding: do not allow rlb updates to invalid mac
bonding: send learning packets for vlans on slave
Emil Tantilov (1):
ixgbe: return error on unsupported SFP module when resetting
Eric Dumazet (4):
nsh: fix infinite loop
tls: fix use after free in tls_sk_proto_close
llc: better deal with too small mtu
tipc: fix one byte leak in tipc_sk_set_orig_addr()
Ganesh Goudar (2):
cxgb4: zero the HMA memory
cxgb4: copy mbox log size to PF0-3 adap instances
Geert Uytterhoeven (1):
dt-bindings: can: rcar_can: Fix R8A7796 SoC name
Georg Hofmann (1):
trivial: fix inconsistent help texts
Gustavo A. R. Silva (3):
ieee802154: mcr20a: Fix memory leak in mcr20a_probe
atm: zatm: Fix potential Spectre v1
net: atm: Fix potential Spectre v1
Hangbin Liu (1):
ipv4: reset fnhe_mtu_locked after cache route flushed
Hans de Goede (3):
Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174"
Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets
Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table
Heiner Kallweit (1):
r8169: fix powering up RTL8168h
Igor Russkikh (2):
net: aquantia: driver should correctly declare vlan_features bits
net: aquantia: Limit number of vectors to actually allocated irqs
Ilan Peer (2):
mac80211: Fix condition validating WMM IE
mac80211: Adjust SAE authentication timeout
Jakob Unterwurzacher (1):
can: dev: increase bus-off message severity
Jeff Shaw (1):
ice: Set rq_last_status when cleaning rq
Jia-Ju Bai (1):
net: ieee802154: atusb: Replace GFP_ATOMIC with GFP_KERNEL in atusb_probe
Jimmy Assarsson (1):
can: kvaser_usb: Increase correct stats counter in kvaser_usb_rx_can_msg()
Jiri Pirko (1):
net: sched: fix error path in tcf_proto_create() when modules are not configured
Johan Hovold (1):
rfkill: gpio: fix memory leak in probe error path
Johannes Berg (1):
cfg80211: limit wiphy names to 128 bytes
Kevin Easton (1):
af_key: Always verify length of provided sadb_key
Luc Van Oostenryck (1):
ixgbevf: fix ixgbevf_xmit_frame()'s return type
Lukas Wunner (2):
can: hi311x: Acquire SPI lock on ->do_get_berr_counter
can: hi311x: Work around TX complete interrupt erratum
Mark Rutland (1):
bpf: fix possible spectre-v1 in find_and_alloc_map()
Michael Chan (1):
tg3: Fix vunmap() BUG_ON() triggered from tg3_free_consistent().
Michal Kalderon (2):
qed: Fix l2 initializations over iWARP personality
qede: Fix gfp flags sent to rdma event node allocation
Mohammed Gamal (1):
hv_netvsc: Fix net device attach on older Windows hosts
Moritz Fischer (2):
net: nixge: Fix error path for obtaining mac address
net: nixge: Address compiler warnings about signedness
Moshe Shemesh (1):
net/mlx4_en: Verify coalescing parameters are in range
Paolo Abeni (1):
udp: fix SO_BINDTODEVICE
Pieter Jansen van Vuuren (1):
nfp: flower: remove headroom from max MTU calculation
Randy Dunlap (1):
mac80211: fix kernel-doc "bad line" warning
Rob Taglang (1):
net: ethernet: sun: niu set correct packet size in skb
Roi Dayan (1):
net/mlx5e: Err if asked to offload TC match on frag being first
Roman Mashak (2):
net sched actions: fix invalid pointer dereferencing if skbedit flags missing
net sched actions: fix refcnt leak in skbmod
Sara Sharon (1):
mac80211: use timeout from the AddBA response instead of the request
Sergei Shtylyov (2):
DT: net: can: rcar_canfd: document R8A77970 bindings
DT: net: can: rcar_canfd: document R8A77980 bindings
Srinivas Dasari (1):
nl80211: Free connkeys on external authentication failure
Stefan Schmidt (1):
net: ieee802154: mcr20a: do not leak resources on error path
Stefano Brivio (2):
vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too
openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found
Steffen Klassert (2):
xfrm: Fix warning in xfrm6_tunnel_net_exit.
MAINTAINERS: Update the 3c59x network driver entry
Stephen Hemminger (1):
hv_netvsc: set master device
Sun Lianwen (1):
net/9p: correct some comment errors in 9p file system code
Uwe Kleine-König (2):
can: flexcan: fix endianess detection
arm: dts: imx[35]*: declare flexcan devices to be compatible to imx25's flexcan
Wolfram Sang (1):
net: flow_dissector: fix typo 'can by' to 'can be'
Xin Long (2):
sctp: delay the authentication for the duplicated cookie-echo chunk
sctp: remove sctp_chunk_put from fail_mark err path in sctp_ulpevent_make_rcvmsg
YU Bo (1):
net/netlink: make sure the headers line up actual value output
Ying Xue (1):
tipc: eliminate KMSAN uninit-value in strcmp complaint
YueHaibing (1):
mac80211_hwsim: fix a possible memory leak in hwsim_new_radio_nl()
weiyongjun (A) (1):
cfg80211: fix possible memory leak in regdb_query_country()
Documentation/devicetree/bindings/net/can/rcar_canfd.txt | 4 ++-
MAINTAINERS | 4 +--
arch/arm/boot/dts/imx35.dtsi | 4 +--
arch/arm/boot/dts/imx53.dtsi | 4 +--
drivers/atm/firestream.c | 2 +-
drivers/atm/zatm.c | 3 +++
drivers/bluetooth/btusb.c | 19 +++++++++++---
drivers/net/bonding/bond_alb.c | 15 ++++++-----
drivers/net/bonding/bond_main.c | 2 ++
drivers/net/can/dev.c | 2 +-
drivers/net/can/flexcan.c | 26 ++++++++++---------
drivers/net/can/spi/hi311x.c | 11 +++++---
drivers/net/can/usb/kvaser_usb.c | 2 +-
drivers/net/dsa/mv88e6xxx/chip.c | 26 +++++++++++++++++++
drivers/net/dsa/mv88e6xxx/chip.h | 1 +
drivers/net/dsa/mv88e6xxx/global2.c | 2 +-
drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 3 +++
drivers/net/ethernet/aquantia/atlantic/aq_nic.h | 1 +
drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 20 +++++++-------
drivers/net/ethernet/broadcom/tg3.c | 9 ++++---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 7 +++--
drivers/net/ethernet/intel/ice/ice_controlq.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +++
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 16 ++++++++++++
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 8 +-----
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 7 +++--
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 +++
drivers/net/ethernet/mellanox/mlx5/core/eq.c | 28 ++++++++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 11 +++++++-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 8 ++++++
drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h | 2 ++
drivers/net/ethernet/mellanox/mlxsw/core.c | 4 +--
drivers/net/ethernet/netronome/nfp/flower/main.c | 19 --------------
drivers/net/ethernet/ni/nixge.c | 10 ++++---
drivers/net/ethernet/qlogic/qed/qed_l2.c | 6 ++---
drivers/net/ethernet/qlogic/qed/qed_main.c | 2 +-
drivers/net/ethernet/qlogic/qede/qede_rdma.c | 2 +-
drivers/net/ethernet/realtek/r8169.c | 3 +++
drivers/net/ethernet/sun/niu.c | 5 ++--
drivers/net/hyperv/netvsc_drv.c | 3 ++-
drivers/net/hyperv/rndis_filter.c | 2 +-
drivers/net/ieee802154/atusb.c | 2 +-
drivers/net/ieee802154/mcr20a.c | 15 +++++++----
drivers/net/phy/broadcom.c | 10 +++++++
drivers/net/phy/sfp-bus.c | 2 +-
drivers/net/wireless/mac80211_hwsim.c | 1 +
include/linux/brcmphy.h | 1 +
include/net/bonding.h | 1 +
include/net/flow_dissector.h | 2 +-
include/net/mac80211.h | 2 +-
include/net/xfrm.h | 1 +
include/trace/events/rxrpc.h | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
include/uapi/linux/nl80211.h | 2 ++
kernel/bpf/syscall.c | 19 ++++++++++----
net/9p/trans_common.c | 2 +-
net/9p/trans_fd.c | 4 +--
net/9p/trans_rdma.c | 4 +--
net/9p/trans_virtio.c | 5 ++--
net/9p/trans_xen.c | 2 +-
net/atm/lec.c | 9 +++++--
net/ieee802154/6lowpan/6lowpan_i.h | 4 +--
net/ieee802154/6lowpan/reassembly.c | 14 +++++-----
net/ipv4/ping.c | 7 +++--
net/ipv4/route.c | 1 +
net/ipv4/udp.c | 11 +++++---
net/ipv6/Kconfig | 9 +++----
net/ipv6/ip6_vti.c | 4 +--
net/ipv6/udp.c | 4 +--
net/ipv6/xfrm6_tunnel.c | 3 +++
net/key/af_key.c | 45 +++++++++++++++++++++++++-------
net/llc/af_llc.c | 3 +++
net/mac80211/agg-tx.c | 4 +++
net/mac80211/mlme.c | 27 +++++++++++++------
net/mac80211/tx.c | 3 ++-
net/netlink/af_netlink.c | 6 ++---
net/nsh/nsh.c | 4 +++
net/openvswitch/flow_netlink.c | 9 +++----
net/rfkill/rfkill-gpio.c | 7 ++++-
net/rxrpc/af_rxrpc.c | 2 +-
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/conn_event.c | 11 +++++---
net/rxrpc/input.c | 2 +-
net/rxrpc/local_event.c | 3 ++-
net/rxrpc/local_object.c | 57 +++++++++++++++++++++++++++++-----------
net/rxrpc/output.c | 34 ++++++++++++++++++++++--
net/rxrpc/peer_event.c | 46 ++++++++++++++++-----------------
net/rxrpc/rxkad.c | 6 +++--
net/rxrpc/sendmsg.c | 10 +++++++
net/sched/act_skbedit.c | 3 ++-
net/sched/act_skbmod.c | 5 +++-
net/sched/cls_api.c | 2 +-
net/sctp/associola.c | 30 ++++++++++++++++++++-
net/sctp/sm_make_chunk.c | 2 +-
net/sctp/sm_statefuns.c | 86 +++++++++++++++++++++++++++++++++----------------------------
net/sctp/ulpevent.c | 1 -
net/tipc/node.c | 15 +++++++++--
net/tipc/socket.c | 3 ++-
net/tls/tls_main.c | 12 ++++-----
net/wireless/core.c | 3 +++
net/wireless/nl80211.c | 1 +
net/wireless/reg.c | 1 +
net/xfrm/xfrm_state.c | 6 +++++
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json | 11 +++++---
106 files changed, 714 insertions(+), 291 deletions(-)
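Several entries above are Spectre v1 fixes; the bpf one names array_index_nospec() explicitly. The idea is to clamp an untrusted index with a branch-free mask so that a mispredicted bounds check cannot steer a speculative load out of bounds. The sketch below re-implements the mask arithmetic, modeled on the generic array_index_mask_nospec() in include/linux/nospec.h; it lacks the kernel's compiler barrier, so it illustrates the arithmetic only and is not a speculation-safe primitive. index_mask_nospec() and index_nospec() are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Branch-free mask: all ones when index < size, zero otherwise.
 * Assumes index and size are both <= LONG_MAX, and relies on arithmetic
 * right shift of a negative long (the behavior of GCC and Clang).
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	/* top bit of (index | (size - 1 - index)) is set only if index >= size */
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Clamp an untrusted index to 0 when it is out of bounds, without a branch. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

Callers still do the normal bounds check first; the masked index only matters for the speculative window where that check has been mispredicted.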
* Re: [PATCH net-next 2/4] bonding: use common mac addr checks
From: Jay Vosburgh @ 2018-05-11 20:53 UTC
To: Debabrata Banerjee
Cc: David S . Miller, netdev, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180511192548.8119-3-dbanerje@akamai.com>
Debabrata Banerjee <dbanerje@akamai.com> wrote:
>Replace homegrown mac addr checks with faster defs from etherdevice.h
>
>Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>---
> drivers/net/bonding/bond_alb.c | 28 +++++++++-------------------
> 1 file changed, 9 insertions(+), 19 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index c2f6c58e4e6a..180e50f7806f 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -40,11 +40,6 @@
> #include <net/bonding.h>
> #include <net/bond_alb.h>
>
>-
>-
>-static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = {
>- 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
>-};
> static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
> 0x33, 0x33, 0x00, 0x00, 0x00, 0x01
> };
>@@ -420,9 +415,7 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
>
> if (assigned_slave) {
> rx_hash_table[index].slave = assigned_slave;
>- if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
>- mac_bcast) &&
>- !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
>+ if (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
This change and the similar ones below will now fail
non-broadcast multicast Ethernet addresses, where the prior code would
not. Is this an intentional change?
-J
> bond_info->rx_hashtbl[index].ntt = 1;
> bond_info->rx_ntt = 1;
> /* A slave has been removed from the
>@@ -525,8 +518,7 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla
> client_info = &(bond_info->rx_hashtbl[hash_index]);
>
> if ((client_info->slave == slave) &&
>- !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
>- !is_zero_ether_addr(client_info->mac_dst)) {
>+ is_valid_ether_addr(client_info->mac_dst)) {
> client_info->ntt = 1;
> ntt = 1;
> }
>@@ -567,8 +559,7 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip)
> if ((client_info->ip_src == src_ip) &&
> !ether_addr_equal_64bits(client_info->slave->dev->dev_addr,
> bond->dev->dev_addr) &&
>- !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
>- !is_zero_ether_addr(client_info->mac_dst)) {
>+ is_valid_ether_addr(client_info->mac_dst)) {
> client_info->ntt = 1;
> bond_info->rx_ntt = 1;
> }
>@@ -596,7 +587,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> if ((client_info->ip_src == arp->ip_src) &&
> (client_info->ip_dst == arp->ip_dst)) {
> /* the entry is already assigned to this client */
>- if (!ether_addr_equal_64bits(arp->mac_dst, mac_bcast)) {
>+ if (!is_broadcast_ether_addr(arp->mac_dst)) {
> /* update mac address from arp */
> ether_addr_copy(client_info->mac_dst, arp->mac_dst);
> }
>@@ -644,8 +635,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
> ether_addr_copy(client_info->mac_src, arp->mac_src);
> client_info->slave = assigned_slave;
>
>- if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
>- !is_zero_ether_addr(client_info->mac_dst)) {
>+ if (is_valid_ether_addr(client_info->mac_dst)) {
> client_info->ntt = 1;
> bond->alb_info.rx_ntt = 1;
> } else {
>@@ -1418,9 +1408,9 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> case ETH_P_IP: {
> const struct iphdr *iph = ip_hdr(skb);
>
>- if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast) ||
>- (iph->daddr == ip_bcast) ||
>- (iph->protocol == IPPROTO_IGMP)) {
>+ if (is_broadcast_ether_addr(eth_data->h_dest) ||
>+ iph->daddr == ip_bcast ||
>+ iph->protocol == IPPROTO_IGMP) {
> do_tx_balance = false;
> break;
> }
>@@ -1432,7 +1422,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> /* IPv6 doesn't really use broadcast mac address, but leave
> * that here just in case.
> */
>- if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast)) {
>+ if (is_broadcast_ether_addr(eth_data->h_dest)) {
> do_tx_balance = false;
> break;
> }
>--
>2.17.0
>
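Jay's question can be checked against the predicates' definitions in include/linux/etherdevice.h, re-implemented byte-wise in userspace below; old_rlb_check() is a hypothetical name for the open-coded test the patch removes. is_valid_ether_addr() rejects any address with the multicast (I/G) bit set, while the old check rejected only broadcast and all-zero addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-implementations of the etherdevice.h predicates. */
static bool is_zero_ether_addr(const uint8_t *a)
{
	return !(a[0] | a[1] | a[2] | a[3] | a[4] | a[5]);
}

static bool is_multicast_ether_addr(const uint8_t *a)
{
	return a[0] & 0x01;	/* I/G bit: set for multicast and broadcast */
}

static bool is_broadcast_ether_addr(const uint8_t *a)
{
	return (a[0] & a[1] & a[2] & a[3] & a[4] & a[5]) == 0xff;
}

static bool is_valid_ether_addr(const uint8_t *a)
{
	return !is_multicast_ether_addr(a) && !is_zero_ether_addr(a);
}

/* The check the patch removes: "not broadcast and not all-zero". */
static bool old_rlb_check(const uint8_t *a)
{
	return !is_broadcast_ether_addr(a) && !is_zero_ether_addr(a);
}
```

For 01:00:5e:00:00:01 the old check returns true and is_valid_ether_addr() returns false, which is the behavior change being asked about; unicast and broadcast addresses are classified the same way by both.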
* Re: INFO: rcu detected stall in kfree_skbmem
From: Marcelo Ricardo Leitner @ 2018-05-11 20:42 UTC
To: Eric Dumazet
Cc: Dmitry Vyukov, syzbot, Vladislav Yasevich, Neil Horman,
linux-sctp, Andrei Vagin, David Miller, Kirill Tkhai, LKML,
netdev, syzkaller-bugs
In-Reply-To: <683d1ead-d35b-27ee-0f0c-f7e815d989fc@gmail.com>
On Fri, May 11, 2018 at 12:08:33PM -0700, Eric Dumazet wrote:
>
>
> On 05/11/2018 11:41 AM, Marcelo Ricardo Leitner wrote:
>
> > But calling ip6_xmit with rcu_read_lock is expected. tcp stack also
> > does it.
> > Thus I think this is more of an issue with IPv6 stack. If a host has
> > an extensive ip6tables ruleset, it probably generates this more
> > easily.
> >
> >>> sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
> >>> sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
> >>> sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
> >>> sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
> >>> sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
> >>> sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
> >>> sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
> >>> sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
> >>> call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
> >>> expire_timers kernel/time/timer.c:1363 [inline]
> >
> > Having this call from a timer means it wasn't processing sctp stack
> > for too long.
> >
>
> I feel the problem is that this part is looping, in some infinite loop.
>
> I have seen this stack traces in other reports.
Checked mail history now, seems at least two other reports on RCU
stalls had sctp_generate_heartbeat_event involved.
>
> Maybe some kind of list corruption.
Could be.
Do we know if it generated a flood of packets?
Marcelo
* Re: [PATCH net 1/1] net sched actions: fix refcnt leak in skbmod
From: David Miller @ 2018-05-11 20:37 UTC
To: mrv; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1526063733-7813-1-git-send-email-mrv@mojatatu.com>
From: Roman Mashak <mrv@mojatatu.com>
Date: Fri, 11 May 2018 14:35:33 -0400
> When application fails to pass flags in netlink TLV when replacing
> existing skbmod action, the kernel will leak refcnt:
>
> $ tc actions get action skbmod index 1
> total acts 0
>
> action order 0: skbmod pipe set smac 00:11:22:33:44:55
> index 1 ref 1 bind 0
>
> For example, at this point a buggy application replaces the action with
> index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags,
> however refcnt gets bumped:
>
> $ tc actions get actions skbmod index 1
> total acts 0
>
> action order 0: skbmod pipe set smac 00:11:22:33:44:55
> index 1 ref 2 bind 0
> $
>
> The patch fixes this by calling tcf_idr_release() on existing actions.
>
> Fixes: 86da71b57383d ("net_sched: Introduce skbmod action")
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Applied and queued up for -stable, thanks.
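The leak follows a common pattern: looking up the existing action takes a reference, so an early error return after the lookup must drop it again, which is what the added tcf_idr_release() call does. The sketch models only that error path; struct fake_action, action_replace() and the apply_fix switch are hypothetical and do not reflect the actual act_skbmod.c code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical action object: only the reference count matters here. */
struct fake_action {
	int refcnt;
};

static void action_get(struct fake_action *a) { a->refcnt++; }
static void action_release(struct fake_action *a) { a->refcnt--; }

/*
 * Replace-path sketch: the lookup of the existing index takes a reference.
 * 'flags' stands in for the netlink attribute the buggy application omitted.
 * On success the reference is handed to the caller (not modeled further).
 */
static int action_replace(struct fake_action *existing, unsigned int flags,
			  bool apply_fix)
{
	action_get(existing);			/* lookup bumps the refcount */

	if (!flags) {
		if (apply_fix)
			action_release(existing);	/* undo the lookup's get */
		return -EINVAL;
	}

	/* ... new parameters would be applied here ... */
	return 0;
}
```

Without the fix, each failed replace leaves the count one higher, matching the "ref 1" to "ref 2" transition in the tc output above.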