* [PATCH 1/2] netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs
From: Pablo Neira Ayuso @ 2019-02-11 16:53 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190211165319.17965-1-pablo@netfilter.org>
From: Jann Horn <jannh@google.com>
The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.
(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)
Fixes: cc2d58634e0f ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/ipv4/netfilter/nf_nat_snmp_basic_main.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
index a0aa13bcabda..0a8a60c1bf9a 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
@@ -105,6 +105,8 @@ static void fast_csum(struct snmp_ctx *ctx, unsigned char offset)
int snmp_version(void *context, size_t hdrlen, unsigned char tag,
const void *data, size_t datalen)
{
+ if (datalen != 1)
+ return -EINVAL;
if (*(unsigned char *)data > 1)
return -ENOTSUPP;
return 1;
@@ -114,8 +116,11 @@ int snmp_helper(void *context, size_t hdrlen, unsigned char tag,
const void *data, size_t datalen)
{
struct snmp_ctx *ctx = (struct snmp_ctx *)context;
- __be32 *pdata = (__be32 *)data;
+ __be32 *pdata;
+ if (datalen != 4)
+ return -EINVAL;
+ pdata = (__be32 *)data;
if (*pdata == ctx->from) {
pr_debug("%s: %pI4 to %pI4\n", __func__,
(void *)&ctx->from, (void *)&ctx->to);
--
2.11.0
^ permalink raw reply related
* [PATCH 2/2] netfilter: nat: fix spurious connection timeouts
From: Pablo Neira Ayuso @ 2019-02-11 16:53 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190211165319.17965-1-pablo@netfilter.org>
From: Florian Westphal <fw@strlen.de>
Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.
I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.
Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).
Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: faec18dbb0405 ("netfilter: nat: remove l4proto->manip_pkt")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 1 +
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index 2687db015b6f..fa2ba7c500e4 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
/* Change outer to look like the reply to an incoming packet */
nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+ target.dst.protonum = IPPROTO_ICMP;
if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
return 0;
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index 23022447eb49..7a41ee3c11b4 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
}
nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+ target.dst.protonum = IPPROTO_ICMPV6;
if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
return 0;
--
2.11.0
^ permalink raw reply related
* Re: [RFC 1/3] devlink: add flash update command
From: Jiri Pirko @ 2019-02-11 16:45 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers, mkubecek, andrew
In-Reply-To: <20190211065923.22670-2-jakub.kicinski@netronome.com>
Mon, Feb 11, 2019 at 07:59:20AM CET, jakub.kicinski@netronome.com wrote:
>Add devlink flash update command. Advanced NICs have firmware
>stored in flash and often cryptographically secured. Updating
>that flash is handled by management firmware. Ethtool has a
>flash update command which served us well, however, it has two
>shortcomings:
> - it takes rtnl_lock unnecessarily - really flash update has
> nothing to do with networking, so using a networking device
> as a handle is suboptimal, which leads us to the second one:
> - it requires a functioning netdev - in case device enters an
> error state and can't spawn a netdev (e.g. communication
> with the device fails) there is no netdev to use as a handle
> for flashing.
>
>Devlink already has the ability to report the firmware versions,
>now with the ability to update the firmware/flash we will be
>able to recover devices in bad state.
>
>To enable easy interoperability with ethtool add the target
>partition ID. We may or may not add a different method of
>identification, but there is no such immediate need.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/devlink.h | 2 ++
> include/uapi/linux/devlink.h | 6 ++++++
> net/core/devlink.c | 30 ++++++++++++++++++++++++++++++
> 3 files changed, 38 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 07660fe4c0e3..55b3478b1291 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -529,6 +529,8 @@ struct devlink_ops {
> struct netlink_ext_ack *extack);
> int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
> struct netlink_ext_ack *extack);
>+ int (*flash_update)(struct devlink *devlink, const char *path,
>+ u32 target, struct netlink_ext_ack *extack);
> };
>
> static inline void *devlink_priv(struct devlink *devlink)
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 72d9f7c89190..f4417283fd1b 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -103,6 +103,8 @@ enum devlink_command {
> DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
> DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
>
>+ DEVLINK_CMD_FLASH_UPDATE,
>+
> /* add new commands above here */
> __DEVLINK_CMD_MAX,
> DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>@@ -326,6 +328,10 @@ enum devlink_attr {
> DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, /* u64 */
> DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, /* u64 */
> DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, /* u8 */
>+
>+ DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME, /* string */
>+ DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID, /* u32 */
Do we need to carry this on? I mean, the ef->region is only checked in 4
drivers against ETHTOOL_FLASH_ALL_REGIONS, which is default.
There is this bnxt driver which is the only one working with this value.
I think that for compat, there should be some id-region mapping
provided by driver which the compat layer would use to translate.
I also think that this should be in sync with what is returned in
DEVLINK_ATTR_INFO_VERSION_NAME.
For example:
$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
driver nfp
serial_number 16240145
versions:
fixed:
board.id AMDA0081-0001
board.rev 15
board.vendor SMA
board.model hydrogen
running:
fw.mgmt 010181.010181.0101d4
fw.cpld 0x1030000
fw.app abm-d372b6
fw.undi 0.0.2
chip.init AMDA-0081-0001 20160318164536
stored:
fw.mgmt 010181.010181.0101d4
fw.app abm-d372b6
fw.undi 0.0.2
chip.init AMDA-0081-0001 20160318164536
Now user should be able to use one of the identifiers to flash relevant
fw, like:
devlink dev flash pci/0000:82:00.0 XXX fw.mgmt file flash-boot.bin
I'm not sure about the name of "xxx" attribute. Maybe "id":
devlink dev flash pci/0000:82:00.0 id fw.mgmt file flash-boot.bin
devlink dev flash pci/0000:82:00.0 id fw.cpld file some-other.bin
[...]
^ permalink raw reply
* Re: [net-next PATCH 1/2] mm: add dma_addr_t to struct page
From: Matthew Wilcox @ 2019-02-11 16:55 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, linux-mm, Toke Høiland-Jørgensen,
Ilias Apalodimas, Saeed Mahameed, Andrew Morton, mgorman,
David S. Miller, Tariq Toukan
In-Reply-To: <154990120685.24530.15350136329514629029.stgit@firesoul>
On Mon, Feb 11, 2019 at 05:06:46PM +0100, Jesper Dangaard Brouer wrote:
> The page_pool API is using page->private to store DMA addresses.
> As pointed out by David Miller we can't use that on 32-bit architectures
> with 64-bit DMA
>
> This patch adds a new dma_addr_t struct to allow storing DMA addresses
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
> + struct { /* page_pool used by netstack */
> + /**
> + * @dma_addr: Page_pool need to store DMA-addr, and
s/need/needs/
> + * cannot use @private, as DMA-mappings can be 64-bit
s/DMA-mappings/DMA addresses/
> + * even on 32-bit Architectures.
s/A/a/
> + */
> + dma_addr_t dma_addr; /* Shares area with @lru */
It also shares with @slab_list, @next, @compound_head, @pgmap and
@rcu_head. I think it's pointless to try to document which other fields
something shares space with; the places which do it are a legacy from
before I rearranged struct page last year. Anyone looking at this should
now be able to see "Oh, this is a union, only use the fields which are
in the union for the type of struct page I have here".
Are the pages allocated from this API ever supposed to be mapped to
userspace?
You also say in the documentation:
* If no DMA mapping is done, then it can act as shim-layer that
* fall-through to alloc_page. As no state is kept on the page, the
* regular put_page() call is sufficient.
I think this is probably a dangerous precedent to set. Better to require
exactly one call to page_pool_put_page() (with the understanding that the
refcount may be elevated, so this may not be the final free of the page,
but the page will no longer be usable for its page_pool purpose).
^ permalink raw reply
* Re: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: Marcelo Ricardo Leitner @ 2019-02-11 17:05 UTC (permalink / raw)
To: Neil Horman
Cc: David Miller, julien, netdev, linux-sctp, linux-kernel, vyasevich,
lucien.xin
In-Reply-To: <20190211150432.GA13525@hmswarspite.think-freely.org>
On Mon, Feb 11, 2019 at 10:04:32AM -0500, Neil Horman wrote:
> On Sun, Feb 10, 2019 at 10:46:16AM -0200, Marcelo Ricardo Leitner wrote:
> > On Sat, Feb 09, 2019 at 03:12:17PM -0800, David Miller wrote:
> > > From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > > Date: Wed, 6 Feb 2019 18:37:54 -0200
> > >
> > > > On Wed, Feb 06, 2019 at 12:14:30PM -0800, Julien Gomes wrote:
> > > >> Make sctp_setsockopt_events() able to accept sctp_event_subscribe
> > > >> structures longer than the current definitions.
> > > >>
> > > >> This should prevent unjustified setsockopt() failures due to struct
> > > >> sctp_event_subscribe extensions (as in 4.11 and 4.12) when using
> > > >> binaries that should be compatible, but were built with later kernel
> > > >> uapi headers.
> > > >
> > > > Not sure if we support backwards compatibility like this?
> > >
> > > What a complete mess we have here.
> > >
> > > Use new socket option numbers next time, do not change the size and/or
> > > layout of existing socket options.
> >
> > What about reusing the same socket option, but defining a new struct?
> > Say, MYSOCKOPT supports struct mysockopt, struct mysockopt2, struct
> > mysockopt3...
> >
> > That way we have a clear definition of the user's intent.
> >
> Thats possible, but I think thats pretty equivalaent to what daves saying, in
> that he wants us to identify all the sizes of this struct and the git history
> and act on them accordingly. Having internal versions of the struct seems like
> a fine way to get there, but I think we need to consider how we got to this
> situations before we go down the implementation path.
I was more referring to future stuff, but yes. I find it a bit easier
to handle than having to switch the sockopt too and so far I couldn't
find drawbacks to it.
That is, when using a new sockopt, we could accept a buffer larger
than the needed, but I'm not considering that as a valid point
anymore. Putting this compatibility aside for a moment, that pretty
much means the user doesn't know what it wants and so we also don't.
>
> > >
> > > This whole thread, if you read it, is basically "if we compatability
> > > this way, that breaks, and if we do compatability this other way oh
> > > shit this other thing doesn't work."
> > >
> > > I think we really need to specifically check for the difference sizes
> > > that existed one by one, clear out the part not given by the user, and
> > > backport this as far back as possible in a way that in the older kernels
> > > we see if the user is actually trying to use the new features and if so
> > > error out.
> >
> > I'm afraid clearing out may not be enough, though seems it's the best
> > we can do so far. If the struct is allocated but not fully initialized
> > via a memset, but by setting its fields one by one, the remaining new
> > fields will be left uninitinialized.
> >
>
> I'm not sure this even makes sense. Currently (as I understood it), the issue
> we are facing is the one in which an application is built against a newer kernel
> and run on an older one, the implication there being that the application will
> pass in a buffer that is larger than what the kernel expects. In that
> situation, clearing isn't needed, all thats needed (I think), is a memcmp of the
> space between the sizeof(kernel struct version), and sizeof(userspace struct
> version) to see if any bits are non-zero. If they are, we error out, otherwise,
> we ignore the space and move forward as though that overage doesn't exist.
That's exactly what I tried to mean. :-)
>
> Mind you, I'm not (yet) advocating for that approach, just trying to clarify
> whats needed.
Ok.
> > >
> > > Which, btw, is terrible behavior. Newly compiled apps should work on
> > > older kernels if they don't try to use the new features, and if they
> >
> > One use case here is: a given distro is using kernel X and app Foo is
> > built against it. Then upgrades to X+1, Foo is patched to fix an issue
> > and is rebuilt against X+1. The user upgrades Foo package but for
> > whatever reason, doesn't upgrade kernel or reboot the system. Here,
> > Foo doesn't work anymore until the new kernel is also running.
> >
> Yes, thats the use case that we're trying to address.
>
> > > can the ones that want to try to use the new features should be able
> > > to fall back when that feature isn't available in a non-ambiguous
> > > and precisely defined way.
> > >
> > > The fact that the use of the new feature is hidden in the new
> > > structure elements is really rotten.
> > >
> > > This patch, at best, needs some work and definitely a longer and more
> > > detailed commit message.
> >
> FWIW, before we decide on a course of action, I think I need to point out that,
> over the last 10 years, we've extended this structure 6 times, in the following
> commits:
> 0f3fffd8ab1db
> 7e8616d8e7731
> e1cdd553d482c
> 35ea82d611da5
> c95129d127c6d
> b444153fb5a64
>
> The first two I believe were modifications during a period when sctp was
> actually getting integrated to the kernel, but the last 4 were definately done
> during more recent development periods and wen't in without any commentary about
> the impact to UAPI compatibility. The check for optlen > sizeof(struct
> sctp_event_subscribe) was made back in 2008, and while not spelled out, seems
> pretty clearly directed at enforcing compatibility with older appliations, not
> compatibility with newer applications running on older kernels.
>
> I really worry about situations in which we need to support applications
> expecting features that the running kernel doesn't have. In this particular
> situation it seems like a fixable thing, but I could envision situations in
> which we just can't do it, and I don't want to set that expectation when we
> can't consistently meet it.
>
> So, if the consensus is that we need to support applications built on newer
> kernels, but run on older kernels (and I'd like to get verbal consensus on
Yes from my side.
> that), then we need to identify a method to fix this. I'm still hesitant to
> do anything that involves us accepting any size buffer over the kernel expected
> size, as that puts us in a position to have to read large amounts of user data
> (i.e. possible DOS), and just picking an arbitrary large number to limit the
> buffer size seems wrong. What if, on receipt of a structure from a newer kernel
> (implying a size larger than what the kernel expects), we clamp optlen to the
> kernel size, and put_user it back to the application? i.e. we don't check any
We can't do that on setsockopt calls, as optlen is R/O there.
Returning > 0 is not specified on setsockopt(2).
> data above and beyond what the the kernel knows about, but we use the optlen as
> an indicator to user space that not all the data was processed? That allows the
> kernel to ignore the overage safely, and while its not in the socket api
> extension RFC, its not violating anything, and is something we can document in
> the sctp(7) man page as a linux only behavior.
>
> Thoughts?
> Neil
I also need to dig deeper on this, but in general what if we draw
a line based on the current implementation:
- Current struct is X bytes long
- Patch current and older kernels to accept up to X bytes, as long as
the trailing bytes are zeroed. Otherwise, EINVAL.
X may be a magic number for old kernel, but this way we avoid
unbounded buffers and the limit is not random.
- On further changes, create a new, explicitly versioned struct.
Older kernels will EINVAL if this new struct is used, which is
expected.
Newer kernels will then have to cope with the different
sizes/structs accordingly.
Marcelo
^ permalink raw reply
* Re: linux-next: Tree for Feb 11 (wireless/80211)
From: Randy Dunlap @ 2019-02-11 17:07 UTC (permalink / raw)
To: Stephen Rothwell, Linux Next Mailing List
Cc: Linux Kernel Mailing List, linux-wireless, netdev@vger.kernel.org,
Johannes Berg
In-Reply-To: <20190211183951.3fe1dad9@canb.auug.org.au>
On 2/10/19 11:39 PM, Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20190208:
>
on i386:
ERROR: "__umoddi3" [net/wireless/cfg80211.ko] undefined!
ERROR: "__umoddi3" [net/mac80211/mac80211.ko] undefined!
--
~Randy
^ permalink raw reply
* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: Eric Dumazet @ 2019-02-11 17:14 UTC (permalink / raw)
To: Tariq Toukan, Ilias Apalodimas, Matthew Wilcox
Cc: David Miller, brouer@redhat.com, toke@redhat.com,
netdev@vger.kernel.org, mgorman@techsingularity.net,
linux-mm@kvack.org
In-Reply-To: <bfd83487-7073-18c8-6d89-e50fe9a83313@mellanox.com>
On 02/11/2019 12:53 AM, Tariq Toukan wrote:
>
> Hi,
>
> It's great to use the struct page to store its dma mapping, but I am
> worried about extensibility.
> page_pool is evolving, and it would need several more per-page fields.
> One of them would be pageref_bias, a planned optimization to reduce the
> number of the costly atomic pageref operations (and replace existing
> code in several drivers).
>
But the point about pageref_bias is to place it in a different cache line than "struct page"
The major cost is having a cache line bouncing between producer and consumer.
pageref_bias means the producer only have to read the "struct page" and not dirty it
in the case the page can be recycled.
> I would replace this dma field with a pointer to an extensible struct,
> that would contain the dma mapping (and other stuff in the near future).
> This pointer fits perfectly with the existing unsigned long private;
> they can share the memory, for both 32- and 64-bits systems.
>
> The only downside is one more pointer de-reference. This should be perf
> tested.
> However, when introducing the page refcnt bias optimization into
> page_pool, I believe the perf gain would be guaranteed.
Only in some cases perhaps (when the cache line can be dirtied without performance hit)
^ permalink raw reply
* RE: [PATCH 1/2] xsk: do not use mmap_sem
From: Weiny, Ira @ 2019-02-11 17:16 UTC (permalink / raw)
To: Williams, Dan J, Daniel Borkmann
Cc: Björn Töpel, Davidlohr Bueso, Andrew Morton, Linux MM,
LKML, David S . Miller, Topel, Bjorn, Karlsson, Magnus, Netdev,
Davidlohr Bueso
In-Reply-To: <CAPcyv4gjUmRdV1jZegLecocj155m7dpQLxQSnF_HQQErD8Gtag@mail.gmail.com>
> > >> ---
> > >> net/xdp/xdp_umem.c | 6 ++----
> > >> 1 file changed, 2 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index
> > >> 5ab236c5c9a5..25e1e76654a8 100644
> > >> --- a/net/xdp/xdp_umem.c
> > >> +++ b/net/xdp/xdp_umem.c
> > >> @@ -265,10 +265,8 @@ static int xdp_umem_pin_pages(struct
> xdp_umem *umem)
> > >> if (!umem->pgs)
> > >> return -ENOMEM;
> > >>
> > >> - down_write(¤t->mm->mmap_sem);
> > >> - npgs = get_user_pages(umem->address, umem->npgs,
> > >> - gup_flags, &umem->pgs[0], NULL);
> > >> - up_write(¤t->mm->mmap_sem);
> > >> + npgs = get_user_pages_fast(umem->address, umem->npgs,
> > >> + gup_flags, &umem->pgs[0]);
> > >>
> > >
> > > Thanks for the patch!
> > >
> > > The lifetime of the pinning is similar to RDMA umem mapping, so
> > > isn't gup_longterm preferred?
> >
> > Seems reasonable from reading what gup_longterm seems to do. Davidlohr
> > or Dan, any thoughts on the above?
>
> Yes, any gup where the lifetime of the corresponding put_page() is under
> direct control of userspace should be using the _longterm flavor to
> coordinate be careful in the presence of dax. In the dax case there is no page
> cache indirection to coordinate truncate vs page pins. Ira is looking to add a
> "fast + longterm" flavor for cases like this.
Yes I'm just about ready with a small patch set to add a GUP fast longterm.
I think this should wait for that series.
Ira
^ permalink raw reply
* [PATCH V1 net 0/2] net: ena: race condition bug fix and version update
From: akiyano @ 2019-02-11 17:17 UTC (permalink / raw)
To: davem, netdev
Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, gtzalik, netanel, alisaidi
From: Arthur Kiyanovski <akiyano@amazon.com>
This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
Arthur Kiyanovski (2):
net: ena: fix race between link up and device initalization
net: ena: update driver version from 2.0.2 to 2.0.3
drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 +++++-----
drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH V1 net 2/2] net: ena: update driver version from 2.0.2 to 2.0.3
From: akiyano @ 2019-02-11 17:17 UTC (permalink / raw)
To: davem, netdev
Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, gtzalik, netanel, alisaidi
In-Reply-To: <1549905464-13758-1-git-send-email-akiyano@amazon.com>
From: Arthur Kiyanovski <akiyano@amazon.com>
Update driver version due to bug fix.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index dc8b6173d8d8..63870072cbbd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -45,7 +45,7 @@
#define DRV_MODULE_VER_MAJOR 2
#define DRV_MODULE_VER_MINOR 0
-#define DRV_MODULE_VER_SUBMINOR 2
+#define DRV_MODULE_VER_SUBMINOR 3
#define DRV_MODULE_NAME "ena"
#ifndef DRV_MODULE_VERSION
--
2.17.1
^ permalink raw reply related
* [PATCH V1 net 1/2] net: ena: fix race between link up and device initalization
From: akiyano @ 2019-02-11 17:17 UTC (permalink / raw)
To: davem, netdev
Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, gtzalik, netanel, alisaidi
In-Reply-To: <1549905464-13758-1-git-send-email-akiyano@amazon.com>
From: Arthur Kiyanovski <akiyano@amazon.com>
Fix race condition between ena_update_on_link_change() and
ena_restore_device().
This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.
Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.
Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index a70bb1bb90e7..a6eacf2099c3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2663,11 +2663,6 @@ static int ena_restore_device(struct ena_adapter *adapter)
goto err_device_destroy;
}
- clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
- /* Make sure we don't have a race with AENQ Links state handler */
- if (test_bit(ENA_FLAG_LINK_UP, &adapter->flags))
- netif_carrier_on(adapter->netdev);
-
rc = ena_enable_msix_and_set_admin_interrupts(adapter,
adapter->num_queues);
if (rc) {
@@ -2684,6 +2679,11 @@ static int ena_restore_device(struct ena_adapter *adapter)
}
set_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
+
+ clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
+ if (test_bit(ENA_FLAG_LINK_UP, &adapter->flags))
+ netif_carrier_on(adapter->netdev);
+
mod_timer(&adapter->timer_service, round_jiffies(jiffies + HZ));
dev_err(&pdev->dev,
"Device reset completed successfully, Driver info: %s\n",
--
2.17.1
^ permalink raw reply related
* Re: [PATCH net V2] net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
From: Eric Dumazet @ 2019-02-11 17:24 UTC (permalink / raw)
To: Tariq Toukan, David S. Miller
Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Eric Dumazet
In-Reply-To: <1549901057-2614-1-git-send-email-tariqt@mellanox.com>
On 02/11/2019 08:04 AM, Tariq Toukan wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
>
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are usually zero's, which don't affect
> checksum. However, it is not guaranteed. For example, switches might
> choose to make other use of these octets.
> This repeatedly causes kernel hardware checksum fault.
>
> Prior to the cited commit below, skb checksum was forced to be
> CHECKSUM_NONE when padding is detected. After it, we need to keep
> skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
> verify and parse IP headers, it does not worth the effort as the packets
> are so small that CHECKSUM_COMPLETE has no significant advantage.
>
> Future work: when reporting checksum complete is not an option for
> IP non-TCP/UDP packets, we can actually fallback to report checksum
> unnecessary, by looking at cqe IPOK bit.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Thanks Tariq and Saeed
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: Possible bug into DSA2 code.
From: Florian Fainelli @ 2019-02-11 17:28 UTC (permalink / raw)
To: Rodolfo Giometti, Andrew Lunn; +Cc: Vivien Didelot, David S. Miller, netdev
In-Reply-To: <084b3973-88b9-8c71-50d5-9d773999ad04@enneenne.com>
On 2/10/19 3:45 AM, Rodolfo Giometti wrote:
> On 09/02/2019 20:34, Andrew Lunn wrote:
>>> So we I see two possible solutions:
>>>
>>> 1) having both ds->slave_mii_bus and ds->ops->phy_read already
>>> defined is an
>>> error, then it must be signaled to the calling code, or
>>
>> I don't think we can do that. mv88e6xxx optionally instantiates the
>> MDIO busses, depending on what is in device tree. If there is no mdio
>> property, we need the DSA core to create an MDIO bus.
>
> OK, but using the following check to know if DSA did such allocation is
> not correct because DSA drivers can allocate it by their own:
>
> static void dsa_switch_teardown(struct dsa_switch *ds)
> {
> if (ds->slave_mii_bus && ds->ops->phy_read)
> mdiobus_unregister(ds->slave_mii_bus);
>
> Maybe can we add a flag to register ds->slave_mii_bus allocation by DSA?
If drivers allocate the slave_mii_bus, or use it as a pointer to their
bus, then they should not be providing a ds->ops->phy_read() callback
since we assume they would have mii_bus::read and mii_bus::write set to
their driver internal version.
>
>> Looking at the driver, ds->slave_mii_bus is assigned in
>> mv88e6xxx_setup().
>>
>> We have talked about adding a teardown() to the ops structure. This
>> seems like another argument we should do it. The mv88e6xxx_teardown()
>> can set ds->slave_mii_bus back to NULL, undoing what it did in the
>> setup code.
>
> This seems reasonable to me, but in this case you have to call
> teardown() operation before calling mdiobus_unregister() into
> dsa_switch_teardown() or we still have the problem...
>
> Ciao,
>
> Rodolfo
>
--
Florian
^ permalink raw reply
* Re: [PATCH net-next v3 0/9] net: Remove switchdev_ops
From: Florian Fainelli @ 2019-02-11 17:31 UTC (permalink / raw)
To: netdev; +Cc: idosch, linux-kernel, devel, bridge, jiri, andrew, vivien.didelot
In-Reply-To: <20190210234007.16173-1-f.fainelli@gmail.com>
On 2/10/19 3:39 PM, Florian Fainelli wrote:
> Hi all,
>
> This patch series finishes by the removal of switchdev_ops. To get there
> we need to do a few things:
>
> - get rid of the one and only call to switchdev_port_attr_get() which is
> used to fetch the device's bridge port flags capabilities, instead we
> can just check what flags are being programmed during the prepare
> phase
>
> - once we get rid of getting switchdev port attributes we convert the
> setting of such attributes using a blocking notifier
>
> And then remove switchdev_ops completely.
>
> Please review and let me know what you think!
>
> David, I would like to get Ido's feedback on this to make sure I did not
> miss something, thank you!
David, I will be reposting a v4 with Jiri's Acked-by as well as adding
fallthrough switch/case annotations so we don't regress on that front.
Thank you.
>
> Changes in v3:
> - dropped patches removing te need to get the attribute since we
> still need that in order to support different sleeping vs.
> non-sleeping contexts
>
> Changes in v2:
> - fixed bisectability issues in patch #15
> - added Acked-by from Jiri where necessary
> - fixed a few minor issues according to Jiri's feedback:
> - rename port_attr_event -> port_attr_set_event
> - moved SWITCHDEV_PORT_ATTR_SET closer to other blocking events
>
> Florian Fainelli (9):
> Documentation: networking: switchdev: Update port parent ID section
> switchdev: Add SWITCHDEV_PORT_ATTR_SET, SWITCHDEV_PORT_ATTR_GET
> rocker: Handle SWITCHDEV_PORT_ATTR_GET/SET
> mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_ATTR_GET/SET
> net: mscc: ocelot: Handle SWITCHDEV_PORT_ATTR_GET/SET
> staging: fsl-dpaa2: ethsw: Handle SWITCHDEV_PORT_ATTR_GET/SET
> net: dsa: Handle SWITCHDEV_PORT_ATTR_GET/SET
> net: switchdev: Replace port attr get/set SDO with a notification
> net: Remove switchdev_ops
>
> Documentation/networking/switchdev.txt | 10 +-
> .../net/ethernet/mellanox/mlxsw/spectrum.c | 12 --
> .../net/ethernet/mellanox/mlxsw/spectrum.h | 2 -
> .../mellanox/mlxsw/spectrum_switchdev.c | 36 +++---
> drivers/net/ethernet/mscc/ocelot.c | 26 ++++-
> drivers/net/ethernet/rocker/rocker_main.c | 30 ++++-
> drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 30 ++++-
> include/linux/netdevice.h | 3 -
> include/net/switchdev.h | 28 ++---
> net/dsa/slave.c | 30 ++++-
> net/switchdev/switchdev.c | 107 ++++++------------
> 11 files changed, 168 insertions(+), 146 deletions(-)
>
--
Florian
^ permalink raw reply
* Re: Possible bug into DSA2 code.
From: Rodolfo Giometti @ 2019-02-11 17:51 UTC (permalink / raw)
To: Florian Fainelli, Andrew Lunn; +Cc: Vivien Didelot, David S. Miller, netdev
In-Reply-To: <fe3891e6-4a91-f345-f543-60e19b006be3@gmail.com>
On 11/02/2019 18:28, Florian Fainelli wrote:
> On 2/10/19 3:45 AM, Rodolfo Giometti wrote:
>> On 09/02/2019 20:34, Andrew Lunn wrote:
>>>> So we I see two possible solutions:
>>>>
>>>> 1) having both ds->slave_mii_bus and ds->ops->phy_read already
>>>> defined is an
>>>> error, then it must be signaled to the calling code, or
>>>
>>> I don't think we can do that. mv88e6xxx optionally instantiates the
>>> MDIO busses, depending on what is in device tree. If there is no mdio
>>> property, we need the DSA core to create an MDIO bus.
>>
>> OK, but using the following check to know if DSA did such allocation is
>> not correct because DSA drivers can allocate it by their own:
>>
>> static void dsa_switch_teardown(struct dsa_switch *ds)
>> {
>> if (ds->slave_mii_bus && ds->ops->phy_read)
>> mdiobus_unregister(ds->slave_mii_bus);
>>
>> Maybe can we add a flag to register ds->slave_mii_bus allocation by DSA?
>
> If drivers allocate the slave_mii_bus, or use it as a pointer to their
> bus, then they should not be providing a ds->ops->phy_read() callback
> since we assume they would have mii_bus::read and mii_bus::write set to
> their driver internal version.
I see, so having ds->slave_mii_bus and ds->ops->phy_read both not NULL into
dsa_switch_setup() is a potential bug, I suppose... If so why not adding a
BUG_ON() call to signal it instead of doing nothing? :-o
Ciao,
Rodolfo
--
GNU/Linux Solutions e-mail: giometti@enneenne.com
Linux Device Driver giometti@linux.it
Embedded Systems phone: +39 349 2432127
UNIX programming skype: rodolfo.giometti
^ permalink raw reply
* Re: Possible bug into DSA2 code.
From: Florian Fainelli @ 2019-02-11 18:01 UTC (permalink / raw)
To: Rodolfo Giometti, Andrew Lunn; +Cc: Vivien Didelot, David S. Miller, netdev
In-Reply-To: <49728829-49cc-6d6a-04b1-ccfc1e5266dd@enneenne.com>
On 2/11/19 9:51 AM, Rodolfo Giometti wrote:
> On 11/02/2019 18:28, Florian Fainelli wrote:
>> On 2/10/19 3:45 AM, Rodolfo Giometti wrote:
>>> On 09/02/2019 20:34, Andrew Lunn wrote:
>>>>> So we I see two possible solutions:
>>>>>
>>>>> 1) having both ds->slave_mii_bus and ds->ops->phy_read already
>>>>> defined is an
>>>>> error, then it must be signaled to the calling code, or
>>>>
>>>> I don't think we can do that. mv88e6xxx optionally instantiates the
>>>> MDIO busses, depending on what is in device tree. If there is no mdio
>>>> property, we need the DSA core to create an MDIO bus.
>>>
>>> OK, but using the following check to know if DSA did such allocation is
>>> not correct because DSA drivers can allocate it by their own:
>>>
>>> static void dsa_switch_teardown(struct dsa_switch *ds)
>>> {
>>> if (ds->slave_mii_bus && ds->ops->phy_read)
>>> mdiobus_unregister(ds->slave_mii_bus);
>>>
>>> Maybe can we add a flag to register ds->slave_mii_bus allocation by DSA?
>>
>> If drivers allocate the slave_mii_bus, or use it as a pointer to their
>> bus, then they should not be providing a ds->ops->phy_read() callback
>> since we assume they would have mii_bus::read and mii_bus::write set to
>> their driver internal version.
>
> I see, so having ds->slave_mii_bus and ds->ops->phy_read both not NULL
> into dsa_switch_setup() is a potential bug, I suppose... If so why not
> adding a BUG_ON() call to signal it instead of doing nothing? :-o
If you have both non NULL, then your driver did allocate
ds->slave_mii_bus on its own, and also assigned a valid
ds->ops->phy_read() then things will work, except that
ds->ops->phy_read() will not be used. And yes, that is going to be
blowing away when the whole DSA tree gets teardowned.
If you want to add a check for that condition, that would be a good
thing, just not a BUG_ON(), propagate an error back to the caller and
abort the tree/switch probing.
--
Florian
^ permalink raw reply
* [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: John David Anglin @ 2019-02-11 18:40 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <824d011b-3692-69c3-5e2c-58e950a80abf@bell.net>
The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
possible to miss an edge. When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.
I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts). Some interrupts are
directly cleared by reading the Global 1 status register. However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.
The code only services interrupts whose status bit is set at the time of reading its status
register. If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.
This is not a problem with polling or level interrupts since the handler will be called
again to process the event. However, it's a big problem when using level interrupts.
The fix presented here is to add a loop around the code servicing switch interrupts. If
any pending interrupts remain after the current set has been handled, we loop and process
the new set. If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.
Tested on espressobin board.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
---
drivers/net/dsa/mv88e6xxx/chip.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8dca2c949e73..12fd7ce3f1ff 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -261,6 +261,7 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
unsigned int sub_irq;
unsigned int n;
u16 reg;
+ u16 ctl1;
int err;
mutex_lock(&chip->reg_lock);
@@ -270,13 +271,28 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
if (err)
goto out;
- for (n = 0; n < chip->g1_irq.nirqs; ++n) {
- if (reg & (1 << n)) {
- sub_irq = irq_find_mapping(chip->g1_irq.domain, n);
- handle_nested_irq(sub_irq);
- ++nhandled;
+ do {
+ for (n = 0; n < chip->g1_irq.nirqs; ++n) {
+ if (reg & (1 << n)) {
+ sub_irq = irq_find_mapping(chip->g1_irq.domain,
+ n);
+ handle_nested_irq(sub_irq);
+ ++nhandled;
+ }
}
- }
+
+ mutex_lock(&chip->reg_lock);
+ err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &ctl1);
+ if (err)
+ goto unlock;
+ err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, ®);
+unlock:
+ mutex_unlock(&chip->reg_lock);
+ if (err)
+ goto out;
+ ctl1 &= GENMASK(chip->g1_irq.nirqs, 0);
+ } while (reg & ctl1);
+
out:
return (nhandled > 0 ? IRQ_HANDLED : IRQ_NONE);
}
--
2.11.0
^ permalink raw reply related
* Re: [PATCH 0/2] Netfilter fixes for net
From: David Miller @ 2019-02-11 18:43 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20190211165319.17965-1-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 11 Feb 2019 17:53:17 +0100
> The following patchset contains Netfilter fixes for net:
>
> 1) Out-of-bound access to packet data from the snmp nat helper,
> from Jann Horn.
>
> 2) ICMP(v6) error packets are set as related traffic by conntrack,
> update protocol number before calling nf_nat_ipv4_manip_pkt()
> to use ICMP(v6) rather than the original protocol number,
> from Florian Westphal.
>
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
Pulled, thanks Pablo.
^ permalink raw reply
* Re: [PATCH net-next v3 0/9] net: Remove switchdev_ops
From: David Miller @ 2019-02-11 18:46 UTC (permalink / raw)
To: f.fainelli
Cc: netdev, idosch, linux-kernel, devel, bridge, jiri, andrew,
vivien.didelot
In-Reply-To: <7ece5de8-1bab-0f0e-e7db-ffeef12ba1d7@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Mon, 11 Feb 2019 09:31:42 -0800
> David, I will be reposting a v4 with Jiri's Acked-by as well as adding
> fallthrough switch/case annotations so we don't regress on that front.
> Thank you.
Ok, thanks for letting me know.
^ permalink raw reply
* Re: [PATCH 0/9] perf annotation of BPF programs
From: Arnaldo Carvalho de Melo @ 2019-02-11 18:54 UTC (permalink / raw)
To: Song Liu
Cc: netdev, linux-kernel, ast, daniel, kernel-team, peterz, Jiri Olsa,
Namhyung Kim, Andi Kleen, Stephane Eranian,
Arnaldo Carvalho de Melo
In-Reply-To: <20190209011705.2160185-1-songliubraving@fb.com>
Em Fri, Feb 08, 2019 at 05:16:56PM -0800, Song Liu escreveu:
> This series enables annotation of BPF programs in perf.
>
> perf tool gathers information via sys_bpf and (optionally) stores them in
> perf.data as headers.
Jiri, Stephane, this is the patchkit I mentioned in the context of doing
away with perf.data headers and instead package everything as userspace
PERF_RECORD_ metadata events.
Song, please add Jiri and Namhyung in future perf patchkits, they are
listed as perf tools reviewers in MAINTAINANERS and Jiri also is working
on something directly related.
Thanks,
- Arnaldo
> Patch 1/9 fixes a minor issue in kernel;
> Patch 2/9 to 4/9 introduce new helper functions and use them in perf and
> bpftool;
> Patch 5/9 and 6/9 saves information of bpf program in perf_env;
> Patch 7/9 adds --bpf-event options to perf-top;
> Patch 8/9 enables annotation of bpf programs based on information gathered
> in 5/9 and 6/9;
> Patch 9/9 handles information of short living BPF program that are loaded
> during perf-record or perf-top.
>
> Commands tested during developments are perf-top, perf-record, perf-report,
> and perf-annotate.
>
> ===================== Note on patch dependency ========================
> This set has dependency in both bpf-next tree and tip/perf/core. Current
> version is developed on bpf-next tree with the following commits
> cherry-picked from tip/perf/core:
>
> (from 1/10 to 10/10)
> commit 76193a94522f ("perf, bpf: Introduce PERF_RECORD_KSYMBOL")
> commit d764ac646491 ("tools headers uapi: Sync tools/include/uapi/linux/perf_event.h")
> commit 6ee52e2a3fe4 ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
> commit df063c83aa2c ("tools headers uapi: Sync tools/include/uapi/linux/perf_event.h")
> commit 9aa0bfa370b2 ("perf tools: Handle PERF_RECORD_KSYMBOL")
> commit 45178a928a4b ("perf tools: Handle PERF_RECORD_BPF_EVENT")
> commit 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF programs")
> commit a40b95bcd30c ("perf top: Synthesize BPF events for pre-existing loaded BPF programs")
> commit 6934058d9fb6 ("bpf: Add module name [bpf] to ksymbols for bpf programs")
> commit 811184fb6977 ("perf bpf: Fix synthesized PERF_RECORD_KSYMBOL/BPF_EVENT")
> ========================================================================
>
> Song Liu (9):
> perf, bpf: consider events with attr.bpf_event as side-band events
> bpf: libbpf: introduce bpf_program__get_prog_info_linear()
> bpf: bpftool: use bpf_program__get_prog_info_linear() in
> prog.c:do_dump()
> perf, bpf: synthesize bpf events with
> bpf_program__get_prog_info_linear()
> perf, bpf: save bpf_prog_info in a rbtree in perf_env
> perf, bpf: save btf in a rbtree in perf_env
> perf-top: add option --bpf-event
> perf, bpf: enable annotation of bpf program
> perf, bpf: save information about short living bpf programs
>
> kernel/events/core.c | 3 +-
> tools/bpf/bpftool/prog.c | 266 ++++++---------------------
> tools/lib/bpf/libbpf.c | 251 ++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 63 +++++++
> tools/lib/bpf/libbpf.map | 3 +
> tools/perf/Makefile.config | 2 +-
> tools/perf/builtin-record.c | 15 +-
> tools/perf/builtin-top.c | 15 +-
> tools/perf/util/annotate.c | 149 ++++++++++++++-
> tools/perf/util/bpf-event.c | 351 +++++++++++++++++++++++++++---------
> tools/perf/util/bpf-event.h | 48 ++++-
> tools/perf/util/dso.c | 1 +
> tools/perf/util/dso.h | 33 ++--
> tools/perf/util/env.c | 148 +++++++++++++++
> tools/perf/util/env.h | 12 ++
> tools/perf/util/evlist.c | 20 ++
> tools/perf/util/evlist.h | 2 +
> tools/perf/util/header.c | 231 +++++++++++++++++++++++-
> tools/perf/util/header.h | 2 +
> tools/perf/util/symbol.c | 1 +
> 20 files changed, 1304 insertions(+), 312 deletions(-)
>
> --
> 2.17.1
^ permalink raw reply
* Re: [PATCH] mt76: change the retun type of mt76_dma_attach()
From: Felix Fietkau @ 2019-02-11 19:00 UTC (permalink / raw)
To: Ryder Lee, Lorenzo Bianconi, Kalle Valo
Cc: Roy Luo, linux-wireless, linux-kernel, netdev, linux-mediatek
In-Reply-To: <228fdddb9ca96e8ce861e324eb9039722cf18f49.1549850911.git.ryder.lee@mediatek.com>
On 2019-02-11 03:13, Ryder Lee wrote:
> There is no need to retun 0 in mt76_dma_attach(), so switch it to void.
>
> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Applied, thanks.
- Felix
^ permalink raw reply
* Re: [PATCH net-next v2 01/12] net: bridge: multicast: Propagate br_mc_disabled_update() return
From: Florian Fainelli @ 2019-02-11 19:05 UTC (permalink / raw)
To: Ido Schimmel
Cc: netdev@vger.kernel.org, andrew@lunn.ch, vivien.didelot@gmail.com,
davem@davemloft.net, Jiri Pirko, ilias.apalodimas@linaro.org,
ivan.khoronzhuk@linaro.org, roopa@cumulusnetworks.com,
nikolay@cumulusnetworks.com, Petr Machata
In-Reply-To: <20190202154750.GA2864@splinter>
On 2/2/19 7:47 AM, Ido Schimmel wrote:
> On Thu, Jan 31, 2019 at 05:19:25PM -0800, Florian Fainelli wrote:
>> On 1/30/19 11:50 PM, Ido Schimmel wrote:
>>> On Wed, Jan 30, 2019 at 05:00:57PM -0800, Florian Fainelli wrote:
>>>> On 1/29/19 11:36 PM, Ido Schimmel wrote:
>>>>> On Tue, Jan 29, 2019 at 04:55:37PM -0800, Florian Fainelli wrote:
>>>>>> -static void br_mc_disabled_update(struct net_device *dev, bool value)
>>>>>> +static int br_mc_disabled_update(struct net_device *dev, bool value)
>>>>>> {
>>>>>> struct switchdev_attr attr = {
>>>>>> .orig_dev = dev,
>>>>>> .id = SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
>>>>>> - .flags = SWITCHDEV_F_DEFER,
>>>>>> + .flags = SWITCHDEV_F_DEFER | SWITCHDEV_F_SKIP_EOPNOTSUPP,
>>>>>
>>>>> Actually, since the operation is deferred I don't think the return value
>>>>> from the driver is ever checked. Can you test it?
>>>>
>>>> You are right, you get a WARN() from switchdev_attr_port_set_now(), but
>>>> this does not propagate back to the caller, so you can still create a
>>>> bridge device and enslave a device successfully despite getting warnings
>>>> on the console.
>>>>
>>>>>
>>>>> I think it would be good to convert the attributes to use the switchdev
>>>>> notifier like commit d17d9f5e5143 ("switchdev: Replace port obj add/del
>>>>> SDO with a notification") did for objects. Then you can have your
>>>>> listener veto the operation in the same context it is happening.
>>>>
>>>> Alright, working on it. Would you do that just for the attr_set, or for
>>>> attr_get as well (to be symmetrical)?
>>>
>>> Yes, then we can get rid of switchdev_ops completely.
>>>
>>
>> OK, so here is what I have so far:
>>
>> https://github.com/ffainelli/linux/pull/new/switchdev-attr
>>
>> although I am seeing some invalid context operations with DSA that I am
>> debugging. Does this look like the right way to go from your perspective?
>
> That was quick :)
>
> I think I owe you some explanation as to why I even came up with the
> idea. But before that, I have another idea how to solve your immediate
> problem.
>
> You can employ a similar trick to the one used to set bridge port flags
> in br_switchdev_set_port_flag(). Like br_mc_disabled_update() it is also
> called from an atomic context, so in order to allow drivers to veto
> unsupported flags we introduced a new switchdev attribute:
> 'SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT'
Yes that is a great idea, for some reason I completely missed your email
here, but this is how I am going to approach it now.
Another way to look at the problem could be to question whether we
really need to be in atomic context when such attributes are pushed?
AFAICT, we are executing with BH disabled because we want to protect the
bridge master's bridge port list, right?
>
> The attribute is only used in get operations and never deferred. You can
> introduce 'SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED_SUPPORT' and use it
> before invoking 'SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED'.
>
> Regarding the whole notifier business and switchdev in general. Using
> the device chain to propagate various bridge port attributes is wrong.
> We saw it multiple times in the past already. First with routes that
> were converted to use notifiers because the switch needs to offload the
> entire routing table and not only routes whose nexthop device is a
> switch port. Then with FDB entries and recently also with VLANs in a
> VLAN-aware bridge. See merge commit 02e1dbe402de ("Merge branch
> 'Pass-extack-to-SWITCHDEV_PORT_OBJ_ADD'") for more details.
>
> This is also true for switchdev attributes since we might want to
> support toggling of bridge port flags on a VXLAN device in the future.
> For example, to allow selective flooding. Others might have more use
> cases.
>
> I want to use the opportunity to pick your brain (and others') about
> more issues I see with switchdev and maybe reach some agreement and form
> a plan. Just to be clear, it is not related to your patchset.
>
> The prepare-commit model probably made sense in the beginning, but a few
> years later I think we know better. At least in mlxsw we have multiple
> places where we perform all the work in the prepare phase simply because
> that without doing all the work we don't have a way to guarantee that
> the commit phase will succeed. I'm not aware of other instances of this
> model in the networking code, so I wonder why we need it in switchdev
> and if we can simply remove it and simplify things.
If we can at least re-purpose the prepare + commit model such that the
prepare phase is: can you support that operation (without allocating
resources) and do that operation in the caller context, while the commit
is deferred, then that would solve some of my issues but it could create
more for mlxsw possibly?
I have to wonder though, if we have a driver which is constructed such that:
- all HW programming is done
- a SW resource (e.g.: a rule object) is allocated last, then if the
allocation fails, we should be able to easily rollback the HW
programming we just did
Would that work?
>
> Another issue is that I believe we can completely remove the switchdev
> infrastructure as it basically boils down to bridge-specific notifiers.
> If you look at where switchdev is actually used in the networking stack
> while excluding the obvious suspects you get this:
>
> $ git grep -i -n -e 'switchdev' -- 'net/*' ':!net/bridge/*' ':!net/dsa/*' ':!net/switchdev/'
> net/8021q/vlan_dev.c:34:#include <net/switchdev.h>
> net/Kconfig:240:source "net/switchdev/Kconfig"
> net/Makefile:81:ifneq ($(CONFIG_NET_SWITCHDEV),)
> net/Makefile:82:obj-y += switchdev/
> net/core/net-sysfs.c:15:#include <net/switchdev.h>
> net/core/net-sysfs.c:504: struct switchdev_attr attr = {
> net/core/net-sysfs.c:506: .id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
> net/core/net-sysfs.c:507: .flags = SWITCHDEV_F_NO_RECURSE,
> net/core/net-sysfs.c:510: ret = switchdev_port_attr_get(netdev, &attr);
> net/core/rtnetlink.c:49:#include <net/switchdev.h>
> net/core/rtnetlink.c:1150: struct switchdev_attr attr = {
> net/core/rtnetlink.c:1152: .id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
> net/core/rtnetlink.c:1153: .flags = SWITCHDEV_F_NO_RECURSE,
> net/core/rtnetlink.c:1156: err = switchdev_port_attr_get(dev, &attr);
> net/core/skbuff.c:4925:#ifdef CONFIG_NET_SWITCHDEV
> net/ipv4/ip_forward.c:72:#ifdef CONFIG_NET_SWITCHDEV
> net/ipv4/ipmr.c:70:#include <net/switchdev.h>
> net/ipv4/ipmr.c:841: struct switchdev_attr attr = {
> net/ipv4/ipmr.c:842: .id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
> net/ipv4/ipmr.c:923: if (!switchdev_port_attr_get(dev, &attr)) {
> net/ipv4/ipmr.c:1802:#ifdef CONFIG_NET_SWITCHDEV
> net/ipv6/ip6_output.c:381:#ifdef CONFIG_NET_SWITCHDEV
>
> These are all instances related to the use of parent ID. Why not add a
> new ndo that will allow various users to query the parent ID of a
> netdev? bond and team can return an error if their slaves don't all have
> the same parent ID.
This should now be addressed as you might have seen.
>
> You then end up with basic constructs like notifiers and ndo invocations
> and not with complex objects and attributes that are propagated via the
> device chain with a prepare-commit model.
>
> WDYT?
>
>> --
>> Florian
--
Florian
^ permalink raw reply
* [PATCH net-next v4 0/9] net: Remove switchdev_ops
From: Florian Fainelli @ 2019-02-11 19:09 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
Hi all,
This patch series finishes by the removal of switchdev_ops. To get there
we convert the existing switchdev_port_attr_{set,get} switchdev_ops to
use a blocking notifier, thus making it consistent with how the objects
are pushed to the switchdev enabled devices.
Please review and let me know what you think!
David, I would like to get Ido's feedback on this to make sure I did not
miss something, thank you!
Changes in v4:
- removed double space in Documentation/networking/switchdev.txt
- added Jiri's Acked-by tags
- added fall through annotations where appropriate
Changes in v3:
- dropped patches removing te need to get the attribute since we
still need that in order to support different sleeping vs.
non-sleeping contexts
Changes in v2:
- fixed bisectability issues in patch #15
- added Acked-by from Jiri where necessary
- fixed a few minor issues according to Jiri's feedback:
- rename port_attr_event -> port_attr_set_event
- moved SWITCHDEV_PORT_ATTR_SET closer to other blocking events
Florian Fainelli (9):
Documentation: networking: switchdev: Update port parent ID section
switchdev: Add SWITCHDEV_PORT_ATTR_SET, SWITCHDEV_PORT_ATTR_GET
rocker: Handle SWITCHDEV_PORT_ATTR_GET/SET
mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_ATTR_GET/SET
net: mscc: ocelot: Handle SWITCHDEV_PORT_ATTR_GET/SET
staging: fsl-dpaa2: ethsw: Handle SWITCHDEV_PORT_ATTR_GET/SET
net: dsa: Handle SWITCHDEV_PORT_ATTR_GET/SET
net: switchdev: Replace port attr get/set SDO with a notification
net: Remove switchdev_ops
Documentation/networking/switchdev.txt | 10 +-
.../net/ethernet/mellanox/mlxsw/spectrum.c | 12 --
.../net/ethernet/mellanox/mlxsw/spectrum.h | 2 -
.../mellanox/mlxsw/spectrum_switchdev.c | 36 +++---
drivers/net/ethernet/mscc/ocelot.c | 26 ++++-
drivers/net/ethernet/rocker/rocker_main.c | 30 ++++-
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 30 ++++-
include/linux/netdevice.h | 3 -
include/net/switchdev.h | 28 ++---
net/dsa/slave.c | 30 ++++-
net/switchdev/switchdev.c | 107 ++++++------------
11 files changed, 168 insertions(+), 146 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH net-next v4 1/9] Documentation: networking: switchdev: Update port parent ID section
From: Florian Fainelli @ 2019-02-11 19:09 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
In-Reply-To: <20190211191001.8623-1-f.fainelli@gmail.com>
Update the section about switchdev drivers having to implement a
switchdev_port_attr_get() function to return
SWITCHDEV_ATTR_ID_PORT_PARENT_ID since that is no longer valid after
commit bccb30254a4a ("net: Get rid of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID").
Fixes: bccb30254a4a ("net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Documentation/networking/switchdev.txt | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index f3244d87512a..ea90243340a9 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -92,11 +92,11 @@ device.
Switch ID
^^^^^^^^^
-The switchdev driver must implement the switchdev op switchdev_port_attr_get
-for SWITCHDEV_ATTR_ID_PORT_PARENT_ID for each port netdev, returning the same
-physical ID for each port of a switch. The ID must be unique between switches
-on the same system. The ID does not need to be unique between switches on
-different systems.
+The switchdev driver must implement the net_device operation
+ndo_get_port_parent_id for each port netdev, returning the same physical ID for
+each port of a switch. The ID must be unique between switches on the same
+system. The ID does not need to be unique between switches on different
+systems.
The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.
--
2.17.1
^ permalink raw reply related
* [PATCH net-next v4 2/9] switchdev: Add SWITCHDEV_PORT_ATTR_SET, SWITCHDEV_PORT_ATTR_GET
From: Florian Fainelli @ 2019-02-11 19:09 UTC (permalink / raw)
To: netdev
Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
andrew, vivien.didelot
In-Reply-To: <20190211191001.8623-1-f.fainelli@gmail.com>
In preparation for allowing switchdev enabled drivers to veto specific
attribute settings from within the context of the caller, introduce a
new switchdev notifier type for port attributes.
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
include/net/switchdev.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 5e87b54c5dc5..b8becabbef38 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -143,6 +143,9 @@ enum switchdev_notifier_type {
SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE,
SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE,
SWITCHDEV_VXLAN_FDB_OFFLOADED,
+
+ SWITCHDEV_PORT_ATTR_SET, /* Blocking. */
+ SWITCHDEV_PORT_ATTR_GET, /* Blocking. */
};
struct switchdev_notifier_info {
@@ -165,6 +168,13 @@ struct switchdev_notifier_port_obj_info {
bool handled;
};
+struct switchdev_notifier_port_attr_info {
+ struct switchdev_notifier_info info; /* must be first */
+ struct switchdev_attr *attr;
+ struct switchdev_trans *trans;
+ bool handled;
+};
+
static inline struct net_device *
switchdev_notifier_info_to_dev(const struct switchdev_notifier_info *info)
{
--
2.17.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox