* Re: [PATCH net-next 2/7] net: rtnetlink: allow only one idx saving stats attribute
From: Roopa Prabhu @ 2016-04-28 5:18 UTC (permalink / raw)
To: Nikolay Aleksandrov; +Cc: netdev, davem, stephen, jhs
In-Reply-To: <1461773902-13528-3-git-send-email-nikolay@cumulusnetworks.com>
On 4/27/16, 9:18 AM, Nikolay Aleksandrov wrote:
> We can't allow more than one stats attribute which uses the local idx
> since the result will be a mess. This is a simple check to make sure
> only one is being used at a time. Later when the filter_mask's 32 bits
> are over we can switch to a bitmap.
>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> include/net/rtnetlink.h | 6 ++++++
> net/core/rtnetlink.c | 17 +++++++++++++++--
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
> index 2f87c1ba13de..3f3b0b1b8722 100644
> --- a/include/net/rtnetlink.h
> +++ b/include/net/rtnetlink.h
> @@ -150,4 +150,10 @@ int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len);
>
> #define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)
>
> +/* at most one attribute which can save a local idx is allowed to be set
> + * IFLA_STATS_IDX_ATTR_MASK has all the idx saving attributes set and is
> + * used to check if more than one is being requested
> + */
> +#define IFLA_STATS_IDX_ATTR_MASK 0
> +
> #endif
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index aeb2fa9b1cda..ea03b6cd3d3c 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -3512,7 +3512,7 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
> struct if_stats_msg *ifsm;
> struct net_device *dev = NULL;
> struct sk_buff *nskb;
> - u32 filter_mask;
> + u32 filter_mask, lidx_filter;
> int lidx = 0;
> int err;
>
> @@ -3529,6 +3529,14 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
> if (!filter_mask)
> return -EINVAL;
>
> + /* only one attribute which can save a local idx is allowed at a time
> + * even though rtnl_stats_get doesn't save the lidx, we need to be
> + * consistent with the dump side and error out
> + */
> + lidx_filter = filter_mask & IFLA_STATS_IDX_ATTR_MASK;
> + if (lidx_filter && !is_power_of_2(lidx_filter))
> + return -EINVAL;
> +
> nskb = nlmsg_new(if_nlmsg_stats_size(dev, filter_mask), GFP_KERNEL);
> if (!nskb)
> return -ENOBUFS;
> @@ -3556,7 +3564,7 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
> struct net_device *dev;
> struct hlist_head *head;
> unsigned int flags = NLM_F_MULTI;
> - u32 filter_mask = 0;
> + u32 filter_mask = 0, lidx_filter;
> int err;
>
> s_h = cb->args[0];
> @@ -3570,6 +3578,11 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb)
> if (!filter_mask)
> return -EINVAL;
>
> + /* only one attribute which can save a local idx is allowed at a time */
> + lidx_filter = filter_mask & IFLA_STATS_IDX_ATTR_MASK;
> + if (lidx_filter && !is_power_of_2(lidx_filter))
> + return -EINVAL;
> +
>
instead of introducing the restriction at this level, is it possible to use two args for this
like below and avoid the restriction ?
cb->args[2] = current filter being processed
cb->args[3] = private filter idx (your lidx)
^ permalink raw reply
* Re: iproute2: bash completion function for tc
From: Stephen Hemminger @ 2016-04-28 5:13 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Quentin Monnet, Jamal Hadi Salim, Vincent Jardin, netdev
In-Reply-To: <20160428031923.GA76700@ast-mbp.thefacebook.com>
On Wed, 27 Apr 2016 20:19:26 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Tue, Apr 26, 2016 at 09:28:17AM +0200, Quentin Monnet wrote:
> > Hi Jamal, Stephen,
> >
> > I searched for a function providing auto-completion for `tc` utility in
> > bash, but I found none. So I have created one, and I would like share it
> > with the community. It is available here:
> > https://github.com/6WIND/tc_bash-completion/blob/master/tc
> > I would like to make it easily available to tc users, so here is a
> > twofold request:
> >
> > * I do not know where to submit the code. Should I submit here on netdev
> > for inclusion in iproute2 package, or rather to the bash-completion
> > repository on GitHub? I feel like it would receive better feedback and
> > updates if pushed to iproute2. Could you please provide some advice here?
> > * The completion for `tc` seems to work well; I have tested it with many
> > commands, but I am no tc expert, and there are probably some cases where
> > the completion fails to propose the correct choices. I would be really
> > interested in any feedback/bug reports that you, or anyone on this list
> > who uses tc, could provide.
>
> that looks very interesting.
> I think making it a part of iproute2 is a good thing.
> How about installing it into /etc/iproute2/ ?
> Stephen, any comments?
>
I am ok with keeping it in the repository.
But it would need to be installed in the standard bash directory,
is that distro dependent?
^ permalink raw reply
* Re: [PATCH v2 net-next 2/2] tcp: remove SKBTX_ACK_TSTAMP since it is redundant
From: Eric Dumazet @ 2016-04-28 4:37 UTC (permalink / raw)
To: Soheil Hassas Yeganeh
Cc: davem, netdev, kafai, willemb, edumazet, ycheng, ncardwell,
Soheil Hassas Yeganeh
In-Reply-To: <1461814741-848-3-git-send-email-soheil.kdev@gmail.com>
On Wed, 2016-04-27 at 23:39 -0400, Soheil Hassas Yeganeh wrote:
> From: Soheil Hassas Yeganeh <soheil@google.com>
>
> The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
> the timestamp of the TCP acknowledgement should be reported on
> error queue. Since accessing skb_shinfo is likely to incur a
> cache-line miss at the time of receiving the ack, the
> txstamp_ack bit was added in tcp_skb_cb, which is set iff
> the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
> SKBTX_ACK_TSTAMP flag redundant.
>
> Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
> everywhere.
>
> Note that this frees one bit in shinfo->tx_flags.
>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Suggested-by: Willem de Bruijn <willemb@google.com>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH v2 net-next 1/2] tcp: remove an unnecessary check in tcp_tx_timestamp
From: Eric Dumazet @ 2016-04-28 4:35 UTC (permalink / raw)
To: Soheil Hassas Yeganeh
Cc: davem, netdev, kafai, willemb, edumazet, ycheng, ncardwell,
Soheil Hassas Yeganeh
In-Reply-To: <1461814741-848-2-git-send-email-soheil.kdev@gmail.com>
On Wed, 2016-04-27 at 23:39 -0400, Soheil Hassas Yeganeh wrote:
> From: Soheil Hassas Yeganeh <soheil@google.com>
>
> Remove the redundant check for sk->sk_tsflags in tcp_tx_timestamp.
>
> tcp_tx_timestamp() receives the tsflags as a parameter. As a
> result the "sk->sk_tsflags || tsflags" is redundant, since
> tsflags already includes sk->sk_tsflags plus overrides from
> control messages.
>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [RFC PATCH 4/5] bnxt: Add support for segmentation of tunnels with outer checksums
From: Michael Chan @ 2016-04-28 4:32 UTC (permalink / raw)
To: Alexander Duyck
Cc: Alexander Duyck, eugenia, Bruce W Allan, Saeed Mahameed, Netdev,
intel-wired-lan, Ariel Elior, Michael Chan
In-Reply-To: <CAKgT0UffVkQWWqgwYZy-AXEq=kofKiRx2D9pnu5nBQD+uSLS+g@mail.gmail.com>
On Wed, Apr 27, 2016 at 8:21 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Tue, Apr 26, 2016 at 10:55 PM, Michael Chan
> <michael.chan@broadcom.com> wrote:
>> On Tue, Apr 19, 2016 at 12:06 PM, Alexander Duyck <aduyck@mirantis.com> wrote:
>>> This patch assumes that the bnxt hardware will ignore existing IPv4/v6
>>> header fields for length and checksum as well as the length and checksum
>>> fields for outer UDP and GRE headers.
>>>
>>> I have no means of testing this as I do not have any bnx2x hardware but
>>> thought I would submit it as an RFC to see if anyone out there wants to
>>> test this and see if this does in fact enable this functionality allowing
>>> us to to segment tunneled frames that have an outer checksum.
>>>
>>> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
>>
>> Hi Alex, I just did a very quick test of this patch on our bnxt
>> hardware and it seemed to work.
>>
>> I created a vxlan endpoint with udpcsum enabled and I saw TSO packets
>> getting through. I've verified that our hardware can be programmed to
>> either ignore outer UDP checksum or to calculate it. Current default
>> is to ignore ipv4 UDP checksum and calculate ipv6 UDP checksum.
>> Thanks.
>
> Are you saying you can natively support UDP tunnel with outer checksum
> offload then?
Yes. Calculate or ignore the outer UDP checksum.
>
> I'm just trying to sort out if you actually need to have the partial
> segmentation offload support or if we can handle it in hardware. Also
> is there any documentation you could point me to that might help to
> clarify what the hardware does/doesn't support so that I could improve
> upon this patch in order to make sure we are getting the most bang for
> the buck in terms of the features that can be offloaded by hardware?
No public documentation yet. I think the plan is to publish the
programmer's reference on our website at some point in the future.
^ permalink raw reply
* [PATCH v2 net-next 2/2] tcp: remove SKBTX_ACK_TSTAMP since it is redundant
From: Soheil Hassas Yeganeh @ 2016-04-28 3:39 UTC (permalink / raw)
To: davem, netdev
Cc: kafai, willemb, edumazet, ycheng, ncardwell,
Soheil Hassas Yeganeh
In-Reply-To: <1461814741-848-1-git-send-email-soheil.kdev@gmail.com>
From: Soheil Hassas Yeganeh <soheil@google.com>
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when
the timestamp of the TCP acknowledgement should be reported on
error queue. Since accessing skb_shinfo is likely to incur a
cache-line miss at the time of receiving the ack, the
txstamp_ack bit was added in tcp_skb_cb, which is set iff
the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
SKBTX_ACK_TSTAMP flag redundant.
Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
everywhere.
Note that this frees one bit in shinfo->tx_flags.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
---
include/linux/skbuff.h | 6 +-----
net/ipv4/tcp.c | 5 +++--
net/ipv4/tcp_input.c | 3 +--
net/ipv4/tcp_output.c | 17 +++++++++++------
net/socket.c | 3 ---
5 files changed, 16 insertions(+), 18 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da0ace3..ae30555 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -382,14 +382,10 @@ enum {
/* generate software time stamp when entering packet scheduling */
SKBTX_SCHED_TSTAMP = 1 << 6,
-
- /* generate software timestamp on peer data acknowledgment */
- SKBTX_ACK_TSTAMP = 1 << 7,
};
#define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \
- SKBTX_SCHED_TSTAMP | \
- SKBTX_ACK_TSTAMP)
+ SKBTX_SCHED_TSTAMP)
#define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
/*
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3c542dc..8e05eb6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -435,9 +435,10 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags, struct sk_buff *skb)
struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
- if (shinfo->tx_flags & SKBTX_ANY_TSTAMP)
+ if (tsflags & SOF_TIMESTAMPING_TX_ACK)
+ tcb->txstamp_ack = 1;
+ if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
- tcb->txstamp_ack = !!(shinfo->tx_flags & SKBTX_ACK_TSTAMP);
}
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 967520d..2f3fd92 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3087,8 +3087,7 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
return;
shinfo = skb_shinfo(skb);
- if ((shinfo->tx_flags & SKBTX_ACK_TSTAMP) &&
- !before(shinfo->tskey, prior_snd_una) &&
+ if (!before(shinfo->tskey, prior_snd_una) &&
before(shinfo->tskey, tcp_sk(sk)->snd_una))
__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9d3b4b3..ace183c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1111,11 +1111,17 @@ static void tcp_adjust_pcount(struct sock *sk, const struct sk_buff *skb, int de
tcp_verify_left_out(tp);
}
+static bool tcp_has_tx_tstamp(const struct sk_buff *skb)
+{
+ return TCP_SKB_CB(skb)->txstamp_ack ||
+ (skb_shinfo(skb)->tx_flags & SKBTX_ANY_TSTAMP);
+}
+
static void tcp_fragment_tstamp(struct sk_buff *skb, struct sk_buff *skb2)
{
struct skb_shared_info *shinfo = skb_shinfo(skb);
- if (unlikely(shinfo->tx_flags & SKBTX_ANY_TSTAMP) &&
+ if (unlikely(tcp_has_tx_tstamp(skb)) &&
!before(shinfo->tskey, TCP_SKB_CB(skb2)->seq)) {
struct skb_shared_info *shinfo2 = skb_shinfo(skb2);
u8 tsflags = shinfo->tx_flags & SKBTX_ANY_TSTAMP;
@@ -2446,13 +2452,12 @@ u32 __tcp_select_window(struct sock *sk)
void tcp_skb_collapse_tstamp(struct sk_buff *skb,
const struct sk_buff *next_skb)
{
- const struct skb_shared_info *next_shinfo = skb_shinfo(next_skb);
- u8 tsflags = next_shinfo->tx_flags & SKBTX_ANY_TSTAMP;
-
- if (unlikely(tsflags)) {
+ if (unlikely(tcp_has_tx_tstamp(next_skb))) {
+ const struct skb_shared_info *next_shinfo =
+ skb_shinfo(next_skb);
struct skb_shared_info *shinfo = skb_shinfo(skb);
- shinfo->tx_flags |= tsflags;
+ shinfo->tx_flags |= next_shinfo->tx_flags & SKBTX_ANY_TSTAMP;
shinfo->tskey = next_shinfo->tskey;
TCP_SKB_CB(skb)->txstamp_ack |=
TCP_SKB_CB(next_skb)->txstamp_ack;
diff --git a/net/socket.c b/net/socket.c
index 5dbb0bb..7789d79 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -600,9 +600,6 @@ void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags)
if (tsflags & SOF_TIMESTAMPING_TX_SCHED)
flags |= SKBTX_SCHED_TSTAMP;
- if (tsflags & SOF_TIMESTAMPING_TX_ACK)
- flags |= SKBTX_ACK_TSTAMP;
-
*tx_flags = flags;
}
EXPORT_SYMBOL(__sock_tx_timestamp);
--
2.8.0.rc3.226.g39d4020
^ permalink raw reply related
* [PATCH v2 net-next 1/2] tcp: remove an unnecessary check in tcp_tx_timestamp
From: Soheil Hassas Yeganeh @ 2016-04-28 3:39 UTC (permalink / raw)
To: davem, netdev
Cc: kafai, willemb, edumazet, ycheng, ncardwell,
Soheil Hassas Yeganeh
In-Reply-To: <1461814741-848-1-git-send-email-soheil.kdev@gmail.com>
From: Soheil Hassas Yeganeh <soheil@google.com>
Remove the redundant check for sk->sk_tsflags in tcp_tx_timestamp.
tcp_tx_timestamp() receives the tsflags as a parameter. As a
result the "sk->sk_tsflags || tsflags" is redundant, since
tsflags already includes sk->sk_tsflags plus overrides from
control messages.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d73858..3c542dc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -430,7 +430,7 @@ EXPORT_SYMBOL(tcp_init_sock);
static void tcp_tx_timestamp(struct sock *sk, u16 tsflags, struct sk_buff *skb)
{
- if (sk->sk_tsflags || tsflags) {
+ if (tsflags) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
--
2.8.0.rc3.226.g39d4020
^ permalink raw reply related
* [PATCH v2 net-next 0/2] tcp: simplify ack tx timestamps
From: Soheil Hassas Yeganeh @ 2016-04-28 3:38 UTC (permalink / raw)
To: davem, netdev
Cc: kafai, willemb, edumazet, ycheng, ncardwell,
Soheil Hassas Yeganeh
From: Soheil Hassas Yeganeh <soheil@google.com>
v2:
- Fully remove SKBTX_ACK_TSTAMP, as suggested by Willem de Bruijn.
This patch series aims at removing redundant checks and fields
for ack timestamps for TCP.
Soheil Hassas Yeganeh (2):
tcp: remove an unnecessary check in tcp_tx_timestamp
tcp: remove SKBTX_ACK_TSTAMP since it is redundant
include/linux/skbuff.h | 6 +-----
net/ipv4/tcp.c | 7 ++++---
net/ipv4/tcp_input.c | 3 +--
net/ipv4/tcp_output.c | 17 +++++++++++------
net/socket.c | 3 ---
5 files changed, 17 insertions(+), 19 deletions(-)
--
2.8.0.rc3.226.g39d4020
^ permalink raw reply
* Re: iproute2: bash completion function for tc
From: Alexei Starovoitov @ 2016-04-28 3:19 UTC (permalink / raw)
To: Quentin Monnet
Cc: Jamal Hadi Salim, Stephen Hemminger, Vincent Jardin, netdev
In-Reply-To: <571F1891.1030004@6wind.com>
On Tue, Apr 26, 2016 at 09:28:17AM +0200, Quentin Monnet wrote:
> Hi Jamal, Stephen,
>
> I searched for a function providing auto-completion for `tc` utility in
> bash, but I found none. So I have created one, and I would like share it
> with the community. It is available here:
> https://github.com/6WIND/tc_bash-completion/blob/master/tc
> I would like to make it easily available to tc users, so here is a
> twofold request:
>
> * I do not know where to submit the code. Should I submit here on netdev
> for inclusion in iproute2 package, or rather to the bash-completion
> repository on GitHub? I feel like it would receive better feedback and
> updates if pushed to iproute2. Could you please provide some advice here?
> * The completion for `tc` seems to work well; I have tested it with many
> commands, but I am no tc expert, and there are probably some cases where
> the completion fails to propose the correct choices. I would be really
> interested in any feedback/bug reports that you, or anyone on this list
> who uses tc, could provide.
that looks very interesting.
I think making it a part of iproute2 is a good thing.
How about installing it into /etc/iproute2/ ?
Stephen, any comments?
^ permalink raw reply
* Re: [PATCH net-next 0/2] net: avoid some atomic ops when FASYNC is not used
From: David Miller @ 2016-04-28 3:10 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <1461605974-4242-1-git-send-email-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 25 Apr 2016 10:39:31 -0700
> We can avoid some atomic operations on sockets not using FASYNC
I guess a user can do weird things and set/clear the FASYNC bit in the
middle of the SOCKWQ_ASYNC_ bit being set, and reset the FASYNC bit
later and the SOCKWQ_* state is stale.
However, that's probably not worth handling explicitly.
Series applied, thanks.
^ permalink raw reply
* Re: [net-next PATCH V3 0/5] samples/bpf: Improve user experience
From: David Miller @ 2016-04-28 2:55 UTC (permalink / raw)
To: brouer
Cc: netdev, linux-kbuild, bblanco, naveen.n.rao, borkmann,
alexei.starovoitov
In-Reply-To: <20160427072855.29959.70252.stgit@firesoul>
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Wed, 27 Apr 2016 09:30:08 +0200
> It is a steep learning curve getting started with using the eBPF
> examples in samples/bpf/. There are several dependencies, and
> specific versions of these dependencies. Invoking make in the correct
> manor is also slightly obscure.
>
> This patchset cleanup, document and hopefully improves the first time
> user experience with the eBPF samples directory by auto-detecting
> certain scenarios.
>
> V3:
> - Add Alexei's ACKs
> - Remove README paragraph about LLVM experimental BPF target
> as it only existed between LLVM version 3.6 to 3.7.
>
> V2:
> - Adjusted recommend minimum versions to 3.7.1
> - Included clang build instructions
> - New patch adding CLANG variable and validation of command
Please respin addressing Naveen's feedback, thanks.
^ permalink raw reply
* Re: [PATCH net-next 00/17] net: snmp: update SNMP methods
From: David Miller @ 2016-04-28 2:48 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <1461800683-25034-1-git-send-email-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 27 Apr 2016 16:44:26 -0700
> In the old days (before linux-3.0), SNMP counters were duplicated,
> one set for user context, and anther one for BH context.
>
> After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%")
> we have a single copy, and what really matters is preemption being
> enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
> respectively.
>
> This patch series kills the obsolete STATS_USER() helpers,
> and rename all XXX_BH() helpers to __XXX() ones, to more
> closely match conventions used to update per cpu variables.
>
> This is probably going to hurt maintainers job for a while,
> since cherry-picks will not be clean, but this had to be
> cleaned at one point. I am so sorry guys.
Looks good to me, series applied, thanks Eric.
^ permalink raw reply
* Re: [net-next v2 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2016-04-27
From: David Miller @ 2016-04-28 2:17 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene, john.ronciak
In-Reply-To: <1461788153-12869-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 27 Apr 2016 13:15:39 -0700
> This series contains updates to i40e and i40evf.
Pulled, thanks Jeff.
^ permalink raw reply
* [PATCH net 2/3] bpf: fix check_map_func_compatibility logic
From: Alexei Starovoitov @ 2016-04-28 1:56 UTC (permalink / raw)
To: David S . Miller
Cc: Daniel Borkmann, Jann Horn, Linus Torvalds, netdev, kernel-team
In-Reply-To: <1461808582-1452466-1-git-send-email-ast@fb.com>
The commit 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
introduced clever way to check bpf_helper<->map_type compatibility.
Later on commit a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") adjusted
the logic and inadvertently broke it.
Get rid of the clever bool compare and go back to two-way check
from map and from helper perspective.
Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/verifier.c | 65 +++++++++++++++++++++++++++++++--------------------
1 file changed, 40 insertions(+), 25 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 89bcaa0966da..c5c17a62f509 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -239,16 +239,6 @@ static const char * const reg_type_str[] = {
[CONST_IMM] = "imm",
};
-static const struct {
- int map_type;
- int func_id;
-} func_limit[] = {
- {BPF_MAP_TYPE_PROG_ARRAY, BPF_FUNC_tail_call},
- {BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_read},
- {BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_output},
- {BPF_MAP_TYPE_STACK_TRACE, BPF_FUNC_get_stackid},
-};
-
static void print_verifier_state(struct verifier_env *env)
{
enum bpf_reg_type t;
@@ -921,27 +911,52 @@ static int check_func_arg(struct verifier_env *env, u32 regno,
static int check_map_func_compatibility(struct bpf_map *map, int func_id)
{
- bool bool_map, bool_func;
- int i;
-
if (!map)
return 0;
- for (i = 0; i < ARRAY_SIZE(func_limit); i++) {
- bool_map = (map->map_type == func_limit[i].map_type);
- bool_func = (func_id == func_limit[i].func_id);
- /* only when map & func pair match it can continue.
- * don't allow any other map type to be passed into
- * the special func;
- */
- if (bool_func && bool_map != bool_func) {
- verbose("cannot pass map_type %d into func %d\n",
- map->map_type, func_id);
- return -EINVAL;
- }
+ /* We need a two way check, first is from map perspective ... */
+ switch (map->map_type) {
+ case BPF_MAP_TYPE_PROG_ARRAY:
+ if (func_id != BPF_FUNC_tail_call)
+ goto error;
+ break;
+ case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
+ if (func_id != BPF_FUNC_perf_event_read &&
+ func_id != BPF_FUNC_perf_event_output)
+ goto error;
+ break;
+ case BPF_MAP_TYPE_STACK_TRACE:
+ if (func_id != BPF_FUNC_get_stackid)
+ goto error;
+ break;
+ default:
+ break;
+ }
+
+ /* ... and second from the function itself. */
+ switch (func_id) {
+ case BPF_FUNC_tail_call:
+ if (map->map_type != BPF_MAP_TYPE_PROG_ARRAY)
+ goto error;
+ break;
+ case BPF_FUNC_perf_event_read:
+ case BPF_FUNC_perf_event_output:
+ if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
+ goto error;
+ break;
+ case BPF_FUNC_get_stackid:
+ if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
+ goto error;
+ break;
+ default:
+ break;
}
return 0;
+error:
+ verbose("cannot pass map_type %d into func %d\n",
+ map->map_type, func_id);
+ return -EINVAL;
}
static int check_call(struct verifier_env *env, int func_id)
--
2.8.0
^ permalink raw reply related
* [PATCH net 3/3] samples/bpf: fix trace_output example
From: Alexei Starovoitov @ 2016-04-28 1:56 UTC (permalink / raw)
To: David S . Miller
Cc: Daniel Borkmann, Jann Horn, Linus Torvalds, netdev, kernel-team
In-Reply-To: <1461808582-1452466-1-git-send-email-ast@fb.com>
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.
Fixes: 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
samples/bpf/trace_output_kern.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/samples/bpf/trace_output_kern.c b/samples/bpf/trace_output_kern.c
index 8d8d1ec429eb..9b96f4fb8cea 100644
--- a/samples/bpf/trace_output_kern.c
+++ b/samples/bpf/trace_output_kern.c
@@ -18,7 +18,6 @@ int bpf_prog1(struct pt_regs *ctx)
u64 cookie;
} data;
- memset(&data, 0, sizeof(data));
data.pid = bpf_get_current_pid_tgid();
data.cookie = 0x12345678;
--
2.8.0
^ permalink raw reply related
* [PATCH net 1/3] bpf: fix refcnt overflow
From: Alexei Starovoitov @ 2016-04-28 1:56 UTC (permalink / raw)
To: David S . Miller
Cc: Daniel Borkmann, Jann Horn, Linus Torvalds, netdev, kernel-team
In-Reply-To: <1461808582-1452466-1-git-send-email-ast@fb.com>
On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK,
the malicious application may overflow 32-bit bpf program refcnt.
It's also possible to overflow map refcnt on 1Tb system.
Impose 32k hard limit which means that the same bpf program or
map cannot be shared by more than 32k processes.
Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
include/linux/bpf.h | 3 ++-
kernel/bpf/inode.c | 7 ++++---
kernel/bpf/syscall.c | 24 ++++++++++++++++++++----
kernel/bpf/verifier.c | 11 +++++++----
4 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 21ee41b92e8a..f1d5c5acc8dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -171,12 +171,13 @@ void bpf_register_prog_type(struct bpf_prog_type_list *tl);
void bpf_register_map_type(struct bpf_map_type_list *tl);
struct bpf_prog *bpf_prog_get(u32 ufd);
+struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
void bpf_prog_put(struct bpf_prog *prog);
void bpf_prog_put_rcu(struct bpf_prog *prog);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
-void bpf_map_inc(struct bpf_map *map, bool uref);
+struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref);
void bpf_map_put_with_uref(struct bpf_map *map);
void bpf_map_put(struct bpf_map *map);
int bpf_map_precharge_memlock(u32 pages);
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index f2ece3c174a5..8f94ca1860cf 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -31,10 +31,10 @@ static void *bpf_any_get(void *raw, enum bpf_type type)
{
switch (type) {
case BPF_TYPE_PROG:
- atomic_inc(&((struct bpf_prog *)raw)->aux->refcnt);
+ raw = bpf_prog_inc(raw);
break;
case BPF_TYPE_MAP:
- bpf_map_inc(raw, true);
+ raw = bpf_map_inc(raw, true);
break;
default:
WARN_ON_ONCE(1);
@@ -297,7 +297,8 @@ static void *bpf_obj_do_get(const struct filename *pathname,
goto out;
raw = bpf_any_get(inode->i_private, *type);
- touch_atime(&path);
+ if (!IS_ERR(raw))
+ touch_atime(&path);
path_put(&path);
return raw;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index adc5e4bd74f8..cf5e9f7ad13a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -218,11 +218,18 @@ struct bpf_map *__bpf_map_get(struct fd f)
return f.file->private_data;
}
-void bpf_map_inc(struct bpf_map *map, bool uref)
+/* prog's and map's refcnt limit */
+#define BPF_MAX_REFCNT 32768
+
+struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref)
{
- atomic_inc(&map->refcnt);
+ if (atomic_inc_return(&map->refcnt) > BPF_MAX_REFCNT) {
+ atomic_dec(&map->refcnt);
+ return ERR_PTR(-EBUSY);
+ }
if (uref)
atomic_inc(&map->usercnt);
+ return map;
}
struct bpf_map *bpf_map_get_with_uref(u32 ufd)
@@ -234,7 +241,7 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
if (IS_ERR(map))
return map;
- bpf_map_inc(map, true);
+ map = bpf_map_inc(map, true);
fdput(f);
return map;
@@ -658,6 +665,15 @@ static struct bpf_prog *__bpf_prog_get(struct fd f)
return f.file->private_data;
}
+struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
+{
+ if (atomic_inc_return(&prog->aux->refcnt) > BPF_MAX_REFCNT) {
+ atomic_dec(&prog->aux->refcnt);
+ return ERR_PTR(-EBUSY);
+ }
+ return prog;
+}
+
/* called by sockets/tracing/seccomp before attaching program to an event
* pairs with bpf_prog_put()
*/
@@ -670,7 +686,7 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
if (IS_ERR(prog))
return prog;
- atomic_inc(&prog->aux->refcnt);
+ prog = bpf_prog_inc(prog);
fdput(f);
return prog;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index db2574e7b8b0..89bcaa0966da 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2049,15 +2049,18 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env)
return -E2BIG;
}
- /* remember this map */
- env->used_maps[env->used_map_cnt++] = map;
-
/* hold the map. If the program is rejected by verifier,
* the map will be released by release_maps() or it
* will be used by the valid program until it's unloaded
* and all maps are released in free_bpf_prog_info()
*/
- bpf_map_inc(map, false);
+ map = bpf_map_inc(map, false);
+ if (IS_ERR(map)) {
+ fdput(f);
+ return PTR_ERR(map);
+ }
+ env->used_maps[env->used_map_cnt++] = map;
+
fdput(f);
next_insn:
insn++;
--
2.8.0
^ permalink raw reply related
* [PATCH net 0/3] bpf: fix several bugs
From: Alexei Starovoitov @ 2016-04-28 1:56 UTC (permalink / raw)
To: David S . Miller
Cc: Daniel Borkmann, Jann Horn, Linus Torvalds, netdev, kernel-team
First two patches address bugs found by Jann Horn.
Last patch is a minor samples fix spotted during the testing.
Alexei Starovoitov (3):
bpf: fix refcnt overflow
bpf: fix check_map_func_compatibility logic
samples/bpf: fix trace_output example
include/linux/bpf.h | 3 +-
kernel/bpf/inode.c | 7 ++--
kernel/bpf/syscall.c | 24 ++++++++++---
kernel/bpf/verifier.c | 76 +++++++++++++++++++++++++----------------
samples/bpf/trace_output_kern.c | 1 -
5 files changed, 73 insertions(+), 38 deletions(-)
--
2.8.0
^ permalink raw reply
* [PATCH net v3 5/5] drivers: net: cpsw: use of_phy_connect() in fixed-link case
From: David Rivshin (Allworx) @ 2016-04-28 1:45 UTC (permalink / raw)
To: netdev, linux-omap
Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
Nicolas Chauvet
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>
From: David Rivshin <drivshin@allworx.com>
If a fixed-link DT subnode is used, the phy_device was looked up so
that a PHY ID string could be constructed and passed to phy_connect().
This is not necessary, as the device_node can be passed directly to
of_phy_connect() instead. This reuses the same codepath as if the
phy-handle DT property was used.
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
Changes since v2 [1]:
- Added Tested-by from Andrew Goodbody [3]
- Added Reviewed-by from Mugunthan V N [4]
- Added Reviewed-by from Grygorii Strashko [5]
Changes since v1 [2]:
- Rebased (trivial conflict, e5a03bfd modified the deleted snprintf)
- Added Tested-by from Nicolas Chauvet
[1] http://patchwork.ozlabs.org/patch/613276/
[2] http://patchwork.ozlabs.org/patch/560327/
[3] https://lkml.org/lkml/2016/4/22/537
[4] https://lkml.org/lkml/2016/4/22/63
[5] https://lkml.org/lkml/2016/4/22/529
drivers/net/ethernet/ti/cpsw.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 712bc6d..e2fcdf1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2044,30 +2044,21 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
"phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (slave_data->phy_node) {
dev_dbg(&pdev->dev,
"slave[%d] using phy-handle=\"%s\"\n",
i, slave_data->phy_node->full_name);
} else if (of_phy_is_fixed_link(slave_node)) {
- struct device_node *phy_node;
- struct phy_device *phy_dev;
-
/* In the case of a fixed PHY, the DT node associated
* to the PHY is the Ethernet MAC DT node.
*/
ret = of_phy_register_fixed_link(slave_node);
if (ret)
return ret;
- phy_node = of_node_get(slave_node);
- phy_dev = of_phy_find_device(phy_node);
- if (!phy_dev)
- return -ENODEV;
- snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
- PHY_ID_FMT, phy_dev->mdio.bus->id,
- phy_dev->mdio.addr);
+ slave_data->phy_node = of_node_get(slave_node);
} else if (parp) {
u32 phyid;
struct device_node *mdio_node;
struct platform_device *mdio;
if (lenp != (sizeof(__be32) * 2)) {
dev_err(&pdev->dev, "Invalid slave[%d] phy_id property\n", i);
--
2.5.5
^ permalink raw reply related
* Re: [PATCH v2 net-next 11/13] Documentation: Bindings: Update DT binding for separating dsaf dev support
From: Yisen Zhuang @ 2016-04-28 1:45 UTC (permalink / raw)
To: Rob Herring
Cc: Mark Rutland, devicetree@vger.kernel.org, salil.mehta,
Catalin Marinas, xieqianqian, Pawel Moll, Ian Campbell, netdev,
lipeng321, Will Deacon, Linuxarm, Wei Xu, huangdaode, Kumar Gala,
yankejian, David Miller, linux-arm-kernel@lists.infradead.org,
liguozhu
In-Reply-To: <CAL_JsqLHrHWtiikSt7F39LoExoL2aep5Pqax5cGWWBw-yhz79g@mail.gmail.com>
Hi Rob,
Thanks for you comments.
在 2016/4/27 23:25, Rob Herring 写道:
> On Tue, Apr 26, 2016 at 10:33 PM, Yisen Zhuang <Yisen.zhuang@huawei.com> wrote:
>> Hi Rob and David,
>>
>> Please see my comments inline.
>>
>> David have merged this series to net-next, but we need to modify some codes according
>> to Rob's comments. I am not sure if i need to send V3 for this series, or separate
>> patches of documentation to independent series and generate a new patch for hns base
>> on current net-next?
>
> That's David's call. I'm guessing he wants follow-up patches on top of these.
Okay, I will send a new series base on current net-next.
>
>> 在 2016/4/26 20:48, Rob Herring 写道:
>>> On Sat, Apr 23, 2016 at 05:05:15PM +0800, Yisen Zhuang wrote:
>>>> Because debug dsaf port was separated from service dsaf port, this patch
>>>> updates the related information of DT binding.
>>>
>>> Separated when? New version of the h/w? If so, where's the new
>>> compatible string? This is quite a big binding change.
>>
>> There isn't any change of h/w. I separated debug dsaf port from sevice dsaf
>> port to make the code more simple and readability.
>
> Okay.
>
> [...]
>
>>>> + serdes-syscon rather than this address.
>>>> The third region is the PPE register base and size.
>>>> - The fourth region is dsa fabric base register and size.
>>>> - The fifth region is cpld base register and size, it is not required if do not use cpld.
>>>> -- phy-handle: phy handle of physicl port, 0 if not any phy device. see ethernet.txt [1].
>>>> + The fourth region is dsa fabric base register and size. It is not required for
>>>> + single-port mode.
>>>> +- reg-names: may be ppe-base and(or) dsaf-base. It is used to find the
>>>> + corresponding reg's index.
>>>
>>> But you have up to 5 regions.
>>>
>>> The variable nature of what regions you have tells me you need more
>>> specific compatible strings for each chip.
>>
>> we didn't add support of new h/w. We added these regions to make code simple and readability.
>> If we need to add support of next h/w version next time, we don't need to add many branches
>> for these attributes. So we didn't add a new compatible here.
>
> Not sure what you mean by branches. It's fine to put properties for
> things that vary among h/w versions, but new compatible strings will
> be needed for any new versions.
I mean than we put properties for things that vary among h/w versions. If we add support for
new h/w versions next time, we will add new compatible strings.
>
>
>>>> +- port: subnodes of dsaf. A dsaf node may contain several port nodes(Depending
>>>> + on mode of dsaf). Port node contain some attributes listed below:
>>>> +- port-id: is physical port index in one dsaf.
>>>
>>> Indexes should generally be avoided. What does the number correspond
>>> to in h/w (if anything)?
>>
>> port-id is index for a port in dsaf, it is correspond to index of PHY showed below.
>
> Okay, you should use reg property here instead.
Agree, thanks.
>
>>
>> CPU
>> |
>> -----------------------------------
>> | | |
>> ---------------------------------------------- --------- ---------
>> | | | | | | | |
>> | PPE | | PPE | | PPE |
>> | | | | | | | | |
>> | | | | | | | | |
>> | crossbar | | | | | | |
>> | | | | | | | | |
>> | ---------------------------------- | | | | | | |
>> | | | | | | | | | | | | | |
>> | | | | | | | | | | | | | |
>> | MAC MAC MAC MAC MAC MAC | | MAC | | MAC |
>> | | | | | | | | | | | | | |
>> ---------------------------------------------- --------- ---------
>> | | | | | | \ / | / |
>> PHY PHY PHY PHY PHY PHY \ / PHY / PHY
>> \ / /
>> \ / /
>> DSAF(three platform device)
>>
>>>
>>>> +- phy-handle: phy handle of physicl port. It is not required if there isn't
>
> Another typo here.
Agree, thanks.
>
> Rob
>
> .
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* [PATCH net v3 4/5] dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
From: David Rivshin (Allworx) @ 2016-04-28 1:42 UTC (permalink / raw)
To: netdev, linux-omap
Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
Nicolas Chauvet
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>
From: David Rivshin <drivshin@allworx.com>
The phy-handle, phy_id, and fixed-link properties are mutually exclusive,
and only one need be specified. Make this clear in the binding doc.
Also mark the phy_id property as deprecated, as phy-handle should be
used instead.
Signed-off-by: David Rivshin <drivshin@allworx.com>
---
Changes since v2 [1]:
- split from previous patch 2
- marked the phy_id property as deprecated [3]
- removed Rob Herring's Acked-by due to above change
Changes since v1 [2]:
- Rebased (no conflicts)
- Added Tested-by from Nicolas Chauvet
- Added Acked-by from Rob Herring for the binding change
[1] http://patchwork.ozlabs.org/patch/613260/
[2] http://patchwork.ozlabs.org/patch/560324/
[3] https://lkml.org/lkml/2016/4/22/494
Documentation/devicetree/bindings/net/cpsw.txt | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
index 28a4781..0ae0649 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -41,21 +41,21 @@ Optional properties:
Slave Properties:
Required properties:
- phy-mode : See ethernet.txt file in the same directory
Optional properties:
- dual_emac_res_vlan : Specifies VID to be used to segregate the ports
- mac-address : See ethernet.txt file in the same directory
-- phy_id : Specifies slave phy id
+- phy_id : Specifies slave phy id (deprecated, use phy-handle)
- phy-handle : See ethernet.txt file in the same directory
Slave sub-nodes:
- fixed-link : See fixed-link.txt file in the same directory
- Either the property phy_id, or the sub-node
- fixed-link can be specified
+
+Note: Exactly one of phy_id, phy-handle, or fixed-link must be specified.
Note: "ti,hwmods" field is used to fetch the base address and irq
resources from TI, omap hwmod data base during device registration.
Future plan is to migrate hwmod data base contents into device tree
blob so that, all the required data will be used from device tree dts
file.
--
2.5.5
^ permalink raw reply related
* [PATCH net v3 3/5] drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
From: David Rivshin (Allworx) @ 2016-04-28 1:38 UTC (permalink / raw)
To: netdev, linux-omap
Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
Nicolas Chauvet
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>
From: David Rivshin <drivshin@allworx.com>
The phy-mode emac property was only being processed in the phy_id
or fixed-link cases. However if phy-handle was specified instead,
an error message would complain about the lack of phy_id or
fixed-link, and then jump past the of_get_phy_mode(). This would
result in the PHY mode defaulting to MII, regardless of what the
devicetree specified.
Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
---
I would suggest this for -stable. It should apply cleanly as far back
as 4.4.
Changes since v2 [1]:
- split from previous patch 2
- Added Tested-by from Andrew Goodbody [3]
- Added Reviewed-by from Mugunthan V N [4]
- rewrote commit log to focus on the functional bug fixed, rather
than the bogus error message
Changes since v1 [2]:
- Rebased (no conflicts)
- Added Tested-by from Nicolas Chauvet
- Added Acked-by from Rob Herring for the binding change
[1] http://patchwork.ozlabs.org/patch/613260/
[2] http://patchwork.ozlabs.org/patch/560324/
[3] https://lkml.org/lkml/2016/4/22/537
[4] https://lkml.org/lkml/2016/4/22/63
drivers/net/ethernet/ti/cpsw.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 5903448..712bc6d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2039,15 +2039,19 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
/* This is no slave child node, continue */
if (strcmp(slave_node->name, "slave"))
continue;
slave_data->phy_node = of_parse_phandle(slave_node,
"phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
- if (of_phy_is_fixed_link(slave_node)) {
+ if (slave_data->phy_node) {
+ dev_dbg(&pdev->dev,
+ "slave[%d] using phy-handle=\"%s\"\n",
+ i, slave_data->phy_node->full_name);
+ } else if (of_phy_is_fixed_link(slave_node)) {
struct device_node *phy_node;
struct phy_device *phy_dev;
/* In the case of a fixed PHY, the DT node associated
* to the PHY is the Ethernet MAC DT node.
*/
ret = of_phy_register_fixed_link(slave_node);
@@ -2076,15 +2080,17 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
if (!mdio) {
dev_err(&pdev->dev, "Missing mdio platform device\n");
return -EINVAL;
}
snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
PHY_ID_FMT, mdio->name, phyid);
} else {
- dev_err(&pdev->dev, "No slave[%d] phy_id or fixed-link property\n", i);
+ dev_err(&pdev->dev,
+ "No slave[%d] phy_id, phy-handle, or fixed-link property\n",
+ i);
goto no_phy_slave;
}
slave_data->phy_if = of_get_phy_mode(slave_node);
if (slave_data->phy_if < 0) {
dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
i);
return slave_data->phy_if;
--
2.5.5
^ permalink raw reply related
* [PATCH net v3 2/5] drivers: net: cpsw: fix segfault in case of bad phy-handle
From: David Rivshin (Allworx) @ 2016-04-28 1:32 UTC (permalink / raw)
To: netdev, linux-omap
Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
Nicolas Chauvet
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>
From: David Rivshin <drivshin@allworx.com>
If an emac node has a phy-handle property that points to something
which is not a phy, then a segmentation fault will occur when the
interface is brought up. This is because while phy_connect() will
return ERR_PTR() on failure, of_phy_connect() will return NULL.
The common error check uses IS_ERR(), and so missed when
of_phy_connect() fails. The NULL pointer is then dereferenced.
Also, the common error message referenced slave->data->phy_id,
which would be empty in the case of phy-handle. Instead, use the
name of the device_node as a useful identifier. And in the phy_id
case add the error code for completeness.
Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
---
I would suggest this for -stable. It should apply cleanly as far back
as 4.5, although there is a trivial conflict in 4.4. I can produce a
separate patch against linux-4.4.y if preferred.
Changes since v2:
- new patch, although fixing part of previous patch 2 [1]
Changes since v1 [2]:
- Rebased (no conflicts)
- Added Tested-by from Nicolas Chauvet
- Added Acked-by from Rob Herring for the binding change
[1] http://patchwork.ozlabs.org/patch/613260/
[2] http://patchwork.ozlabs.org/patch/560324/
drivers/net/ethernet/ti/cpsw.c | 37 +++++++++++++++++++++++--------------
1 file changed, 23 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ce0b0ca..5903448 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1143,33 +1143,42 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
if (priv->data.dual_emac)
cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
else
cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
- if (slave->data->phy_node)
+ if (slave->data->phy_node) {
slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
&cpsw_adjust_link, 0, slave->data->phy_if);
- else
+ if (!slave->phy) {
+ dev_err(priv->dev, "phy \"%s\" not found on slave %d\n",
+ slave->data->phy_node->full_name,
+ slave->slave_num);
+ return;
+ }
+ } else {
slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
&cpsw_adjust_link, slave->data->phy_if);
- if (IS_ERR(slave->phy)) {
- dev_err(priv->dev, "phy %s not found on slave %d\n",
- slave->data->phy_id, slave->slave_num);
- slave->phy = NULL;
- } else {
- phy_attached_info(slave->phy);
-
- phy_start(slave->phy);
-
- /* Configure GMII_SEL register */
- cpsw_phy_sel(&priv->pdev->dev, slave->phy->interface,
- slave->slave_num);
+ if (IS_ERR(slave->phy)) {
+ dev_err(priv->dev,
+ "phy \"%s\" not found on slave %d, err %ld\n",
+ slave->data->phy_id, slave->slave_num,
+ PTR_ERR(slave->phy));
+ slave->phy = NULL;
+ return;
+ }
}
+
+ phy_attached_info(slave->phy);
+
+ phy_start(slave->phy);
+
+ /* Configure GMII_SEL register */
+ cpsw_phy_sel(&priv->pdev->dev, slave->phy->interface, slave->slave_num);
}
static inline void cpsw_add_default_vlan(struct cpsw_priv *priv)
{
const int vlan = priv->data.default_vlan;
const int port = priv->host_port;
u32 reg;
--
2.5.5
^ permalink raw reply related
* [PATCH net v3 1/5] drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
From: David Rivshin (Allworx) @ 2016-04-28 1:25 UTC (permalink / raw)
To: netdev, linux-omap
Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
Nicolas Chauvet
In-Reply-To: <1461805808-4102-1-git-send-email-drivshin.allworx@gmail.com>
From: David Rivshin <drivshin@allworx.com>
Commit 9e42f715264ff158478fa30eaed847f6e131366b ("drivers: net: cpsw: add
phy-handle parsing") saved the "phy-handle" phandle into a new cpsw_priv
field. However, phy connections are per-slave, so the phy_node field should
be in cpsw_slave_data rather than cpsw_priv.
This would go unnoticed in a single emac configuration. But in dual_emac
mode, the last "phy-handle" property parsed for either slave would be used
by both of them, causing them both to refer to the same phy_device.
Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
I would suggest this for -stable. It should apply cleanly as far back
as 4.4.
Changes since v2 [1]:
- Added Tested-by from Andrew Goodbody [3]
- Added Reviewed-by from Mugunthan V N [4]
- Added Reviewed-by from Grygorii Strashko [5]
Changes since v1 [2]:
- Rebased (no conflicts)
- Added Tested-by from Nicolas Chauvet
[1] http://patchwork.ozlabs.org/patch/613237/
[2] http://patchwork.ozlabs.org/patch/560326/
[3] https://lkml.org/lkml/2016/4/22/537
[4] https://lkml.org/lkml/2016/4/22/63
[5] https://lkml.org/lkml/2016/4/22/496
drivers/net/ethernet/ti/cpsw.c | 13 ++++++-------
drivers/net/ethernet/ti/cpsw.h | 1 +
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index bbb77cd..ce0b0ca 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -363,15 +363,14 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
__raw_writel(val, slave->regs + offset);
}
struct cpsw_priv {
spinlock_t lock;
struct platform_device *pdev;
struct net_device *ndev;
- struct device_node *phy_node;
struct napi_struct napi_rx;
struct napi_struct napi_tx;
struct device *dev;
struct cpsw_platform_data data;
struct cpsw_ss_regs __iomem *regs;
struct cpsw_wr_regs __iomem *wr_regs;
u8 __iomem *hw_stats;
@@ -1144,16 +1143,16 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
if (priv->data.dual_emac)
cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
else
cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
- if (priv->phy_node)
- slave->phy = of_phy_connect(priv->ndev, priv->phy_node,
+ if (slave->data->phy_node)
+ slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
&cpsw_adjust_link, 0, slave->data->phy_if);
else
slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
&cpsw_adjust_link, slave->data->phy_if);
if (IS_ERR(slave->phy)) {
dev_err(priv->dev, "phy %s not found on slave %d\n",
slave->data->phy_id, slave->slave_num);
@@ -1936,20 +1935,19 @@ static void cpsw_slave_init(struct cpsw_slave *slave, struct cpsw_priv *priv,
slave->data = data;
slave->regs = regs + slave_reg_ofs;
slave->sliver = regs + sliver_reg_ofs;
slave->port_vlan = data->dual_emac_res_vlan;
}
-static int cpsw_probe_dt(struct cpsw_priv *priv,
+static int cpsw_probe_dt(struct cpsw_platform_data *data,
struct platform_device *pdev)
{
struct device_node *node = pdev->dev.of_node;
struct device_node *slave_node;
- struct cpsw_platform_data *data = &priv->data;
int i = 0, ret;
u32 prop;
if (!node)
return -EINVAL;
if (of_property_read_u32(node, "slaves", &prop)) {
@@ -2029,15 +2027,16 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
int lenp;
const __be32 *parp;
/* This is no slave child node, continue */
if (strcmp(slave_node->name, "slave"))
continue;
- priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
+ slave_data->phy_node = of_parse_phandle(slave_node,
+ "phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (of_phy_is_fixed_link(slave_node)) {
struct device_node *phy_node;
struct phy_device *phy_dev;
/* In the case of a fixed PHY, the DT node associated
* to the PHY is the Ethernet MAC DT node.
@@ -2271,15 +2270,15 @@ static int cpsw_probe(struct platform_device *pdev)
* This may be required here for child devices.
*/
pm_runtime_enable(&pdev->dev);
/* Select default pin state */
pinctrl_pm_select_default_state(&pdev->dev);
- if (cpsw_probe_dt(priv, pdev)) {
+ if (cpsw_probe_dt(&priv->data, pdev)) {
dev_err(&pdev->dev, "cpsw: platform data missing\n");
ret = -ENODEV;
goto clean_runtime_disable_ret;
}
data = &priv->data;
if (is_valid_ether_addr(data->slave_data[0].mac_addr)) {
diff --git a/drivers/net/ethernet/ti/cpsw.h b/drivers/net/ethernet/ti/cpsw.h
index 442a703..e50afd1 100644
--- a/drivers/net/ethernet/ti/cpsw.h
+++ b/drivers/net/ethernet/ti/cpsw.h
@@ -14,14 +14,15 @@
#ifndef __CPSW_H__
#define __CPSW_H__
#include <linux/if_ether.h>
#include <linux/phy.h>
struct cpsw_slave_data {
+ struct device_node *phy_node;
char phy_id[MII_BUS_ID_SIZE];
int phy_if;
u8 mac_addr[ETH_ALEN];
u16 dual_emac_res_vlan; /* Reserved VLAN for DualEMAC */
};
struct cpsw_platform_data {
--
2.5.5
^ permalink raw reply related
* [PATCH net v3 0/5] drivers: net: cpsw: phy-handle fixes
From: David Rivshin (Allworx) @ 2016-04-28 1:10 UTC (permalink / raw)
To: netdev, linux-omap
Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
Nicolas Chauvet
From: David Rivshin <drivshin@allworx.com>
This series fixes a number of related issues around using phy-handle
properties in cpsw emac nodes.
Patch 1 fixes a bug if more than one slave is used, and either
slave uses the phy-handle property in the devicetree.
Patch 2 fixes a NULL pointer dereference which can occur if a
phy-handle property is used and of_phy_connect() return NULL,
such as with a bad devicetree.
Patch 3 fixes an issue where the phy-mode property would be ignored
if a phy-handle property was used. This also fixes a bogus error
message that would be emitted.
Patch 4 fixes makes the binding documentation more explicit that
exactly one PHY property should be used, and also marks phy_id as
deprecated.
Patch 5 cleans up the fixed-link case to work like the now-fixed
phy-handle case.
I have tested on the following hardware configurations:
- (EVMSK) dual emac, phy_id property in both slaves
- (EVMSK) dual emac, phy-handle property in both slaves
- (EVMSK) a bad phy-handle property pointing to &mmc1
- (EVMSK) phy_id property with incorrect PHY address
- (BeagleBoneBlack) single emac, phy_id property
- (custom) single emac, fixed-link subnode
Andrew Goodbody reported testing v2 on a board that doesn't use
dual_emac mode, but with 2 PHYs using phy-handle properties [1].
Nicolas Chauvet reported testing v2 on an HP t410 (dm8148).
Markus Brunner reported testing v1 on the following [2]:
- emac0 with phy_id and emac1 with fixed phy
- emac0 with phy-handle and emac1 with fixed phy
- emac0 with fixed phy and emac1 with fixed phy
[1] https://lkml.org/lkml/2016/4/22/537
[2] http://www.spinics.net/lists/netdev/msg357890.html
David Rivshin (5):
drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac
config
drivers: net: cpsw: fix segfault in case of bad phy-handle
drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
drivers: net: cpsw: use of_phy_connect() in fixed-link case
Documentation/devicetree/bindings/net/cpsw.txt | 6 +--
drivers/net/ethernet/ti/cpsw.c | 69 ++++++++++++++------------
drivers/net/ethernet/ti/cpsw.h | 1 +
3 files changed, 41 insertions(+), 35 deletions(-)
--
2.5.5
^ permalink raw reply
* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
From: Ben Greear @ 2016-04-28 0:14 UTC (permalink / raw)
To: Hannes Frederic Sowa, Ben Hutchings, linux-kernel, stable
Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita,
xiyou.wangcong
In-Reply-To: <1461801603.3971874.591751457.2DB91B98@webmail.messagingengine.com>
On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> Hi Ben,
>
> On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>>> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>>>>
>>>> 3.2.80-rc1 review patch. If anyone has any objections, please let me know.
>>> I would be careful about this. It causes regressions when sending
>>> PACKET_SOCKET buffers from user-space to veth devices.
>>>
>>> There was a proposed upstream fix for the regression, but it has not gone
>>> into the tree as far as I know.
>>>
>>> http://www.spinics.net/lists/netdev/msg370436.html
>> [...]
>>
>> OK, I'll drop this for now.
>
> The fall out from not having this patch is in my opinion a bigger
> fallout than not having this patch. This patch fixes silent data
> corruption vs. the problem Ben Greear is talking about, which might not
> be that a common usage.
>
> What do others think?
>
> Bye,
> Hannes
>
This patch from Cong Wang seems to fix the regression for me, I think it should be added and
tested in the main tree, and then apply them to stable as a pair.
http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index da1ae0e..f8cc758 100644 (file)
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1926,6 +1926,7 @@ retry:
goto out_unlock;
}
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = sk->sk_priority;
@@ -2352,6 +2353,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
ph.raw = frame;
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = po->sk.sk_priority;
@@ -2776,6 +2778,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
goto out_free;
}
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = sk->sk_priority;
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox