Netdev List

* Re: [PATCH] isdn/gigaset: Convert timers to use timer_setup()
From: David Miller @ 2017-10-05 16:53 UTC
  To: pebolle
  Cc: keescook, isdn, johan, gigaset307x-common, netdev, tglx,
	linux-kernel
In-Reply-To: <1507190336.2167.5.camel@tiscali.nl>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Thu, 05 Oct 2017 09:58:56 +0200

> Hi Kees,
> 
> On Wed, 2017-10-04 at 17:52 -0700, Kees Cook wrote:
>> Also uses kzalloc() to replace open-coded field assignments to NULL and zero.
> 
> If I'm allowed to whine (chances that I'm allowed to do that are not so great,
> as Dave tends to apply gigaset patches before I even have a chance to look at
> them properly!): I'd prefer it if that was done separately in a preceding
> patch. Would that bother you?

Agreed, these timer transformations are already exhausting to review without
unrelated modifications sneaking in.

* [PATCH] ipv6: gso: fix payload length when gso_size is zero
From: Alexey Kodanev @ 2017-10-05 17:06 UTC
  To: netdev; +Cc: Steffen Klassert, Alexander Duyck, David Miller, Alexey Kodanev

When gso_size is reset to zero for the tail segment in skb_segment(),
ipv6_gso_segment() later computes an incorrect payload_len for that
segment. inet_gso_segment() already checks gso_size before calculating
the payload length, so only the IPv6 part needs fixing.

The issue was found with LTP vxlan & gre tests over ixgbe NIC.

Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/ipv6/ip6_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index cdb3728..4a87f94 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 
 	for (skb = segs; skb; skb = skb->next) {
 		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
-		if (gso_partial)
+		if (gso_partial && skb_is_gso(skb))
 			payload_len = skb_shinfo(skb)->gso_size +
 				      SKB_GSO_CB(skb)->data_offset +
 				      skb->head - (unsigned char *)(ipv6h + 1);
-- 
1.8.3.1
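The effect of the one-line change can be sketched as a simplified userspace model (not kernel code). `struct seg_model` and its fields are hypothetical stand-ins for the sk_buff state the loop reads; `hdrs_after_ip6` abbreviates the `SKB_GSO_CB(skb)->data_offset + skb->head - (unsigned char *)(ipv6h + 1)` term. The non-partial branch (`skb->len - nhoff - sizeof(*ipv6h)`) is assumed from the surrounding kernel code, which the hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_HDR_LEN 40u /* fixed IPv6 header size */

/* Hypothetical, simplified view of one segment produced by skb_segment().
 * Field names are illustrative, not the kernel's sk_buff layout. */
struct seg_model {
	unsigned int gso_size;       /* 0 for the tail segment */
	unsigned int len;            /* stands in for skb->len */
	unsigned int nhoff;          /* offset of the IPv6 header */
	unsigned int hdrs_after_ip6; /* data_offset + head - (ipv6h + 1) */
};

/* Mirrors the shape of the payload_len choice in ipv6_gso_segment(). */
static unsigned int payload_len(const struct seg_model *s, bool gso_partial,
				bool with_fix)
{
	bool is_gso = s->gso_size != 0; /* what skb_is_gso() tests */

	if (gso_partial && (!with_fix || is_gso))
		return s->gso_size + s->hdrs_after_ip6;
	/* Fallback: derive the payload length from the segment length. */
	return s->len - s->nhoff - IPV6_HDR_LEN;
}
```

For a tail segment with gso_size == 0, the unfixed path reports only the encapsulation header bytes, while the fixed path falls back to the length-based computation.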

* [PATCH] ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real
From: Matteo Croce @ 2017-10-05 17:03 UTC
  To: netdev, Erik Kline

Commit 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
was intended to affect accept_dad flag handling in such a way that
DAD operation and mode on a given interface would be selected
according to the maximum value of conf/{all,interface}/accept_dad.

However, addrconf_dad_begin() checks for particular cases in which we
need to skip DAD, and this check was modified in the wrong way.

Namely, it was modified so that, if the accept_dad flag is 0 for the
given interface *or* for all interfaces, DAD would be skipped.

Instead, DAD should be skipped only if accept_dad is 0 for the given
interface *and* for all interfaces.

Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 net/ipv6/addrconf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 96861c702c06..4a96ebbf8eda 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3820,8 +3820,8 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 		goto out;

 	if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
-	    dev_net(dev)->ipv6.devconf_all->accept_dad < 1 ||
-	    idev->cnf.accept_dad < 1 ||
+	    (dev_net(dev)->ipv6.devconf_all->accept_dad < 1 &&
+	     idev->cnf.accept_dad < 1) ||
 	    !(ifp->flags&IFA_F_TENTATIVE) ||
 	    ifp->flags & IFA_F_NODAD) {
 		bump_id = ifp->flags & IFA_F_TENTATIVE;
-- 
2.13.6
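A minimal truth-table sketch of the condition change. It models only the accept_dad comparison (the IFF_NOARP/IFF_LOOPBACK and IFA_F_TENTATIVE/IFA_F_NODAD checks are left out), and the function names are hypothetical. It also checks that skipping when *both* values are below 1 is equivalent to skipping when their maximum is below 1, matching the max-based selection the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Before the fix: DAD skipped if either value is < 1. */
static bool skip_dad_old(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad < 1 || if_accept_dad < 1;
}

/* After the fix: DAD skipped only if both values are < 1. */
static bool skip_dad_new(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad < 1 && if_accept_dad < 1;
}

/* Intended selection: the maximum of conf/all and conf/<interface>. */
static int effective_accept_dad(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad > if_accept_dad ? all_accept_dad : if_accept_dad;
}
```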

* Re: [PATCH] net/mac80211/mesh_plink: Convert timers to use
From: Kees Cook @ 2017-10-05 17:27 UTC
  To: Johannes Berg
  Cc: LKML, David S. Miller, linux-wireless, Network Development,
	Thomas Gleixner
In-Reply-To: <1507186052.2387.11.camel@sipsolutions.net>

On Wed, Oct 4, 2017 at 11:47 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Wed, 2017-10-04 at 17:49 -0700, Kees Cook wrote:
>> In preparation for unconditionally passing the struct timer_list
>> pointer to all timer callbacks, switch to using the new timer_setup()
>> and from_timer() to pass the timer pointer explicitly. This requires
>> adding a pointer back to the sta_info since container_of() can't
>> resolve the sta_info.
>
> The subject seems to be lacking something ... :-)

Oh wonderful, all the subjects are cut off. *sigh* I wonder which
piece of my workflow broke that...

>> This requires commit 686fef928bba ("timer: Prepare to change timer
>> callback argument type") in v4.14-rc3, but should be otherwise
>> stand-alone.
>
> I still can't apply that because that's not in net-next right now.

Okay, I'll see if Dave brings that into net-next and resend after that.

>>  static inline void mesh_plink_timer_set(struct sta_info *sta, u32
>> timeout)
>>  {
>>       sta->mesh->plink_timer.expires = jiffies +
>> msecs_to_jiffies(timeout);
>> -     sta->mesh->plink_timer.data = (unsigned long) sta;
>> -     sta->mesh->plink_timer.function = mesh_plink_timer;
>> +     sta->mesh->plink_sta = sta;
>> +     sta->mesh->plink_timer.function =
>> (TIMER_FUNC_TYPE)mesh_plink_timer;
>>       sta->mesh->plink_timeout = timeout;
>>       add_timer(&sta->mesh->plink_timer);
>
> Wouldn't it be better to convert this to timer_setup() now?

The problem is that plink_timer is used in several places, and it's
originally initialized in net/mac80211/sta_info.c. Since
mesh_plink_timer_set() updates the function field, it didn't look like
that could be merged into the timer_setup() call. Looking again, though,
this is the _only_ update to plink_timer.function, so it could probably
be collapsed into the timer_setup() call.

> That add_timer() should probably also be mod_timer() anyway?

Agreed. I'd avoided making those changes in most places, but I can do it here.
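Why mod_timer() is the more forgiving call here can be shown with a toy userspace model (not the kernel's timer code). As far as I know, add_timer() in kernels of this era treats arming an already-pending timer as a bug (BUG_ON(timer_pending(timer))), whereas mod_timer() re-arms unconditionally and reports whether the timer was pending. `struct toy_timer` and the `toy_*` helpers are hypothetical; the model reports the add_timer() failure case with a return value instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timer: tracks only whether it is pending and when it expires. */
struct toy_timer {
	bool pending;
	unsigned long expires;
};

/* Models add_timer(): refuses (returns false) if already pending. */
static bool toy_add_timer(struct toy_timer *t, unsigned long expires)
{
	if (t->pending)
		return false;
	t->expires = expires;
	t->pending = true;
	return true;
}

/* Models mod_timer(): (re)arms and returns 1 if it was pending, else 0. */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = true;
	return was_pending;
}
```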

>> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
>> index 69615016d5bf..5e5de9455e4e 100644
>> --- a/net/mac80211/sta_info.c
>> +++ b/net/mac80211/sta_info.c
>> @@ -332,7 +332,7 @@ struct sta_info *sta_info_alloc(struct
>> ieee80211_sub_if_data *sdata,
>>               spin_lock_init(&sta->mesh->plink_lock);
>>               if (ieee80211_vif_is_mesh(&sdata->vif) &&
>>                   !sdata->u.mesh.user_mpm)
>> -                     init_timer(&sta->mesh->plink_timer);
>> +                     timer_setup(&sta->mesh->plink_timer, NULL,
>> 0);
>>               sta->mesh->nonpeer_pm = NL80211_MESH_POWER_ACTIVE;
>>       }
>
> You just have to make mesh_plink_timer() non-static, put a prototype
> into mesh.h and then you can use the proper timer_setup() here with the
> function?
>
> Also, the sta->mesh->plink_sta assignment should be here I'd say, no
> point rewriting it all the time.

Sounds good. I'll get it cleaned up.

-Kees

-- 
Kees Cook
Pixel Security

* [PATCH v2] net/mac80211/mesh_plink: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 17:39 UTC
  To: Johannes Berg
  Cc: David S. Miller, linux-wireless, netdev, Thomas Gleixner,
	linux-kernel

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. This requires adding a pointer back
to the sta_info since container_of() can't resolve the sta_info.

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
This requires commit 686fef928bba ("timer: Prepare to change timer
callback argument type") in v4.14-rc3, but should be otherwise
stand-alone.

v2:
- make mesh_plink_timer non-static and use it in timer_setup() call directly.
---
 net/mac80211/mesh.h       |  1 +
 net/mac80211/mesh_plink.c | 10 ++++------
 net/mac80211/sta_info.c   |  4 +++-
 net/mac80211/sta_info.h   |  2 ++
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/mac80211/mesh.h b/net/mac80211/mesh.h
index 7e5f271e3c30..465b7853edc0 100644
--- a/net/mac80211/mesh.h
+++ b/net/mac80211/mesh.h
@@ -275,6 +275,7 @@ void mesh_neighbour_update(struct ieee80211_sub_if_data *sdata,
 			   u8 *hw_addr, struct ieee802_11_elems *ie);
 bool mesh_peer_accepts_plinks(struct ieee802_11_elems *ie);
 u32 mesh_accept_plinks_update(struct ieee80211_sub_if_data *sdata);
+void mesh_plink_timer(struct timer_list *t);
 void mesh_plink_broken(struct sta_info *sta);
 u32 mesh_plink_deactivate(struct sta_info *sta);
 u32 mesh_plink_open(struct sta_info *sta);
diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index f69c6c38ca43..e79adb4164f3 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -604,8 +604,9 @@ void mesh_neighbour_update(struct ieee80211_sub_if_data *sdata,
 	ieee80211_mbss_info_change_notify(sdata, changed);
 }
 
-static void mesh_plink_timer(unsigned long data)
+void mesh_plink_timer(struct timer_list *t)
 {
+	struct mesh_sta *mesh = from_timer(mesh, t, plink_timer);
 	struct sta_info *sta;
 	u16 reason = 0;
 	struct ieee80211_sub_if_data *sdata;
@@ -617,7 +618,7 @@ static void mesh_plink_timer(unsigned long data)
 	 * del_timer_sync() this timer after having made sure
 	 * it cannot be readded (by deleting the plink.)
 	 */
-	sta = (struct sta_info *) data;
+	sta = mesh->plink_sta;
 
 	if (sta->sdata->local->quiescing)
 		return;
@@ -697,11 +698,8 @@ static void mesh_plink_timer(unsigned long data)
 
 static inline void mesh_plink_timer_set(struct sta_info *sta, u32 timeout)
 {
-	sta->mesh->plink_timer.expires = jiffies + msecs_to_jiffies(timeout);
-	sta->mesh->plink_timer.data = (unsigned long) sta;
-	sta->mesh->plink_timer.function = mesh_plink_timer;
 	sta->mesh->plink_timeout = timeout;
-	add_timer(&sta->mesh->plink_timer);
+	mod_timer(&sta->mesh->plink_timer, jiffies + msecs_to_jiffies(timeout));
 }
 
 static bool llid_in_use(struct ieee80211_sub_if_data *sdata,
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 69615016d5bf..6c254a9d5f11 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -329,10 +329,12 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
 		sta->mesh = kzalloc(sizeof(*sta->mesh), gfp);
 		if (!sta->mesh)
 			goto free;
+		sta->mesh->plink_sta = sta;
 		spin_lock_init(&sta->mesh->plink_lock);
 		if (ieee80211_vif_is_mesh(&sdata->vif) &&
 		    !sdata->u.mesh.user_mpm)
-			init_timer(&sta->mesh->plink_timer);
+			timer_setup(&sta->mesh->plink_timer, mesh_plink_timer,
+				    0);
 		sta->mesh->nonpeer_pm = NL80211_MESH_POWER_ACTIVE;
 	}
 #endif
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 3acbdfa9f649..21d9760ce5c3 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -344,6 +344,7 @@ DECLARE_EWMA(mesh_fail_avg, 20, 8)
  * @plink_state: peer link state
  * @plink_timeout: timeout of peer link
  * @plink_timer: peer link watch timer
+ * @plink_sta: peer link watch timer's sta_info
  * @t_offset: timing offset relative to this host
  * @t_offset_setpoint: reference timing offset of this sta to be used when
  * 	calculating clockdrift
@@ -356,6 +357,7 @@ DECLARE_EWMA(mesh_fail_avg, 20, 8)
  */
 struct mesh_sta {
 	struct timer_list plink_timer;
+	struct sta_info *plink_sta;
 
 	s64 t_offset;
 	s64 t_offset_setpoint;
-- 
2.7.4


-- 
Kees Cook
Pixel Security
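Why the plink_sta back pointer is needed can be sketched in userspace C. from_timer() is a container_of() wrapper, so it can only reach the structure that embeds the timer, here struct mesh_sta. Because sta->mesh is allocated separately (the kzalloc() in sta_info_alloc()), struct sta_info is not reachable from the timer by pointer arithmetic, and plink_sta bridges that gap. The struct layouts are simplified, and `toy_timer_setup()` and the `fired` counter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct timer_list {
	void (*function)(struct timer_list *t);
};

struct sta_info;

struct mesh_sta {
	struct timer_list plink_timer;
	struct sta_info *plink_sta; /* back pointer added by the patch */
};

struct sta_info {
	struct mesh_sta *mesh; /* separate allocation, not embedded */
	int fired;             /* hypothetical: observes the callback */
};

/* Models mesh_plink_timer(): timer -> mesh_sta -> sta_info. */
static void mesh_plink_timer(struct timer_list *t)
{
	struct mesh_sta *mesh = container_of(t, struct mesh_sta, plink_timer);
	struct sta_info *sta = mesh->plink_sta;

	sta->fired++;
}

/* Models timer_setup(): store the callback once, at init time. */
static void toy_timer_setup(struct timer_list *t,
			    void (*fn)(struct timer_list *))
{
	t->function = fn;
}
```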

* Re: [PATCH net-next v3 1/2] libbpf: parse maps sections of varying size
From: Jesper Dangaard Brouer @ 2017-10-05 17:52 UTC
  To: Craig Gallek
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller,
	Chonggang Li, netdev, brouer
In-Reply-To: <20171005144158.14860-2-kraigatgoog@gmail.com>


On Thu,  5 Oct 2017 10:41:57 -0400 Craig Gallek <kraigatgoog@gmail.com> wrote:

> From: Craig Gallek <kraig@google.com>
> 
> This library previously assumed a fixed-size map options structure.
> Any new options were ignored.  In order to allow the options structure
> to grow and to support parsing older programs, this patch updates
> the maps section parsing to handle varying sizes.
> 
> Object files with maps sections smaller than expected will have the new
> fields initialized to zero.  Object files which have larger than expected
> maps sections will be rejected unless all of the unrecognized data is zero.
> 
> This change still assumes that each map definition in the maps section
> is the same size.
> 
> Signed-off-by: Craig Gallek <kraig@google.com>
> ---
>  tools/lib/bpf/libbpf.c | 70 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 41 insertions(+), 29 deletions(-)

Thank you for working on this! :-)

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
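The parsing rules in the quoted commit message (smaller definitions zero-filled, larger ones accepted only if the extra bytes are zero) can be sketched as follows. This is not the libbpf code: `struct map_def_model` is a hypothetical five-field layout, and deriving `def_size` as the maps section size divided by the number of maps is an assumption consistent with "each map definition in the maps section is the same size".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical map definition layout used for illustration. */
struct map_def_model {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* Copy one def_size-byte definition into *out.
 * Returns 0 on success, -1 if the definition must be rejected. */
static int parse_map_def(const void *data, size_t def_size,
			 struct map_def_model *out)
{
	const unsigned char *bytes = data;
	size_t i;

	memset(out, 0, sizeof(*out)); /* fields absent from older objects stay 0 */
	if (def_size <= sizeof(*out)) {
		memcpy(out, data, def_size);
		return 0;
	}
	/* Larger than expected: every unrecognized byte must be zero. */
	for (i = sizeof(*out); i < def_size; i++)
		if (bytes[i] != 0)
			return -1;
	memcpy(out, data, sizeof(*out));
	return 0;
}
```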

* Re: [PATCH 00/18] use ARRAY_SIZE macro
From: J. Bruce Fields @ 2017-10-05 17:57 UTC
  To: Jérémy Lefaure
  Cc: Greg KH, Tobin C. Harding, alsa-devel, nouveau, dri-devel,
	dm-devel, brcm80211-dev-list, devel, linux-scsi, linux-rdma,
	amd-gfx, Jason Gunthorpe, linux-acpi, linux-video,
	intel-wired-lan, linux-media, intel-gfx, ecryptfs, linux-nfs,
	linux-raid, openipmi-developer, intel-gvt-dev, devel,
	brcm80211-dev-list.pdl, netdev, linux-usb
In-Reply-To: <20171002213312.3f904290@blatinox-laptop.localdomain>

On Mon, Oct 02, 2017 at 09:33:12PM -0400, Jérémy Lefaure wrote:
> On Mon, 2 Oct 2017 15:22:24 -0400
> bfields@fieldses.org (J. Bruce Fields) wrote:
> 
> > Mainly I'd just like to know which you're asking for.  Do you want me to
> > apply this, or to ACK it so someone else can?  If it's sent as a series
> > I tend to assume the latter.
> > 
> > But in this case I'm assuming it's the former, so I'll pick up the nfsd
> > one....
> Could you apply the NFSD patch ("nfsd: use ARRAY_SIZE") with the
> "Reviewed-by: Jeff Layton <jlayton@redhat.com>" tag, please?
> 
> This is a standalone patch and should not have been sent as part of a
> series (sorry about that).

Applying for 4.15, thanks.

--b.

* Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: John Fastabend @ 2017-10-05 18:01 UTC
  To: Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, peter.waskiewicz.jr, Daniel Borkmann,
	Andy Gospodarek
In-Reply-To: <20171004190201.5no5mrmkko43cvv2@ast-mbp>

On 10/04/2017 12:02 PM, Alexei Starovoitov wrote:
> On Wed, Oct 04, 2017 at 02:03:45PM +0200, Jesper Dangaard Brouer wrote:
>> The 'cpumap' is primarily used as a backend map for the XDP BPF helper
>> call bpf_redirect_map() and the XDP_REDIRECT action, like 'devmap'.
>>
>> This patch implements the main part of the map.  It is not connected to
>> the XDP redirect system yet, and no SKB allocations are done yet.
>>
>> The main concern in this patch is to ensure the datapath can run
>> without any locking.  This adds complexity to the setup and tear-down
>> procedures, whose assumptions are carefully documented in the
>> code comments.
>>
>> V2:
>>  - make sure array isn't larger than NR_CPUS
>>  - make sure CPUs added is a valid possible CPU
>>
>> V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl>
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ...
>> +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>> +{
>> +	struct bpf_cpu_map *cmap;
>> +	u64 cost;
>> +	int err;
>> +
>> +	/* check sanity of attributes */
>> +	if (attr->max_entries == 0 || attr->key_size != 4 ||
>> +	    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	cmap = kzalloc(sizeof(*cmap), GFP_USER);
>> +	if (!cmap)
>> +		return ERR_PTR(-ENOMEM);
> 
> I just noticed that there is nothing here, nor in DEVMAP/SOCKMAP,
> that prevents unprivileged users from creating them.
> I'm not sure it was intentional for DEVMAP/SOCKMAP.
> For CPUMAP I'd suggest restricting it to root, since it
> is supposed to operate with XDP only, which is root-only anyway.
> Note that lpm and lru maps are already cap_sys_admin only.
> 

For DEVMAP I think the same argument applies. DEVMAP is supposed
to operate with XDP only, which is CAP_NET_ADMIN restricted, so
we should restrict DEVMAP as well.

In the SOCKMAP case, although the map can be created, programs
cannot be attached. So I'll restrict it to CAP_NET_ADMIN as well
until someone has a clear use case for allowing it. I don't have
a use case for non-CAP_NET_ADMIN usage, and it's easier to relax
restrictions later than to add them.

I have a couple of fixes for sockmap under test, so I'll add these
patches to that set as well. It should be ready shortly, in a few days.

Thanks,
John
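A userspace sketch of the attribute checks quoted from cpu_map_alloc(), with the privilege restriction discussed above modeled as a boolean standing in for a capable() check. `MODEL_F_NUMA_NODE`, `struct map_create_attr` and `cpu_map_check_attr()` are hypothetical; placing the privilege check first and returning -E2BIG for the V2 "not larger than NR_CPUS" case are assumptions, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for BPF_F_NUMA_NODE: the only flag the check allows. */
#define MODEL_F_NUMA_NODE (1U << 2)

/* Subset of the union bpf_attr fields the quoted check inspects. */
struct map_create_attr {
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

/* Returns 0 if creation may proceed, else a negative errno. */
static int cpu_map_check_attr(const struct map_create_attr *attr,
			      bool caller_privileged, uint32_t nr_cpus)
{
	if (!caller_privileged) /* models the suggested root-only restriction */
		return -EPERM;
	if (attr->max_entries == 0 || attr->key_size != 4 ||
	    attr->value_size != 4 || (attr->map_flags & ~MODEL_F_NUMA_NODE))
		return -EINVAL;
	if (attr->max_entries > nr_cpus) /* V2: array not larger than NR_CPUS */
		return -E2BIG;
	return 0;
}
```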

* Re: [PATCH] mwifiex: Use put_unaligned_le32
From: Brian Norris @ 2017-10-05 18:02 UTC
  To: Himanshu Jha
  Cc: Kalle Valo, amitkarwar, nishants, gbhat, huxm, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20171005152233.GA6250@himanshu-Vostro-3559>

On Thu, Oct 05, 2017 at 08:52:33PM +0530, Himanshu Jha wrote:
> There are various instances where a function, say
> int func_align(void *a)
> is defined in align.h and used in a file, but many files don't
> *directly* include align.h; instead they include some other header
> that includes align.h.

I believe the general rule is that you should include headers for all
symbols you use, and not rely on implicit includes.

The caveat to the general rule is that not all headers are
intended to be included directly; in such cases there's likely a
parent header that is the more appropriate target.

In this case, the key is CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. It
seems that asm-generic/unaligned.h is set up to include different
headers, based on the expected architecture behavior.

I wonder if include/linux/unaligned/access_ok.h should have a safety
check (e.g., raise an #error if
!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS?).
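For context on what that config symbol selects between: on architectures without efficient unaligned access, an unaligned little-endian store has to be built from byte accesses rather than a direct word store. A minimal userspace sketch of that byte-at-a-time approach, similar in spirit to the kernel's generic byte-shift fallbacks but not the kernel's put_unaligned_le32(); both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store val little-endian at p, one byte at a time, so no misaligned
 * 32-bit access is ever performed. */
static void put_le32_bytewise(uint32_t val, void *p)
{
	uint8_t *b = p;

	b[0] = (uint8_t)(val);
	b[1] = (uint8_t)(val >> 8);
	b[2] = (uint8_t)(val >> 16);
	b[3] = (uint8_t)(val >> 24);
}

/* Matching byte-at-a-time little-endian load. */
static uint32_t get_le32_bytewise(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```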

> Is compiling the file the only way to check whether the appropriate header
> is included, or is there some other way to check for it?

I believe it's mostly manual. Implicit includes have been a problem for
anyone who refactors header files.

Brian

* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Levi Pearson @ 2017-10-05 18:09 UTC
  To: Jiri Pirko
  Cc: Vinicius Costa Gomes, Linux Kernel Network Developers,
	intel-wired-lan, Jamal Hadi Salim, Cong Wang, andre.guedes,
	Ivan Briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, Henrik Austad, Rodney Cummings
In-Reply-To: <20171004063650.GA1895@nanopsycho>

On Wed, Oct 4, 2017 at 12:36 AM, Jiri Pirko <jiri@resnulli.us> wrote:

>>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>>+      .next           =       NULL,
>>+      .id             =       "cbs",
>>+      .priv_size      =       sizeof(struct cbs_sched_data),
>>+      .enqueue        =       cbs_enqueue,
>>+      .dequeue        =       qdisc_dequeue_head,
>>+      .peek           =       qdisc_peek_dequeued,
>>+      .init           =       cbs_init,
>>+      .reset          =       qdisc_reset_queue,
>>+      .destroy        =       cbs_destroy,
>>+      .change         =       cbs_change,
>>+      .dump           =       cbs_dump,
>>+      .owner          =       THIS_MODULE,
>>+};
>
> I don't see a software implementation for this. Looks like you are
> trying to abuse the tc subsystem to bypass the kernel. Could you please
> explain this? The golden rule is: implement in kernel, then offload.

It would be a shame if this were blocked due to a missing software
implementation. This module is analogous to (and designed to work
with) the mqprio module; it directly configures the 802.1Qav
(Forwarding and Queuing for Time-Sensitive Streams) functionality of
multi-queue NICs with that capability. I'm not sure what makes it seem
like an attempt to "bypass the kernel"; it's actually an attempt to
get an appropriate configuration path *into* the kernel, which has
been missing for some time.

While it would be valuable to have a CBS software-only implementation,
and Vinicius and colleagues have mentioned plans to implement one,
most users will have chosen Qav-compliant NICs and will prefer to use
the hardware capability. In fact they are often *already* using that
capability, but configure it via non-standardized interfaces in
out-of-tree or vendor-tree drivers. I believe it's valuable to have
the "knobs" fit in with the mqprio qdisc and the overall tc subsystem
rather than forcing users through various unrelated configuration
tools, but ultimately the hooks just need to be in the network
subsystem so the drivers can be told how the user wants to set the
registers.

It *might* be reasonable to add the functionality of this to mqprio
instead of a separate module, but this is only one of many possible
802.1Q shapers that could be selected and configured (with more being
defined by IEEE 802.1 working groups for different use cases), and it
seems cleaner to me to have their configuration be through separate
modules than crammed into an already-confusing one, especially since
mqprio has much broader applicability than CBS and it probably doesn't
make sense to burden all mqprio users with the configuration option
overhead.

This meets a specific need in industry (this is widely used in
automotive infotainment devices with broad hardware support across the
SoCs targeted at that industry) that is not well-served by a software
implementation of class-level shaping. As a maintainer of the OpenAvnu
project (sponsored by Avnu, an industry alliance formed around the TSN
standards), I will integrate support for this into our traffic shaping
management userspace tools as soon as it's available. Those tools
currently have to rely on out-of-tree drivers with custom interfaces,
or on the HTB shaper, which can be configured to approximate CBS but
with greatly increased overhead.

Levi
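For readers weighing the software-implementation question, a minimal discrete-time sketch of the credit-based shaper idea from 802.1Qav as it is commonly described: credit accumulates at idleslope while frames wait, drains at sendslope (idleslope minus the port rate) while a frame is transmitted, and a frame may start only when credit is non-negative. The units (bits and microseconds), the struct, and the clamping to hicredit/locredit are simplifying assumptions; the rule that resets positive credit to zero when the queue empties is omitted. This is not the proposed qdisc's code.

```c
#include <assert.h>
#include <stdint.h>

/* Credit in bits; slopes in bits per microsecond (hypothetical units). */
struct cbs_model {
	int64_t credit;
	int64_t idleslope; /* > 0: reserved bandwidth */
	int64_t sendslope; /* < 0: idleslope - port rate */
	int64_t hicredit;  /* upper clamp */
	int64_t locredit;  /* lower clamp */
};

/* Frames are queued but not transmitting for dt_us: credit grows. */
static void cbs_wait(struct cbs_model *c, int64_t dt_us)
{
	c->credit += c->idleslope * dt_us;
	if (c->credit > c->hicredit)
		c->credit = c->hicredit;
}

/* A frame may start transmission only with non-negative credit. */
static int cbs_can_send(const struct cbs_model *c)
{
	return c->credit >= 0;
}

/* Account for a frame occupying the wire for tx_us: credit drains. */
static void cbs_send(struct cbs_model *c, int64_t tx_us)
{
	c->credit += c->sendslope * tx_us;
	if (c->credit < c->locredit)
		c->credit = c->locredit;
}
```

With a 100 bit/us port and idleslope 20, a 10 us frame leaves the class 800 bits in debt, so it must then wait 40 us before sending again, which caps it at roughly 20% of the link.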

* Re: [PATCH] netfilter: ipset: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 18:15 UTC
  To: Jozsef Kadlecsik
  Cc: LKML, Pablo Neira Ayuso, Florian Westphal, David S. Miller,
	Stephen Hemminger, simran singhal, Muhammad Falak R Wani,
	netfilter-devel, coreteam, Network Development, Thomas Gleixner
In-Reply-To: <alpine.DEB.2.11.1710051551460.11178@blackhole.kfki.hu>

On Thu, Oct 5, 2017 at 6:58 AM, Jozsef Kadlecsik
<kadlec@blackhole.kfki.hu> wrote:
> Hi,
>
> On Wed, 4 Oct 2017, Kees Cook wrote:
>
>> In preparation for unconditionally passing the struct timer_list pointer
>> to all timer callbacks, switch to using the new timer_setup() and
>> from_timer() to pass the timer pointer explicitly. This introduces a
>> pointer back to the struct ip_set, which is used instead of the struct
>> timer_list .data field.
>
> Please add the same changes to net/netfilter/ipset/ip_set_list_set.c too, in
> order to handle all ipset modules in a single patch. I don't see a way
> to avoid introducing the new pointer either.

Ah yes, thanks. I'll send a v2 with that included.

-Kees

-- 
Kees Cook
Pixel Security

* [PATCH v2] netfilter: ipset: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 18:21 UTC
  To: Jozsef Kadlecsik
  Cc: Pablo Neira Ayuso, Florian Westphal, David S. Miller,
	Stephen Hemminger, simran singhal, Muhammad Falak R Wani,
	Thomas Gleixner, netfilter-devel, coreteam, netdev, linux-kernel

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. This introduces a pointer back to the
struct ip_set, which is used instead of the struct timer_list .data field.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: simran singhal <singhalsimran0@gmail.com>
Cc: Muhammad Falak R Wani <falakreyaz@gmail.com>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
---
This requires commit 686fef928bba ("timer: Prepare to change timer
callback argument type") in v4.14-rc3, but should be otherwise
stand-alone.

v2:
- include ip_set_list_set.c in the conversion.
---
 net/netfilter/ipset/ip_set_bitmap_gen.h   | 10 +++++-----
 net/netfilter/ipset/ip_set_bitmap_ip.c    |  2 ++
 net/netfilter/ipset/ip_set_bitmap_ipmac.c |  2 ++
 net/netfilter/ipset/ip_set_bitmap_port.c  |  2 ++
 net/netfilter/ipset/ip_set_hash_gen.h     | 12 +++++++-----
 net/netfilter/ipset/ip_set_list_set.c     | 12 +++++++-----
 6 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 8ad2b52a0b32..5ca18f07683b 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -37,11 +37,11 @@
 #define get_ext(set, map, id)	((map)->extensions + ((set)->dsize * (id)))
 
 static void
-mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set))
+mtype_gc_init(struct ip_set *set, void (*gc)(struct timer_list *t))
 {
 	struct mtype *map = set->data;
 
-	setup_timer(&map->gc, gc, (unsigned long)set);
+	timer_setup(&map->gc, gc, 0);
 	mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 }
 
@@ -272,10 +272,10 @@ mtype_list(const struct ip_set *set,
 }
 
 static void
-mtype_gc(unsigned long ul_set)
+mtype_gc(struct timer_list *t)
 {
-	struct ip_set *set = (struct ip_set *)ul_set;
-	struct mtype *map = set->data;
+	struct mtype *map = from_timer(map, t, gc);
+	struct ip_set *set = map->set;
 	void *x;
 	u32 id;
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c b/net/netfilter/ipset/ip_set_bitmap_ip.c
index 4783efff0bde..d8975a0b4282 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ip.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ip.c
@@ -48,6 +48,7 @@ struct bitmap_ip {
 	size_t memsize;		/* members size */
 	u8 netmask;		/* subnet netmask */
 	struct timer_list gc;	/* garbage collection */
+	struct ip_set *set;	/* attached to this ip_set */
 	unsigned char extensions[0]	/* data extensions */
 		__aligned(__alignof__(u64));
 };
@@ -232,6 +233,7 @@ init_map_ip(struct ip_set *set, struct bitmap_ip *map,
 	map->netmask = netmask;
 	set->timeout = IPSET_NO_TIMEOUT;
 
+	map->set = set;
 	set->data = map;
 	set->family = NFPROTO_IPV4;
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index 9a065f672d3a..4c279fbd2d5d 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -52,6 +52,7 @@ struct bitmap_ipmac {
 	u32 elements;		/* number of max elements in the set */
 	size_t memsize;		/* members size */
 	struct timer_list gc;	/* garbage collector */
+	struct ip_set *set;	/* attached to this ip_set */
 	unsigned char extensions[0]	/* MAC + data extensions */
 		__aligned(__alignof__(u64));
 };
@@ -307,6 +308,7 @@ init_map_ipmac(struct ip_set *set, struct bitmap_ipmac *map,
 	map->elements = elements;
 	set->timeout = IPSET_NO_TIMEOUT;
 
+	map->set = set;
 	set->data = map;
 	set->family = NFPROTO_IPV4;
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c b/net/netfilter/ipset/ip_set_bitmap_port.c
index 7f0c733358a4..7f9bbd7c98b5 100644
--- a/net/netfilter/ipset/ip_set_bitmap_port.c
+++ b/net/netfilter/ipset/ip_set_bitmap_port.c
@@ -40,6 +40,7 @@ struct bitmap_port {
 	u32 elements;		/* number of max elements in the set */
 	size_t memsize;		/* members size */
 	struct timer_list gc;	/* garbage collection */
+	struct ip_set *set;	/* attached to this ip_set */
 	unsigned char extensions[0]	/* data extensions */
 		__aligned(__alignof__(u64));
 };
@@ -214,6 +215,7 @@ init_map_port(struct ip_set *set, struct bitmap_port *map,
 	map->last_port = last_port;
 	set->timeout = IPSET_NO_TIMEOUT;
 
+	map->set = set;
 	set->data = map;
 	set->family = NFPROTO_UNSPEC;
 
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 51063d9ed0f7..efffc8eabafe 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -280,6 +280,7 @@ htable_bits(u32 hashsize)
 struct htype {
 	struct htable __rcu *table; /* the hash table */
 	struct timer_list gc;	/* garbage collection when timeout enabled */
+	struct ip_set *set;	/* attached to this ip_set */
 	u32 maxelem;		/* max elements in the hash */
 	u32 initval;		/* random jhash init value */
 #ifdef IP_SET_HASH_WITH_MARKMASK
@@ -429,11 +430,11 @@ mtype_destroy(struct ip_set *set)
 }
 
 static void
-mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set))
+mtype_gc_init(struct ip_set *set, void (*gc)(struct timer_list *t))
 {
 	struct htype *h = set->data;
 
-	setup_timer(&h->gc, gc, (unsigned long)set);
+	timer_setup(&h->gc, gc, 0);
 	mod_timer(&h->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 	pr_debug("gc initialized, run in every %u\n",
 		 IPSET_GC_PERIOD(set->timeout));
@@ -526,10 +527,10 @@ mtype_expire(struct ip_set *set, struct htype *h)
 }
 
 static void
-mtype_gc(unsigned long ul_set)
+mtype_gc(struct timer_list *t)
 {
-	struct ip_set *set = (struct ip_set *)ul_set;
-	struct htype *h = set->data;
+	struct htype *h = from_timer(h, t, gc);
+	struct ip_set *set = h->set;
 
 	pr_debug("called\n");
 	spin_lock_bh(&set->lock);
@@ -1314,6 +1315,7 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
 	t->htable_bits = hbits;
 	RCU_INIT_POINTER(h->table, t);
 
+	h->set = set;
 	set->data = h;
 #ifndef IP_SET_PROTO_UNDEF
 	if (set->family == NFPROTO_IPV4) {
diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index 178d4eba013b..c9b4e05ad940 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -44,6 +44,7 @@ struct set_adt_elem {
 struct list_set {
 	u32 size;		/* size of set list array */
 	struct timer_list gc;	/* garbage collection */
+	struct ip_set *set;	/* attached to this ip_set */
 	struct net *net;	/* namespace */
 	struct list_head members; /* the set members */
 };
@@ -571,10 +572,10 @@ static const struct ip_set_type_variant set_variant = {
 };
 
 static void
-list_set_gc(unsigned long ul_set)
+list_set_gc(struct timer_list *t)
 {
-	struct ip_set *set = (struct ip_set *)ul_set;
-	struct list_set *map = set->data;
+	struct list_set *map = from_timer(map, t, gc);
+	struct ip_set *set = map->set;
 
 	spin_lock_bh(&set->lock);
 	set_cleanup_entries(set);
@@ -585,11 +586,11 @@ list_set_gc(unsigned long ul_set)
 }
 
 static void
-list_set_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set))
+list_set_gc_init(struct ip_set *set, void (*gc)(struct timer_list *t))
 {
 	struct list_set *map = set->data;
 
-	setup_timer(&map->gc, gc, (unsigned long)set);
+	timer_setup(&map->gc, gc, 0);
 	mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 }
 
@@ -606,6 +607,7 @@ init_list_set(struct net *net, struct ip_set *set, u32 size)
 
 	map->size = size;
 	map->net = net;
+	map->set = set;
 	INIT_LIST_HEAD(&map->members);
 	set->data = map;
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security
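The conversions above drop the opaque unsigned long argument. The callback now receives the struct timer_list pointer and recovers its enclosing structure with from_timer(), which the kernel builds on container_of(). Because the timer lives in the private data rather than in struct ip_set, the patch also stores a set back-pointer there. Below is a minimal userspace sketch of the embedded-timer pattern. struct fake_timer, struct my_set, the *_sketch macros and the fake_timer_* helpers are hypothetical stand-ins, not kernel APIs, and __typeof__ assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct timer_list: only the callback matters. */
struct fake_timer {
	void (*function)(struct fake_timer *t);
};

/* Userspace equivalent of container_of(): step back from the member's
 * address by the member's offset to reach the enclosing structure. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as from_timer(var, timer, field) (GCC/Clang __typeof__). */
#define from_timer_sketch(var, timer, field) \
	container_of_sketch(timer, __typeof__(*(var)), field)

/* Timer embedded in the private data, like list_set::gc. */
struct my_set {
	int id;
	struct fake_timer gc;
	int gc_runs;
};

static void my_set_gc(struct fake_timer *t)
{
	struct my_set *set = from_timer_sketch(set, t, gc);

	set->gc_runs++;
}

/* Analogue of timer_setup() without the flags argument: no opaque data. */
static void fake_timer_setup(struct fake_timer *t,
			     void (*fn)(struct fake_timer *))
{
	assert(t != NULL && fn != NULL);
	t->function = fn;
}

/* Simulated expiry: the "timer core" only ever sees the timer pointer. */
static void fake_timer_fire(struct fake_timer *t)
{
	t->function(t);
}
```

The callback never needs a cast from an integer, so the compiler can type-check it; the cost is that anything not reachable from the enclosing structure (here, struct ip_set) must be linked in explicitly.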


* Re: [PATCH] ipv6: gso: fix payload length when gso_size is zero
From: Girish Moodalbail @ 2017-10-05 18:24 UTC
  To: Alexey Kodanev, netdev; +Cc: Steffen Klassert, Alexander Duyck, David Miller
In-Reply-To: <1507223207-17557-1-git-send-email-alexey.kodanev@oracle.com>

On 10/5/17 10:06 AM, Alexey Kodanev wrote:
> When gso_size reset to zero for the tail segment in skb_segment(), later
> in ipv6_gso_segment(), we will get incorrect payload_len for that segment.
> inet_gso_segment() already has a check for gso_size before calculating
> payload so fixing only IPv6 part.
> 
> The issue was found with LTP vxlan & gre tests over ixgbe NIC.
> 
> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>   net/ipv6/ip6_offload.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index cdb3728..4a87f94 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
>   
>   	for (skb = segs; skb; skb = skb->next) {
>   		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
> -		if (gso_partial)
> +		if (gso_partial && skb_is_gso(skb))
>   			payload_len = skb_shinfo(skb)->gso_size +
>   				      SKB_GSO_CB(skb)->data_offset +
>   				      skb->head - (unsigned char *)(ipv6h + 1);
> 

Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
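The arithmetic behind the fix can be checked in isolation. The sketch below is a simplified model, not kernel code: positions are byte offsets from skb->head instead of pointers, all names are hypothetical, and the non-partial branch assumes the payload is computed from the segment length as skb->len - nhoff - sizeof(struct ipv6hdr).

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_HDR_LEN 40u	/* sizeof(struct ipv6hdr) */

/* Hypothetical model of one output segment; offsets relative to skb->head. */
struct seg_model {
	unsigned int len;		/* skb->len, counted from the MAC header */
	unsigned int nhoff;		/* network header offset in the segment */
	unsigned int gso_size;		/* skb_shinfo(skb)->gso_size; 0 => not GSO */
	unsigned int data_offset;	/* SKB_GSO_CB(skb)->data_offset */
	unsigned int ipv6_end;		/* (ipv6h + 1) - skb->head */
};

/* Compute payload_len as ipv6_gso_segment() would, with or without the
 * skb_is_gso() check added by the patch. */
static unsigned int payload_len(const struct seg_model *s, bool gso_partial,
				bool with_fix)
{
	bool is_gso = s->gso_size != 0;	/* what skb_is_gso() tests */

	assert(s->data_offset >= s->ipv6_end);
	if (gso_partial && (!with_fix || is_gso))
		return s->gso_size + s->data_offset - s->ipv6_end;
	return s->len - s->nhoff - IPV6_HDR_LEN;
}
```

With made-up offsets (14-byte MAC header, IPv6 header ending at 54, inner headers ending at 104), a 300-byte tail segment whose gso_size was reset to 0 gets payload_len 50 without the skb_is_gso() check, i.e. only the inner headers, and 350 with it. Full-sized segments give the same answer either way.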


* Re: [PATCH net-next 4/4] selinux: bpf: Add addtional check for bpf object file receive
From: Stephen Smalley @ 2017-10-05 18:26 UTC
  To: Chenbo Feng, netdev, SELinux, linux-security-module
  Cc: Chenbo Feng, Alexei Starovoitov, Daniel Borkmann, Lorenzo Colitti
In-Reply-To: <1507210621.27146.7.camel@tycho.nsa.gov>

On Thu, 2017-10-05 at 09:37 -0400, Stephen Smalley wrote:
> On Wed, 2017-10-04 at 11:29 -0700, Chenbo Feng wrote:
> > From: Chenbo Feng <fengc@google.com>
> > 
> > Introduce a bpf object related check when sending and receiving files
> > through unix domain socket as well as binder. It checks if the receiving
> > process has privilege to read/write the bpf map or use the bpf program.
> > This check is necessary because the bpf maps and programs are using an
> > anonymous inode as their shared inode so the normal way of checking the
> > files and sockets when passing between processes cannot work properly on
> > eBPF objects. This check only works when the BPF_SYSCALL is configured.
> > 
> > Signed-off-by: Chenbo Feng <fengc@google.com>
> > ---
> >  include/linux/bpf.h      |  3 +++
> >  kernel/bpf/syscall.c     |  4 ++--
> >  security/selinux/hooks.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 61 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index d757ea3f2228..ac8428a36d56 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -250,6 +250,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >  #ifdef CONFIG_BPF_SYSCALL
> >  DECLARE_PER_CPU(int, bpf_prog_active);
> >  
> > +extern const struct file_operations bpf_map_fops;
> > +extern const struct file_operations bpf_prog_fops;
> > +
> >  #define BPF_PROG_TYPE(_id, _ops) \
> >  	extern const struct bpf_verifier_ops _ops;
> >  #define BPF_MAP_TYPE(_id, _ops) \
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 58ff769d58ab..5789a5359f0a 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -313,7 +313,7 @@ static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
> >  	return -EINVAL;
> >  }
> >  
> > -static const struct file_operations bpf_map_fops = {
> > +const struct file_operations bpf_map_fops = {
> >  #ifdef CONFIG_PROC_FS
> >  	.show_fdinfo	= bpf_map_show_fdinfo,
> >  #endif
> > @@ -965,7 +965,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
> >  }
> >  #endif
> >  
> > -static const struct file_operations bpf_prog_fops = {
> > +const struct file_operations bpf_prog_fops = {
> >  #ifdef CONFIG_PROC_FS
> >  	.show_fdinfo	= bpf_prog_show_fdinfo,
> >  #endif
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 41aba4e3d57c..381474ce3216 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -1847,6 +1847,7 @@ static int file_has_perm(const struct cred *cred,
> >  
> >  	/* av is zero if only checking access to the descriptor. */
> >  	rc = 0;
> > +
> >  	if (av)
> >  		rc = inode_has_perm(cred, inode, av, &ad);
> >  
> > @@ -2142,6 +2143,10 @@ static int selinux_binder_transfer_binder(struct task_struct *from,
> >  			    NULL);
> >  }
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int bpf_fd_pass(struct file *file, u32 sid);
> > +#endif
> > +
> >  static int selinux_binder_transfer_file(struct task_struct *from,
> >  					struct task_struct *to,
> >  					struct file *file)
> > @@ -2165,6 +2170,12 @@ static int selinux_binder_transfer_file(struct task_struct *from,
> >  			return rc;
> >  	}
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +	rc = bpf_fd_pass(file, sid);
> > +	if (rc)
> > +		return rc;
> > +#endif
> > +
> >  	if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
> >  		return 0;
> >  
> > @@ -3735,8 +3746,18 @@ static int selinux_file_send_sigiotask(struct task_struct *tsk,
> >  static int selinux_file_receive(struct file *file)
> >  {
> >  	const struct cred *cred = current_cred();
> > +	int rc;
> > +
> > +	rc = file_has_perm(cred, file, file_to_av(file));
> > +	if (rc)
> > +		goto out;
> > +
> > +#ifdef CONFIG_BPF_SYSCALL
> > +	rc = bpf_fd_pass(file, cred_sid(sid));
> > +#endif
> >  
> > -	return file_has_perm(cred, file, file_to_av(file));
> > +out:
> > +	return rc;
> >  }
> >  
> >  static int selinux_file_open(struct file *file, const struct cred *cred)
> > @@ -6288,6 +6309,40 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode)
> >  	return av;
> >  }
> >  
> > +/* This function will check the file pass through unix socket or binder to see
> > + * if it is a bpf related object. And apply corresponding checks on the bpf
> > + * object based on the type. The bpf maps and programs, not like other files and
> > + * socket, are using a shared anonymous inode inside the kernel as their inode.
> > + * So checking that inode cannot identify if the process has privilege to
> > + * access the bpf object and that's why we have to add this additional check in
> > + * selinux_file_receive and selinux_binder_transfer_files.
> > + */
> > +static int bpf_fd_pass(struct file *file, u32 sid)
> > +{
> > +	struct bpf_security_struct *bpfsec;
> > +	u32 sid = cred_sid(cred);
> > +	struct bpf_prog *prog;
> > +	struct bpf_map *map;
> > +	int ret;
> > +
> > +	if (file->f_op == &bpf_map_fops) {
> > +		map = file->private_data;
> > +		bpfsec = map->security;
> > +		ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
> > +				   bpf_map_fmode_to_av(file->f_mode), NULL);
> > +		if (ret)
> > +			return ret;
> > +	} else if (file->f_op == &bpf_prog_fops) {
> > +		prog = file->private_data;
> > +		bpfsec = prog->aux->security;
> > +		ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
> > +				   BPF_PROG__USE, NULL);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +	return 0;
> > +}
> 
> When the struct file is allocated for the bpf map and/or prog, you
> could call a hook at that time passing both, and note the fact that it
> is a bpf map/prog in the file_security_struct.  Then, on
> file_receive/binder_transfer_file, you could apply the appropriate
> checking.  Further, if we know that the file is always allocated at the
> same point as the bpf map/prog, then they should have the same SID (i.e.
> fsec->sid should be the same as bpfsec->sid), so we shouldn't even need
> to dereference the bpf map/prog.  Unless I'm missing something.
> 
> Also, are we concerned about doing the same in
> flush_unauthorized_files(), for inheriting descriptors across a
> context-changing execve?  Should this checking actually go into
> file_has_perm() itself so it is always applied on any use of the struct
> file?
> 
> Lastly, do we need/want these checks if sid == bpfsec->sid?  We skip
> FD__USE in the case where sid == fsec->sid, for example.

BTW, the prog use check seems slightly redundant in that we will
already check fd use permission.  So we only really need it if you want
to allow fd use but deny prog use.  The map read/write checks are more
granular than fd use, so I guess we can't get rid of those.

> 
> > +
> >  static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
> >  {
> >  	u32 sid = current_sid();
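A userspace sketch of the shape of bpf_fd_pass(), including the sid == bpfsec->sid short-circuit Stephen asks about (analogous to skipping FD__USE when sid == fsec->sid). Everything here is hypothetical: the ops tables stand in for bpf_map_fops/bpf_prog_fops, the class strings stand in for SECCLASS_BPF_MAP/SECCLASS_BPF_PROG, and the perm_fn callback stands in for avc_has_perm(). Whether the short-circuit is wanted is exactly the open question in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops tables: only their addresses matter, just as
 * bpf_fd_pass() compares file->f_op against &bpf_map_fops/&bpf_prog_fops. */
struct fops_sketch { const char *name; };
static const struct fops_sketch map_fops = { "bpf-map" };
static const struct fops_sketch prog_fops = { "bpf-prog" };
static const struct fops_sketch other_fops = { "other" };

struct file_sketch {
	const struct fops_sketch *f_op;
	uint32_t obj_sid;	/* SID of the map/prog behind the descriptor */
};

/* Hypothetical stand-in for avc_has_perm(): true means "allowed". */
typedef bool (*perm_fn)(uint32_t subj_sid, uint32_t obj_sid, const char *cls);

static int bpf_fd_pass_sketch(const struct file_sketch *file, uint32_t sid,
			      perm_fn allowed)
{
	const char *cls;

	assert(file != NULL && allowed != NULL);
	if (file->f_op == &map_fops)
		cls = "bpf_map";
	else if (file->f_op == &prog_fops)
		cls = "bpf_prog";
	else
		return 0;	/* not a bpf object: no extra check */

	/* The short-circuit under discussion, analogous to skipping FD__USE. */
	if (sid == file->obj_sid)
		return 0;

	return allowed(sid, file->obj_sid, cls) ? 0 : -EACCES;
}

/* Example policy for illustration only: deny every cross-SID access. */
static bool deny_cross_sid(uint32_t subj_sid, uint32_t obj_sid, const char *cls)
{
	(void)cls;
	return subj_sid == obj_sid;
}
```

If the file and the bpf object are always created together with the same SID, as Stephen suggests, the obj_sid field could come from the file's own security blob instead of dereferencing the map or prog.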


* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: David Miller @ 2017-10-05 18:29 UTC
  To: levipearson
  Cc: jiri, vinicius.gomes, netdev, intel-wired-lan, jhs,
	xiyou.wangcong, andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran, henrik, rodney.cummings
In-Reply-To: <CAEYbN3RjUXGMyxo0t88-ASNVEVQdfXkMzBbMtMHAhqWScOO=Cg@mail.gmail.com>

From: Levi Pearson <levipearson@gmail.com>
Date: Thu, 5 Oct 2017 12:09:32 -0600

> It would be a shame if this were blocked due to a missing software
> implementation.

Quite the contrary, I think a software implementation is a minimum
requirement for inclusion of this feature.

Without a software implementation, there is no clear definition of
what is supposed to happen, and no clear way for people to test those
expectations unless they have the specific hardware.

I completely agree with Jiri.  Hardware offload first is _not_ how
we do things in the Linux networking.


* Re: [PATCH net-next v2 0/3] ethtool: support for forward error correction mode setting on a link
From: Jakub Kicinski @ 2017-10-05 18:30 UTC
  To: Roopa Prabhu
  Cc: davem@davemloft.net, John W. Linville, netdev@vger.kernel.org,
	Vidya Sagar Ravipati, Dustin Byford, Dave Olson, Casey Leedom,
	Gal Pressman, Andrew Lunn, Manoj Malviya, Santosh Rastapur,
	yuval.mintz, odedw, Ariel Almog, Jeff Kirsher, Dirk van der Merwe
In-Reply-To: <CAJieiUgM=t0GhnRXP5YYAZytFoKYRutpM0ZiVsDMjrpLZwCHsQ@mail.gmail.com>

On Fri, 28 Jul 2017 23:28:26 -0700, Roopa Prabhu wrote:
> On Fri, Jul 28, 2017 at 9:46 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> > On Fri, 28 Jul 2017 07:53:01 -0700, Roopa Prabhu wrote:  
> >> On Thu, Jul 27, 2017 at 7:33 PM, Jakub Kicinski <kubakici@wp.pl> wrote:  
> >> > On Thu, 27 Jul 2017 16:47:25 -0700, Roopa Prabhu wrote:  
> >> >> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> >> >>
> >> >> Forward Error Correction (FEC) modes, i.e. Base-R
> >> >> and Reed-Solomon modes, are introduced in 25G/40G/100G standards
> >> >> for providing good BER at high speeds. Various networking devices
> >> >> which support 25G/40G/100G provide the ability to manage supported FEC
> >> >> modes, and the lack of FEC encoding control and reporting today is a
> >> >> source of interoperability issues for many vendors.
> >> >> FEC capability as well as a specific FEC mode, i.e. Base-R
> >> >> or RS modes, can be requested or advertised through bits D44:47 of the
> >> >> base link codeword.
> >> >>
> >> >> This patch set intends to provide an option under ethtool to manage and
> >> >> report FEC encoding settings for networking devices as per IEEE 802.3
> >> >> bj, bm and by specs.
> >> >>
> >> >> v2 :
> >> >>         - minor patch format fixes and typos pointed out by Andrew
> >> >>         - there was a pending discussion on the use of 'auto' vs
> >> >>           'automatic' for fec settings. I have left it as 'auto'
> >> >>           because in most cases today auto is used in place of
> >> >>           automatic to represent automatically generated values.
> >> >>           We use it in other networking config too. I would prefer
> >> >>           leaving it as auto.  
> >> >
> >> > On the subject of resetting the values when module is replugged I
> >> > assume what was previously described remains:
> >> >  - we always allow users to set the FEC regardless of the module type;
> >> >  - if user set an incorrect FEC for the module type (or module gets
> >> >    swapped) the link will be administratively taken down by either
> >> >    the driver or FW.
> >> >
> >> > Is that correct?  Am I misremembering?  
> >>
> >> yes, correct. And possible future sfp hotplug events can give user-space
> >> more info to react to module type changes etc.  
> >
> > OK, if nobody else objects and we go with that - let's make sure we
> > clearly document that those are expected :)  My concern is that if there is
> > ever a 10G + RS FEC standard we don't want to end up in a situation where
> > some drivers silently ignore FEC settings in 10G and others apply it.
> > So let's make it clear what the intended Linux behaviour is.  It could
> > be in the ethtool man page, or the kernel somewhere.  
> 
> sure :), ack. We will document it in the ethtool manpage.

Hi Roopa!  Did you ever publish the ethtool user space patches at all?
I can't find them...
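On the reporting side, the FEC configuration travels as a bitmask in the fec and active_fec fields of struct ethtool_fecparam (queried with ETHTOOL_GFECPARAM). The sketch below renders such a mask as text. It assumes the ETHTOOL_FEC_* flags from the UAPI <linux/ethtool.h> of kernels that include this series; the function name and the lowercase labels are hypothetical and are not the ethtool utility's output format.

```c
#include <assert.h>
#include <linux/ethtool.h>
#include <stddef.h>
#include <string.h>

/* Render an ETHTOOL_FEC_* bitmask (as found in the fec/active_fec fields of
 * struct ethtool_fecparam) as space-separated labels. Labels are illustrative. */
static void fec_mask_to_str(unsigned int mask, char *buf, size_t len)
{
	static const struct {
		unsigned int flag;
		const char *label;
	} modes[] = {
		{ ETHTOOL_FEC_NONE, "none" },
		{ ETHTOOL_FEC_AUTO, "auto" },
		{ ETHTOOL_FEC_OFF, "off" },
		{ ETHTOOL_FEC_RS, "rs" },
		{ ETHTOOL_FEC_BASER, "baser" },
	};
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (!(mask & modes[i].flag))
			continue;
		/* Separate labels; strncat bounds keep room for the NUL. */
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, modes[i].label, len - strlen(buf) - 1);
	}
}
```

Because the field is a mask, a driver can report several supported modes at once, while the active mode is expected to be a single bit; documenting that distinction is part of what Jakub asks for above.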


* RE: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Rodney Cummings @ 2017-10-05 18:41 UTC
  To: David Miller, levipearson@gmail.com
  Cc: jiri@resnulli.us, vinicius.gomes@intel.com,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	andre.guedes@intel.com, ivan.briano@intel.com,
	jesus.sanchez-palencia@intel.com, boon.leong.ong@intel.com,
	richardcochran@gmail.com, henrik@austad.us
In-Reply-To: <20171005.112909.2052593524154643514.davem@davemloft.net>

The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
within a 125 microsecond window of time.

Even with the Preempt RT patch or similar enhancements, that isn't very practical
as software-only. I doubt that software would conform to the standard's
requirements.

This is analogous to memory, or CPU.
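To make the disagreement concrete, here is a minimal sketch of the per-queue credit accounting a software CBS would have to perform, following the idleSlope/sendSlope description of the credit-based shaper in IEEE 802.1Q. The struct and function names are hypothetical, and the model ignores preamble, inter-frame gap, and hiCredit/loCredit bounds. For scale, a 125 µs interval at 1 Gbit/s is 125,000 bit times, roughly ten full-size frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical per-queue shaper state. Credit is in bits and may go negative. */
struct cbs_sketch {
	int64_t credit;
	int64_t idle_slope;	/* bits/s reserved for this traffic class */
	int64_t port_rate;	/* bits/s of the link */
};

/* A frame may start only while credit is non-negative. */
static bool cbs_may_send(const struct cbs_sketch *s)
{
	return s->credit >= 0;
}

/* While a frame is on the wire, credit drains at
 * sendSlope = idleSlope - portRate for frame_bits / portRate seconds. */
static void cbs_send(struct cbs_sketch *s, int64_t frame_bits)
{
	assert(cbs_may_send(s));
	s->credit += s->idle_slope * frame_bits / s->port_rate - frame_bits;
}

/* Let ns nanoseconds pass without this queue transmitting. Credit grows at
 * idleSlope; with nothing queued, positive credit is not banked. */
static void cbs_wait(struct cbs_sketch *s, int64_t ns, bool backlog)
{
	s->credit += s->idle_slope * ns / NSEC_PER_SEC;
	if (!backlog && s->credit > 0)
		s->credit = 0;
}
```

With idleSlope at 250 Mbit/s on a 1 Gbit/s port, one 1500-byte frame (12000 bits) leaves the credit at -9000 bits, and the queue must wait 36 µs with frames pending before it may send again. The accounting itself is simple; the argument in the thread is about whether software can make those send decisions on time.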

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, October 5, 2017 1:29 PM
> To: levipearson@gmail.com
> Cc: jiri@resnulli.us; vinicius.gomes@intel.com; netdev@vger.kernel.org;
> intel-wired-lan@lists.osuosl.org; jhs@mojatatu.com;
> xiyou.wangcong@gmail.com; andre.guedes@intel.com; ivan.briano@intel.com;
> jesus.sanchez-palencia@intel.com; boon.leong.ong@intel.com;
> richardcochran@gmail.com; henrik@austad.us; Rodney Cummings
> <rodney.cummings@ni.com>
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
> 
> From: Levi Pearson <levipearson@gmail.com>
> Date: Thu, 5 Oct 2017 12:09:32 -0600
> 
> > It would be a shame if this were blocked due to a missing software
> > implementation.
> 
> Quite the contrary, I think a software implementation is a minimum
> requirement for inclusion of this feature.
> 
> Without a software implementation, there is no clear definition of
> what is supposed to happen, and no clear way for people to test those
> expectations unless they have the specific hardware.
> 
> I completely agree with Jiri.  Hardware offload first is _not_ how
> we do things in the Linux networking.


* Re: [PATCH] ipv6: gso: fix payload length when gso_size is zero
From: Duyck, Alexander H @ 2017-10-05 18:58 UTC
  To: netdev@vger.kernel.org, alexey.kodanev@oracle.com
  Cc: davem@davemloft.net, steffen.klassert@secunet.com
In-Reply-To: <1507223207-17557-1-git-send-email-alexey.kodanev@oracle.com>

On Thu, 2017-10-05 at 20:06 +0300, Alexey Kodanev wrote:
> When gso_size reset to zero for the tail segment in skb_segment(), later
> in ipv6_gso_segment(), we will get incorrect payload_len for that segment.
> inet_gso_segment() already has a check for gso_size before calculating
> payload so fixing only IPv6 part.
> 
> The issue was found with LTP vxlan & gre tests over ixgbe NIC.
> 
> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/ipv6/ip6_offload.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index cdb3728..4a87f94 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
>  
>  	for (skb = segs; skb; skb = skb->next) {
>  		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
> -		if (gso_partial)
> +		if (gso_partial && skb_is_gso(skb))
>  			payload_len = skb_shinfo(skb)->gso_size +
>  				      SKB_GSO_CB(skb)->data_offset +
>  				      skb->head - (unsigned char *)(ipv6h + 1);

So looking over this change it looks good to me. I'm just wondering if
you have looked at the code in __skb_udp_tunnel_segment or
gre_gso_segment? It seems like if you needed this change here you
should need to make similar changes to those functions as well. I'm
wondering if we just aren't seeing issues due to the segments already
being MSS sized before being handed to us for segmentation.

- Alex


* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: Kalderon, Michal @ 2017-10-05 18:59 UTC
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel
In-Reply-To: <CY1PR0701MB20128130D21FD3C54E45B5A188720-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>

From: Kalderon, Michal
Sent: Tuesday, October 3, 2017 9:05 PM
To: David Miller
>From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>Sent: Tuesday, October 3, 2017 8:17 PM
>>> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
>>>  }
>>>
>>>  static int
>>> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
>>> +                     struct qed_ll2_info *p_ll2_conn,
>>> +                     union core_rx_cqe_union *p_cqe,
>>> +                     unsigned long *p_lock_flags)
>>> +{
>>...
>>> +     spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);
>>> +
>>
>>You can't drop this lock.
>>
>>Another thread can enter the loop of our caller and process RX queue
>>entries, then we would return from here and try to process the same
>>entries again.
>
>The lock is there to synchronize access to chains between qed_ll2_rxq_completion
>and qed_ll2_post_rx_buffer. qed_ll2_rxq_completion can't be called from
>different threads; the light l2 uses the single sp status block we have.
>The reason we release the lock is to avoid a deadlock: calling the upper-layer
>driver may cause it to post additional rx-buffers.

Dave, is there anything else needed from me on this? 
Noticed the series is still in "Changes Requested". 

thanks,
Michal


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: David Miller @ 2017-10-05 19:05 UTC
  To: rodney.cummings
  Cc: levipearson, jiri, vinicius.gomes, netdev, intel-wired-lan, jhs,
	xiyou.wangcong, andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran, henrik
In-Reply-To: <CY1PR0401MB1536A44D0AB459BB9618664A92700@CY1PR0401MB1536.namprd04.prod.outlook.com>

From: Rodney Cummings <rodney.cummings@ni.com>
Date: Thu, 5 Oct 2017 18:41:48 +0000

> The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
> within a 125 microsecond window of time.
> 
> Even with the Preempt RT patch or similar enhancements, that isn't very practical
> as software-only. I doubt that software would conform to the standard's
> requirements.
> 
> This is analogous to memory, or CPU.

I feel like this is looking for an excuse to not have to at least try to implement
the software version of CBS.


* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: David Miller @ 2017-10-05 19:06 UTC
  To: Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA
In-Reply-To: <CY1PR0701MB2012A2F8E3E923D98B1E1A6488700-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>

From: "Kalderon, Michal" <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Date: Thu, 5 Oct 2017 18:59:04 +0000

> From: Kalderon, Michal
> Sent: Tuesday, October 3, 2017 9:05 PM
> To: David Miller
>>From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>>Sent: Tuesday, October 3, 2017 8:17 PM
>>>> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
>>>>  }
>>>>
>>>>  static int
>>>> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
>>>> +                     struct qed_ll2_info *p_ll2_conn,
>>>> +                     union core_rx_cqe_union *p_cqe,
>>>> +                     unsigned long *p_lock_flags)
>>>> +{
>>>...
>>>> +     spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);
>>>> +
>>>
>>>You can't drop this lock.
>>>
>>>Another thread can enter the loop of our caller and process RX queue
>>>entries, then we would return from here and try to process the same
>>>entries again.
>>
>>The lock is there to synchronize access to chains between qed_ll2_rxq_completion
>>and qed_ll2_post_rx_buffer. qed_ll2_rxq_completion can't be called from
>>different threads; the light l2 uses the single sp status block we have.
>>The reason we release the lock is to avoid a deadlock: calling the upper-layer
>>driver may cause it to post additional rx-buffers.
> 
> Dave, is there anything else needed from me on this? 
> Noticed the series is still in "Changes Requested". 

I'm still not convinced that the lock dropping is legitimate.  What if a
spurious interrupt arrives?

If the execution path in the caller is serialized for some reason, why
are you using a spinlock and don't use that serialization for the mutual
exclusion necessary for these queue indexes?
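One common way to make dropping a lock around an upcall safe against the re-processing Dave describes is to claim each entry (advance the consumer index) before unlocking, and to re-read the shared indexes after relocking. The sketch below illustrates that general pattern with POSIX threads; the ring layout, rx_drain() and the callback are hypothetical, and this is not what the qed patch does.

```c
#include <assert.h>
#include <pthread.h>

#define RING_SIZE 8u

/* Hypothetical RX ring: prod is advanced by "hardware", cons by the driver. */
struct rx_ring {
	pthread_mutex_t lock;
	unsigned int prod;
	unsigned int cons;
	int entries[RING_SIZE];
};

typedef void (*rx_cb)(struct rx_ring *ring, int entry, void *ctx);

/* Drain completions, calling cb without the lock held. Each entry is
 * claimed by advancing cons before unlocking, and prod/cons are re-read
 * after relocking, so a concurrent caller never handles an entry twice. */
static unsigned int rx_drain(struct rx_ring *ring, rx_cb cb, void *ctx)
{
	unsigned int handled = 0;

	pthread_mutex_lock(&ring->lock);
	while (ring->cons != ring->prod) {
		int entry = ring->entries[ring->cons % RING_SIZE];

		ring->cons++;			/* claim it while still locked */
		pthread_mutex_unlock(&ring->lock);
		cb(ring, entry, ctx);		/* may take ring->lock itself */
		handled++;
		pthread_mutex_lock(&ring->lock);
	}
	pthread_mutex_unlock(&ring->lock);
	return handled;
}

/* Example callback: sums entries and, like an upper layer posting a new
 * RX buffer, takes the ring lock. Holding the lock across the callback
 * would self-deadlock here, which is why rx_drain() drops it. */
static void sum_and_repost(struct rx_ring *ring, int entry, void *ctx)
{
	long *sum = ctx;

	*sum += entry;
	pthread_mutex_lock(&ring->lock);
	/* A real driver would post a fresh buffer here. */
	pthread_mutex_unlock(&ring->lock);
}
```

Build with -pthread. The key design point is that no loop state derived from the ring survives across the unlocked window except the entry already claimed.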


* Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned
From: Marcelo Ricardo Leitner @ 2017-10-05 19:07 UTC
  To: Herbert Xu
  Cc: David Miller, luto, baijiaju1990, nhorman, vyasevich, kvalo,
	linux-crypto, netdev, linux-sctp, linux-wireless
In-Reply-To: <20171005131631.GA1553@gondor.apana.org.au>

On Thu, Oct 05, 2017 at 09:16:31PM +0800, Herbert Xu wrote:
> On Thu, Oct 05, 2017 at 06:16:20PM +0800, Herbert Xu wrote:
> >
> > That was my point.  Functions like sctp_pack_cookie shouldn't be
> > setting the key in the first place.  The setkey should happen at
> > the point when the key is generated.  That's sctp_endpoint_init
> > which AFAICS only gets called in GFP_KERNEL context.
> > 
> > Or is there a code-path where sctp_endpoint_init is called in
> > softirq context?
> 
> OK, there are indeed code paths where the key is derived in softirq
> context.  Notably sctp_auth_calculate_hmac.
> 
> So I think this patch is the correct fix and I will push it upstream
> as well as back to stable.

Okay, thanks.

  Marcelo


* Re: [PATCH] mwifiex: Use put_unaligned_le32
From: Himanshu Jha @ 2017-10-05 19:07 UTC
  To: Brian Norris
  Cc: Kalle Valo, amitkarwar, nishants, gbhat, huxm, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20171005180248.GA94139@google.com>

On Thu, Oct 05, 2017 at 11:02:50AM -0700, Brian Norris wrote:
> On Thu, Oct 05, 2017 at 08:52:33PM +0530, Himanshu Jha wrote:
> > There are various instances where a function used in a file, say
> > int func_align(void *a),
> > is defined in align.h, but many files don't *directly* include align.h
> > and instead include some other header which includes align.h.
> 
> I believe the general rule is that you should include headers for all
> symbols you use, and not rely on implicit includes.
> 
> The modification to the general rule is that not all headers are
> intended to be included directly, and in such cases there's likely a
> parent header that is the more appropriate target.
> 
> In this case, the key is CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. It
> seems that asm-generic/unaligned.h is set up to include different
> headers, based on the expected architecture behavior.
>
Yes, asm-generic/unaligned.h looks more appropriate: it is the most generic
implementation of unaligned accesses and takes care of the arch-specific details.

Let's see what Kalle Valo recommends! And then I will send v2 of the
patch.

Thanks for the information!

Himanshu Jha

> I wonder if include/linux/unaligned/access_ok.h should have a safety
> check (e.g., raise an #error if
> !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS?).
> 
> > Is compiling the file the only way to check if the appropriate header is
> > included, or is there some other way to check for it?
> 
> I believe it's mostly manual. Implicit includes have been a problem for
> anyone who refactors header files.
> 
> Brian


* RE: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Rodney Cummings @ 2017-10-05 19:17 UTC
  To: David Miller
  Cc: levipearson@gmail.com, jiri@resnulli.us, vinicius.gomes@intel.com,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	andre.guedes@intel.com, ivan.briano@intel.com,
	jesus.sanchez-palencia@intel.com, boon.leong.ong@intel.com,
	richardcochran@gmail.com, henrik@austad.us
In-Reply-To: <20171005.120508.2267452751875787466.davem@davemloft.net>

No excuse. If the software cannot meet the standard's requirements, it is non-conformant,
which means it cannot be called a standard credit-based shaper.

But... I have no objection if someone wants to try software-only. I'm just saying that it
is a waste of time for me.

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, October 5, 2017 2:05 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: levipearson@gmail.com; jiri@resnulli.us; vinicius.gomes@intel.com;
> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
> jhs@mojatatu.com; xiyou.wangcong@gmail.com; andre.guedes@intel.com;
> ivan.briano@intel.com; jesus.sanchez-palencia@intel.com;
> boon.leong.ong@intel.com; richardcochran@gmail.com; henrik@austad.us
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
> 
> From: Rodney Cummings <rodney.cummings@ni.com>
> Date: Thu, 5 Oct 2017 18:41:48 +0000
> 
> > The IEEE Std 802.1Q specs for credit-based shaper require precise
> transmit decisions
> > within a 125 microsecond window of time.
> >
> > Even with the Preempt RT patch or similar enhancements, that isn't very
> practical
> > as software-only. I doubt that software would conform to the standard's
> > requirements.
> >
> > This is analogous to memory, or CPU.
> 
> I feel like this is looking for an excuse to not have to at least try to
> implement
> the software version of CBS.


* Re: [PATCH] isdn/gigaset: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 19:17 UTC
  To: Paul Bolle
  Cc: Karsten Keil, David S. Miller, Johan Hovold, gigaset307x-common,
	Network Development, Thomas Gleixner, LKML
In-Reply-To: <1507190336.2167.5.camel@tiscali.nl>

On Thu, Oct 5, 2017 at 12:58 AM, Paul Bolle <pebolle@tiscali.nl> wrote:
> Hi Kees,
>
> On Wed, 2017-10-04 at 17:52 -0700, Kees Cook wrote:
>> Also uses kzalloc to replace open-coded field assignments to NULL and zero.
>
> If I'm allowed to whine (chances that I'm allowed to do that are not so great
> as Dave tends to apply gigaset patches before I even have a chance to look at
> them properly!): I'd prefer it if that was done separately in a preceding
> patch. Would that bother you?

Sure, that's fine, I'll split it and re-send.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

