* [PATCH net] net: sched: check netif_xmit_frozen_or_stopped() in sch_direct_xmit()
From: Song Liu @ 2018-05-25 18:11 UTC (permalink / raw)
To: netdev; +Cc: Song Liu, kernel-team, John Fastabend, David S . Miller
Summary:
At the end of sch_direct_xmit(), we are in the else path of
!dev_xmit_complete(ret), which means ret == NETDEV_TX_OK. The following
condition will always fail and netif_xmit_frozen_or_stopped() is not
checked at all.
if (ret && netif_xmit_frozen_or_stopped(txq))
return false;
In this patch, this condition is fixed as:
if (netif_xmit_frozen_or_stopped(txq))
return false;
and further simplifies the code as:
return !netif_xmit_frozen_or_stopped(txq);
Fixes: 29b86cdac00a ("net: sched: remove remaining uses for qdisc_qlen in xmit path")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
net/sched/sch_generic.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 39c144b..8261d48 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -346,10 +346,7 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
return false;
}
- if (ret && netif_xmit_frozen_or_stopped(txq))
- return false;
-
- return true;
+ return !netif_xmit_frozen_or_stopped(txq);
}
/*
--
2.9.5
^ permalink raw reply related
* Re: [PATCH] ath6kl: mark expected switch fall-throughs
From: Kalle Valo @ 2018-05-25 18:10 UTC (permalink / raw)
To: Gustavo A. R. Silva
Cc: Sergei Shtylyov, David S. Miller, linux-wireless, netdev,
linux-kernel
In-Reply-To: <e777af5c-8d56-aef7-a4b6-f93f12378049@embeddedor.com>
"Gustavo A. R. Silva" <gustavo@embeddedor.com> writes:
> On 05/25/2018 08:30 AM, Kalle Valo wrote:
>> Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:
>>
>>> On 5/25/2018 2:13 AM, Gustavo A. R. Silva wrote:
>>>
>>>> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
>>>> where we are expecting to fall through.
>>>>
>>>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>>>> ---
>>>> drivers/net/wireless/ath/ath6kl/cfg80211.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath6kl/cfg80211.c b/drivers/net/wireless/ath/ath6kl/cfg80211.c
>>>> index 2ba8cf3..29e32cd 100644
>>>> --- a/drivers/net/wireless/ath/ath6kl/cfg80211.c
>>>> +++ b/drivers/net/wireless/ath/ath6kl/cfg80211.c
>>>> @@ -3898,17 +3898,17 @@ int ath6kl_cfg80211_init(struct ath6kl *ar)
>>>> wiphy->max_scan_ie_len = 1000; /* FIX: what is correct limit? */
>>>> switch (ar->hw.cap) {
>>>> case WMI_11AN_CAP:
>>>> - ht = true;
>>>> + ht = true; /* fall through */
>>>> case WMI_11A_CAP:
>>>> band_5gig = true;
>>>> break;
>>>> case WMI_11GN_CAP:
>>>> - ht = true;
>>>> + ht = true; /* fall through */
>>>> case WMI_11G_CAP:
>>>> band_2gig = true;
>>>> break;
>>>> case WMI_11AGN_CAP:
>>>> - ht = true;
>>>> + ht = true; /* fall through */
>>>> case WMI_11AG_CAP:
>>>> band_2gig = true;
>>>> band_5gig = true;
>>>
>>> Hm, typically such comments are done on a line of their own, have
>>> never seen this style...
>>
>> Yeah, I was wondering the same. Was there a particular reason for this?
>>
>
> Sometimes people use this style for a one-line code block.
>
> I can change it to the traditional style. No problem.
I would prefer that. So if you can send v2 that would be great.
--
Kalle Valo
^ permalink raw reply
* Re: [RFC PATCH net-next 00/12] XDP batching for TUN/vhost_net
From: Michael S. Tsirkin @ 2018-05-25 17:53 UTC (permalink / raw)
To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com>
On Mon, May 21, 2018 at 05:04:21PM +0800, Jason Wang wrote:
> Hi all:
>
> We do not support XDP batching for TUN since it can only receive one
> packet a time from vhost_net. This series tries to remove this
> limitation by:
>
> - introduce a TUN specific msg_control that can hold a pointer to an
> array of XDP buffs
> - try copy and build XDP buff in vhost_net
> - store XDP buffs in an array and submit them once for every N packets
> from vhost_net
> - since TUN can only do native XDP for datacopy packet, to simplify
> the logic, split datacopy out logic and only do batching for
> datacopy.
I like how this rework looks. Pls go ahead and repost as
non-RFC.
> With this series, TX PPS can improve about 34% from 2.9Mpps to
> 3.9Mpps when doing xdp_redirect_map between TAP and ixgbe.
>
> Thanks
>
> Jason Wang (12):
> vhost_net: introduce helper to initialize tx iov iter
> vhost_net: introduce vhost_exceeds_weight()
> vhost_net: introduce vhost_has_more_pkts()
> vhost_net: split out datacopy logic
> vhost_net: batch update used ring for datacopy TX
> tuntap: enable premmption early
> tuntap: simplify error handling in tun_build_skb()
> tuntap: tweak on the path of non-xdp case in tun_build_skb()
> tuntap: split out XDP logic
> vhost_net: build xdp buff
> vhost_net: passing raw xdp buff to tun
> vhost_net: batch submitting XDP buffers to underlayer sockets
>
> drivers/net/tun.c | 226 +++++++++++++++++++++++++++----------
> drivers/vhost/net.c | 297 ++++++++++++++++++++++++++++++++++++++++++++-----
> include/linux/if_tun.h | 7 ++
> 3 files changed, 444 insertions(+), 86 deletions(-)
>
> --
> 2.7.4
^ permalink raw reply
* Re: [PATCH] ath6kl: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-05-25 17:50 UTC (permalink / raw)
To: Kalle Valo, Sergei Shtylyov
Cc: David S. Miller, linux-wireless, netdev, linux-kernel
In-Reply-To: <871sdzc16l.fsf@kamboji.qca.qualcomm.com>
On 05/25/2018 08:30 AM, Kalle Valo wrote:
> Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:
>
>> On 5/25/2018 2:13 AM, Gustavo A. R. Silva wrote:
>>
>>> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
>>> where we are expecting to fall through.
>>>
>>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>>> ---
>>> drivers/net/wireless/ath/ath6kl/cfg80211.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath6kl/cfg80211.c b/drivers/net/wireless/ath/ath6kl/cfg80211.c
>>> index 2ba8cf3..29e32cd 100644
>>> --- a/drivers/net/wireless/ath/ath6kl/cfg80211.c
>>> +++ b/drivers/net/wireless/ath/ath6kl/cfg80211.c
>>> @@ -3898,17 +3898,17 @@ int ath6kl_cfg80211_init(struct ath6kl *ar)
>>> wiphy->max_scan_ie_len = 1000; /* FIX: what is correct limit? */
>>> switch (ar->hw.cap) {
>>> case WMI_11AN_CAP:
>>> - ht = true;
>>> + ht = true; /* fall through */
>>> case WMI_11A_CAP:
>>> band_5gig = true;
>>> break;
>>> case WMI_11GN_CAP:
>>> - ht = true;
>>> + ht = true; /* fall through */
>>> case WMI_11G_CAP:
>>> band_2gig = true;
>>> break;
>>> case WMI_11AGN_CAP:
>>> - ht = true;
>>> + ht = true; /* fall through */
>>> case WMI_11AG_CAP:
>>> band_2gig = true;
>>> band_5gig = true;
>>
>> Hm, typically such comments are done on a line of their own, have
>> never seen this style...
>
> Yeah, I was wondering the same. Was there a particular reason for this?
>
Sometimes people use this style for a one-line code block.
I can change it to the traditional style. No problem.
Thanks
--
Gustavo
^ permalink raw reply
* [PATCH] atm: zatm: fix memcmp casting
From: Ivan Bornyakov @ 2018-05-25 17:49 UTC (permalink / raw)
To: 3chas3; +Cc: linux-atm-general, netdev, linux-kernel, Ivan Bornyakov
memcmp() returns int, but eprom_try_esi() cast it to unsigned char. One
can lose significant bits and get 0 from non-0 value returned by the
memcmp().
Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com>
---
drivers/atm/zatm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/atm/zatm.c b/drivers/atm/zatm.c
index 9c9a22958717..a8d2eb0ceb8d 100644
--- a/drivers/atm/zatm.c
+++ b/drivers/atm/zatm.c
@@ -1151,8 +1151,8 @@ static void eprom_get_byte(struct zatm_dev *zatm_dev, unsigned char *byte,
}
-static unsigned char eprom_try_esi(struct atm_dev *dev, unsigned short cmd,
- int offset, int swap)
+static int eprom_try_esi(struct atm_dev *dev, unsigned short cmd, int offset,
+ int swap)
{
unsigned char buf[ZEPROM_SIZE];
struct zatm_dev *zatm_dev;
--
2.16.1
^ permalink raw reply related
* RE: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Keller, Jacob E @ 2018-05-25 17:46 UTC (permalink / raw)
To: Bjorn Helgaas, Jakub Kicinski
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, netdev@vger.kernel.org,
Sathya Perla, Felix Manlunas, alexander.duyck@gmail.com,
john.fastabend@gmail.com, Donald Dutile,
oss-drivers@netronome.com, Christoph Hellwig, Derek Chickles,
Satanand Burla, Raghu Vatsavayi, Ajit Khaparde,
Sriharsha Basavapatna, Somnath Kotur <somnath.kotur
In-Reply-To: <20180525170122.GA63280@bhelgaas-glaptop.roam.corp.google.com>
> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Friday, May 25, 2018 10:01 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>; linux-pci@vger.kernel.org;
> netdev@vger.kernel.org; Sathya Perla <sathya.perla@broadcom.com>; Felix
> Manlunas <felix.manlunas@caviumnetworks.com>;
> alexander.duyck@gmail.com; john.fastabend@gmail.com; Keller, Jacob E
> <jacob.e.keller@intel.com>; Donald Dutile <ddutile@redhat.com>; oss-
> drivers@netronome.com; Christoph Hellwig <hch@infradead.org>; Derek
> Chickles <derek.chickles@caviumnetworks.com>; Satanand Burla
> <satananda.burla@caviumnetworks.com>; Raghu Vatsavayi
> <raghu.vatsavayi@caviumnetworks.com>; Ajit Khaparde
> <ajit.khaparde@broadcom.com>; Sriharsha Basavapatna
> <sriharsha.basavapatna@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; intel-wired-lan@lists.osuosl.org
> Subject: Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
>
> [+cc liquidio, benet, fm10k maintainers:
>
> The patch below will affect you if your driver calls
> pci_sriov_set_totalvfs(dev, 0);
>
> Previously that caused a subsequent pci_sriov_get_totalvfs() to return
> the totalVFs value from the SR-IOV capability. After this patch, it will
> return 0, which has implications for VF enablement via the sysfs
> "sriov_numvfs" file.]
>
Thanks. I don't foresee any issues with fm10k regarding this..
Thanks,
Jake
^ permalink raw reply
* [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
From: Xin Long @ 2018-05-25 17:41 UTC (permalink / raw)
To: network dev, linux-sctp
Cc: davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner,
Neil Horman, syzkaller
syzbot reported a rcu_sched self-detected stall on CPU which is caused
by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
value, hb_timer will get stuck there, as in its timer handler it starts
this timer again with this value, then goes to the timer handler again.
This problem is there since very beginning, and thanks to Eric for the
reproducer shared from a syzbot mail.
This patch fixes it by not allowing to set rto_min with a value below
200 msecs, which is based on TCP's, by either setsockopt or sysctl.
Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
include/net/sctp/constants.h | 1 +
net/sctp/socket.c | 10 +++++++---
net/sctp/sysctl.c | 3 ++-
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 20ff237..2ee7a7b 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
#define SCTP_RTO_INITIAL (3 * 1000)
#define SCTP_RTO_MIN (1 * 1000)
#define SCTP_RTO_MAX (60 * 1000)
+#define SCTP_RTO_HARD_MIN 200
#define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */
#define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ae7e7c6..6ef12c7 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval,
* be changed.
*
*/
-static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen)
+static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
+ unsigned int optlen)
{
struct sctp_rtoinfo rtoinfo;
struct sctp_association *asoc;
@@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne
else
rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
- if (rto_min)
+ if (rto_min) {
+ if (rto_min < SCTP_RTO_HARD_MIN)
+ return -EINVAL;
rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
- else
+ } else {
rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
+ }
if (rto_min > rto_max)
return -EINVAL;
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 33ca5b7..7ec854a 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
static int rto_beta_min = 0;
static int rto_alpha_max = 1000;
static int rto_beta_max = 1000;
+static int rto_hard_min = SCTP_RTO_HARD_MIN;
static unsigned long max_autoclose_min = 0;
static unsigned long max_autoclose_max =
@@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_sctp_do_rto_min,
- .extra1 = &one,
+ .extra1 = &rto_hard_min,
.extra2 = &init_net.sctp.rto_max
},
{
--
2.1.0
^ permalink raw reply related
* [bpf-next PATCH] bpf: sockhash fix race with bpf_tcp_close and map delete
From: John Fastabend @ 2018-05-25 17:37 UTC (permalink / raw)
To: ast, daniel; +Cc: netdev
syzbot reported two related splats, a use after free and null
pointer dereference, when a TCP socket is closed while the map is
also being removed.
The psock keeps a reference to all map slots that have a reference
to the sock so that when the sock is closed we can clean up any
outstanding sock{map|hash} entries. This avoids pinning a sock
forever if the map owner fails to do proper cleanup. However, the
result is we have two paths that can free an entry in the map. Even
the comment in the sock{map|hash} tear down function, sock_hash_free()
notes this:
At this point no update, lookup or delete operations can happen.
However, be aware we can still get a socket state event updates,
and data ready callbacks that reference the psock from sk_user_data.
Both removal paths omitted taking the hash bucket lock resulting
in the case where we have two references that are in the process
of being free'd.
Reported-by: syzbot+a761b81c211794fa1072@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
kernel/bpf/sockmap.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 52a91d8..b508141f 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -225,6 +225,16 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
kfree_rcu(l, rcu);
}
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
+{
+ return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+ return &__select_bucket(htab, hash)->head;
+}
+
static void bpf_tcp_close(struct sock *sk, long timeout)
{
void (*close_fun)(struct sock *sk, long timeout);
@@ -268,9 +278,15 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
smap_release_sock(psock, sk);
}
} else {
+ u32 hash = e->hash_link->hash;
+ struct bucket *b;
+
+ b = __select_bucket(e->htab, hash);
+ raw_spin_lock_bh(&b->lock);
hlist_del_rcu(&e->hash_link->hash_node);
smap_release_sock(psock, e->hash_link->sk);
free_htab_elem(e->htab, e->hash_link);
+ raw_spin_unlock_bh(&b->lock);
}
}
write_unlock_bh(&sk->sk_callback_lock);
@@ -2043,16 +2059,6 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
return ERR_PTR(err);
}
-static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
-{
- return &htab->buckets[hash & (htab->n_buckets - 1)];
-}
-
-static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
-{
- return &__select_bucket(htab, hash)->head;
-}
-
static void sock_hash_free(struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
@@ -2069,10 +2075,12 @@ static void sock_hash_free(struct bpf_map *map)
*/
rcu_read_lock();
for (i = 0; i < htab->n_buckets; i++) {
- struct hlist_head *head = select_bucket(htab, i);
+ struct bucket *b = __select_bucket(htab, i);
+ struct hlist_head *head = &b->head;
struct hlist_node *n;
struct htab_elem *l;
+ raw_spin_lock_bh(&b->lock);
hlist_for_each_entry_safe(l, n, head, hash_node) {
struct sock *sock = l->sk;
struct smap_psock *psock;
@@ -2090,8 +2098,9 @@ static void sock_hash_free(struct bpf_map *map)
smap_release_sock(psock, sock);
}
write_unlock_bh(&sock->sk_callback_lock);
- kfree(l);
+ free_htab_elem(htab, l);
}
+ raw_spin_unlock_bh(&b->lock);
}
rcu_read_unlock();
bpf_map_area_free(htab->buckets);
^ permalink raw reply related
* Re: [PATCH bpf-next] libbpf: Install btf.h with libbpf
From: Martin KaFai Lau @ 2018-05-25 17:33 UTC (permalink / raw)
To: Andrey Ignatov; +Cc: netdev, ast, daniel, kernel-team
In-Reply-To: <20180525172313.1043567-1-rdna@fb.com>
On Fri, May 25, 2018 at 10:23:13AM -0700, Andrey Ignatov wrote:
> install_headers target should contain all headers that are part of
> libbpf. Add missing btf.h
>
> Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
^ permalink raw reply
* [PATCH bpf-next] libbpf: Install btf.h with libbpf
From: Andrey Ignatov @ 2018-05-25 17:23 UTC (permalink / raw)
To: netdev; +Cc: Andrey Ignatov, kafai, ast, daniel, kernel-team
install_headers target should contain all headers that are part of
libbpf. Add missing btf.h
Signed-off-by: Andrey Ignatov <rdna@fb.com>
---
tools/lib/bpf/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index f3fab4a..5390e77 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -189,6 +189,7 @@ install_headers:
$(call QUIET_INSTALL, headers) \
$(call do_install,bpf.h,$(prefix)/include/bpf,644); \
$(call do_install,libbpf.h,$(prefix)/include/bpf,644);
+ $(call do_install,btf.h,$(prefix)/include/bpf,644);
install: install_lib
--
2.9.5
^ permalink raw reply related
* Re: [PATCH net-next v12 0/5] Enable virtio_net to act as a standby for a passthru device
From: Michael S. Tsirkin @ 2018-05-25 17:19 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, jiri, aaron.f.brown, anjali.singhai
In-Reply-To: <1527180917-39737-1-git-send-email-sridhar.samudrala@intel.com>
On Thu, May 24, 2018 at 09:55:12AM -0700, Sridhar Samudrala wrote:
> The main motivation for this patch is to enable cloud service providers
> to provide an accelerated datapath to virtio-net enabled VMs in a
> transparent manner with no/minimal guest userspace changes. This also
> enables hypervisor controlled live migration to be supported with VMs that
> have direct attached SR-IOV VF devices.
>
> Patch 1 introduces a failover module that provides a generic interface for
> paravirtual drivers to listen for netdev register/unregister/link change
> events from pci ethernet devices with the same MAC and takeover their
> datapath. The notifier and event handling code is based on the existing
> netvsc implementation.
>
> Patch 2 refactors netvsc to use the registration/notification framework
> introduced by failover module.
>
> Patch 3 introduces a net_failover driver that provides an automated
> failover mechanism to paravirtual drivers via APIs to create and destroy
> a failover master netdev and mananges a primary and standby slave netdevs
> that get registered via the generic failover infrastructure.
>
> Patch 4 introduces a new feature bit VIRTIO_NET_F_STANDBY to virtio-net
> that can be used by hypervisor to indicate that virtio_net interface
> should act as a standby for another device with the same MAC address.
>
> Patch 5 extends virtio_net to use alternate datapath when available and
> registered. When STANDBY feature is enabled, virtio_net driver uese the
> net_failover API to create an additional 'failover' netdev that acts as
> a master device and controls 2 slave devices. The original virtio_net
> netdev is registered as 'standby' netdev and a passthru/vf device with
> the same MAC gets registered as 'primary' netdev. Both 'standby' and
> 'failover' netdevs are associated with the same 'pci' device. The user
> accesses the network interface via 'failover' netdev. The 'failover'
> netdev chooses 'primary' netdev as default for transmits when it is
> available with link up and running.
>
> As this patch series is initially focusing on usecases where hypervisor
> fully controls the VM networking and the guest is not expected to directly
> configure any hardware settings, it doesn't expose all the ndo/ethtool ops
> that are supported by virtio_net at this time. To support additional usecases,
> it should be possible to enable additional ops later by caching the state
> in failover netdev and replaying when the 'primary' netdev gets registered.
>
> At the time of live migration, the hypervisor needs to unplug the VF device
> from the guest on the source host and reset the MAC filter of the VF to
> initiate failover of datapath to virtio before starting the migration. After
> the migration is completed, the destination hypervisor sets the MAC filter
> on the VF and plugs it back to the guest to switch over to VF datapath.
>
> This patch is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
Series:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> v12:
> - Tested live migration with virtio-net/AVF(i40evf) configured in failover
> mode while running iperf in background. Tried static ip and dhcp
> configurations using 'network' scripts and Network Manager.
> - Build tested netvsc module.
> Updates:
> - Extended generic failover module to do common functions like setting
> FAILOVER_SLAVE flag, registering rx-handler and linking to upper dev in
> the generic register/unregister handlers.
> This required adding 3 additional failover ops pre_register, pre_unregister
> and handle_frame. netvsc and net_failover drivers are updated to support
> these ops.
>
> v11:
> - Split net_failover module into 2 components.
> 1. 'failover' module that provides generic failover infrastructure
> to register a failover instance and listen for slave events.
> 2. 'net_failover' driver that provides APIs to create/destroy upper
> netdev and supports 3-netdev model used by virtio-net.
> - Added documentation
>
> v10:
> - fix net_failover_open() to update failover CARRIER correctly based on
> standby and primary states.
> - fix net_failover_handle_frame() to handle frames received on standby
> when primary is present.
> - replace netdev_upper_dev_link with netdev_master_upper_dev_link and
> handle lower dev state changes.
> - fix net_failver_create() and net_failover_register() interfaces to
> use ERR_PTR and avoid arg **
> - disable setting mac address when virtio-net in STANDBY mode
> - document exported symbols
> - added entry to MAINTAINERS file
>
> v9:
> Select NET_FAILOVER automatically when VIRTIO_NET/HYPERV_NET
> are enabled. (stephen)
>
> v8:
> - Made the failover managment routines more robust by updating the feature
> bits/other fields in the failover netdev when slave netdevs are
> registered/unregistered. (mst)
> - added support for handling vlans.
> - Limited the changes in netvsc to only use the notifier/event/lookups
> from the failover module. The slave register/unregister/link-change
> handlers are only updated to use the getbymac routine to get the
> upper netdev. There is no change in their functionality. (stephen)
> - renamed structs/function/file names to use net_failover prefix. (mst)
>
> v7
> - Rename 'bypass/active/backup' terminology with 'failover/primary/standy'
> (jiri, mst)
> - re-arranged dev_open() and dev_set_mtu() calls in the register routines
> so that they don't get called for 2-netdev model. (stephen)
> - fixed select_queue() routine to do queue selection based on VF if it is
> registered as primary. (stephen)
> - minor bugfixes
>
> v6 RFC:
> Simplified virtio_net changes by moving all the ndo_ops of the
> bypass_netdev and create/destroy of bypass_netdev to 'bypass' module.
> avoided 2 phase registration(driver + instances).
> introduced IFF_BYPASS/IFF_BYPASS_SLAVE dev->priv_flags
> replaced mutex with a spinlock
>
> v5 RFC:
> Based on Jiri's comments, moved the common functionality to a 'bypass'
> module so that the same notifier and event handlers to handle child
> register/unregister/link change events can be shared between virtio_net
> and netvsc.
> Improved error handling based on Siwei's comments.
> v4:
> - Based on the review comments on the v3 version of the RFC patch and
> Jakub's suggestion for the naming issue with 3 netdev solution,
> proposed 3 netdev in-driver bonding solution for virtio-net.
> v3 RFC:
> - Introduced 3 netdev model and pointed out a couple of issues with
> that model and proposed 2 netdev model to avoid these issues.
> - Removed broadcast/multicast optimization and only use virtio as
> backup path when VF is unplugged.
> v2 RFC:
> - Changed VIRTIO_NET_F_MASTER to VIRTIO_NET_F_BACKUP (mst)
> - made a small change to the virtio-net xmit path to only use VF datapath
> for unicasts. Broadcasts/multicasts use virtio datapath. This avoids
> east-west broadcasts to go over the PCI link.
> - added suppport for the feature bit in qemu
>
> Sridhar Samudrala (5):
> net: Introduce generic failover module
> netvsc: refactor notifier/event handling code to use the failover
> framework
> net: Introduce net_failover driver
> virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
> virtio_net: Extend virtio to use VF datapath when available
>
> Documentation/networking/failover.rst | 18 +
> Documentation/networking/net_failover.rst | 116 +++++
> MAINTAINERS | 16 +
> drivers/net/Kconfig | 13 +
> drivers/net/Makefile | 1 +
> drivers/net/hyperv/Kconfig | 1 +
> drivers/net/hyperv/hyperv_net.h | 2 +
> drivers/net/hyperv/netvsc_drv.c | 222 ++------
> drivers/net/net_failover.c | 836 ++++++++++++++++++++++++++++++
> drivers/net/virtio_net.c | 40 +-
> include/linux/netdevice.h | 16 +
> include/net/failover.h | 36 ++
> include/net/net_failover.h | 40 ++
> include/uapi/linux/virtio_net.h | 3 +
> net/Kconfig | 13 +
> net/core/Makefile | 1 +
> net/core/failover.c | 315 +++++++++++
> 17 files changed, 1522 insertions(+), 167 deletions(-)
> create mode 100644 Documentation/networking/failover.rst
> create mode 100644 Documentation/networking/net_failover.rst
> create mode 100644 drivers/net/net_failover.c
> create mode 100644 include/net/failover.h
> create mode 100644 include/net/net_failover.h
> create mode 100644 net/core/failover.c
>
> --
> 2.14.3
^ permalink raw reply
* Re: [PATCH v2 net-next] net: stmmac: Add PPS and Flexible PPS support
From: Richard Cochran @ 2018-05-25 17:02 UTC (permalink / raw)
To: Jose Abreu
Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <fb25946a2fee8708876ff9a56d5b20f8b9224a5a.1527262324.git.joabreu@synopsys.com>
On Fri, May 25, 2018 at 04:32:52PM +0100, Jose Abreu wrote:
> +int dwmac5_pps_config(void __iomem *ioaddr, bool enable)
> +{
> + u32 val = readl(ioaddr + MAC_PPS_CONTROL);
> +
> + /* There is no way to disable fixed PPS output so we just reset
> + * the values to make sure its in fixed PPS mode */
In that case, don't try to make this appear to be programmable. Just
reset the registers in your probe or setup function unconditionally.
> + val &= ~PPSx_MASK(0);
> + val |= TRGTMODSELx(0, 0x2);
> +
> + writel(val, ioaddr + MAC_PPS_CONTROL);
> + return 0;
> +}
> +
> +int dwmac5_flex_pps_config(void __iomem *ioaddr, int index,
> + struct stmmac_pps_cfg *cfg, bool enable,
> + u32 sub_second_inc, u32 systime_flags)
> +{
> + u32 tnsec = readl(ioaddr + MAC_PPSx_TARGET_TIME_NSEC(index));
> + u32 val = readl(ioaddr + MAC_PPS_CONTROL);
> + u64 period;
> +
> + if (!cfg->available)
> + return -EINVAL;
> + if (tnsec & TRGTBUSY0)
> + return -EBUSY;
> + if (!sub_second_inc || !systime_flags)
> + return -EINVAL;
Don't add tests on the arguments like this. Instead, make sure the
caller always provides correct values.
> +
> + val &= ~PPSx_MASK(index);
> +
> + if (!enable) {
> + val |= PPSCMDx(index, 0x5);
> + writel(val, ioaddr + MAC_PPS_CONTROL);
> + return 0;
> + }
> +
> + val |= PPSCMDx(index, 0x2);
> + val |= TRGTMODSELx(index, 0x2);
> + val |= PPSEN0;
> +
> + writel(cfg->start.tv_sec, ioaddr + MAC_PPSx_TARGET_TIME_SEC(index));
> +
> + if (!(systime_flags & PTP_TCR_TSCTRLSSR))
> + cfg->start.tv_nsec = (cfg->start.tv_nsec * 1000) / 465;
> + writel(cfg->start.tv_nsec, ioaddr + MAC_PPSx_TARGET_TIME_NSEC(index));
> +
> + period = cfg->period.tv_sec * 1000000000;
> + period += cfg->period.tv_nsec;
> + struct timespec64 period;
> +
> + do_div(period, sub_second_inc);
> +
> + if (period <= 1)
> + return -EINVAL;
> +
> + writel(period - 1, ioaddr + MAC_PPSx_INTERVAL(index));
> +
> + period >>= 1;
> + if (period <= 1)
> + return -EINVAL;
> +
> + writel(period - 1, ioaddr + MAC_PPSx_WIDTH(index));
> +
> + /* Finally, activate it */
> + writel(val, ioaddr + MAC_PPS_CONTROL);
> + return 0;
> +}
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
> index 7d3a5c7..35c6d0c 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
> @@ -140,19 +140,50 @@ static int stmmac_set_time(struct ptp_clock_info *ptp,
> static int stmmac_enable(struct ptp_clock_info *ptp,
> struct ptp_clock_request *rq, int on)
> {
> - return -EOPNOTSUPP;
> + struct stmmac_priv *priv =
> + container_of(ptp, struct stmmac_priv, ptp_clock_ops);
> + struct stmmac_pps_cfg *cfg;
> + int ret = -EOPNOTSUPP;
> + unsigned long flags;
> +
> + switch (rq->type) {
> + case PTP_CLK_REQ_PEROUT:
> + cfg = &priv->pps[rq->perout.index];
> +
> + cfg->start.tv_sec = rq->perout.start.sec;
> + cfg->start.tv_nsec = rq->perout.start.nsec;
> + cfg->period.tv_sec = rq->perout.period.sec;
> + cfg->period.tv_nsec = rq->perout.period.nsec;
> +
> + spin_lock_irqsave(&priv->ptp_lock, flags);
> + ret = stmmac_flex_pps_config(priv, priv->ioaddr,
> + rq->perout.index, cfg, on,
> + priv->sub_second_inc,
> + priv->systime_flags);
> + spin_unlock_irqrestore(&priv->ptp_lock, flags);
> + break;
> + case PTP_CLK_REQ_PPS:
> + spin_lock_irqsave(&priv->ptp_lock, flags);
> + ret = stmmac_pps_config(priv, priv->ioaddr, on);
This is not what PTP_CLK_REQ_PPS is for. It only to arrange for a
trigger into the kernel's PPS sub-system.
[ Sorry that the ptp header files don't explain this in the comments.
I should really fix that. ]
> + spin_unlock_irqrestore(&priv->ptp_lock, flags);
> + break;
> + default:
> + break;
> + }
> +
> + return ret;
> }
>
> /* structure describing a PTP hardware clock */
> -static const struct ptp_clock_info stmmac_ptp_clock_ops = {
> +static struct ptp_clock_info stmmac_ptp_clock_ops = {
> .owner = THIS_MODULE,
> .name = "stmmac_ptp_clock",
> .max_adj = 62500000,
> .n_alarm = 0,
> .n_ext_ts = 0,
> - .n_per_out = 0,
> + .n_per_out = 0, /* will be overwritten in stmmac_ptp_register */
> .n_pins = 0,
> - .pps = 0,
> + .pps = 0, /* will be overwritten in stmmac_ptp_register */
Again, this is for inputting a PPS event into the kernel's PPS subsystem.
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-05-25 17:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
oss-drivers, Christoph Hellwig, Derek Chickles, Satanand Burla,
Raghu Vatsavayi, Ajit Khaparde, Sriharsha Basavapatna,
Somnath Kotur, Jeff Kirsher, intel-wired-lan
In-Reply-To: <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>
[+cc liquidio, benet, fm10k maintainers:
The patch below will affect you if your driver calls
pci_sriov_set_totalvfs(dev, 0);
Previously that caused a subsequent pci_sriov_get_totalvfs() to return
the totalVFs value from the SR-IOV capability. After this patch, it will
return 0, which has implications for VF enablement via the sysfs
"sriov_numvfs" file.]
On Fri, May 25, 2018 at 09:02:23AM -0500, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > Hi Bjorn!
> >
> > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > to not fail, e.g.:
> > > >
> > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > >
> > > > For devices which VF support depends on loaded FW we have the
> > > > pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as
> > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > to 0. Remove the special values completely and simply initialize
> > > > driver_max_VFs to total_VFs. Then always use driver_max_VFs.
> > > > Add a helper for drivers to reset the VF limit back to total.
> > >
> > > I still can't really make sense out of the changelog.
> > >
> > > I think part of the reason it's confusing is because there are two
> > > things going on:
> > >
> > > 1) You want this:
> > >
> > > pci_sriov_set_totalvfs(dev, 0);
> > > x = pci_sriov_get_totalvfs(dev)
> > >
> > > to return 0 instead of total_VFs. That seems to connect with
> > > your subject line. It means "sriov_totalvfs" in sysfs could be
> > > 0, but I don't know how that is useful (I'm sure it is; just
> > > educate me :))
> >
> > Let me just quote the bug report that got filed on our internal bug
> > tracker :)
> >
> > When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > then tries to set that as the sriov_numvfs parameter.
> >
> > For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0,
> > but it's set to max. When FW is switched to flower*, the correct
> > sriov_totalvfs value is presented.
> >
> > * flower is a project name
>
> From the point of view of the PCI core (which knows nothing about
> device firmware and relies on the architected config space described
> by the PCIe spec), this sounds like an erratum: with some firmware
> installed, the device is not capable of SR-IOV, but still advertises
> an SR-IOV capability with "TotalVFs > 0".
>
> Regardless of whether that's an erratum, we do allow PF drivers to use
> pci_sriov_set_totalvfs() to limit the number of VFs that may be
> enabled by writing to the PF's "sriov_numvfs" sysfs file.
>
> But the current implementation does not allow a PF driver to limit VFs
> to 0, and that does seem nonsensical.
>
> > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > VFs can be enabled, looks like this is the code:
> >
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> >
> > > 2) You're adding the pci_sriov_reset_totalvfs() interface. I'm not
> > > sure what you intend for this. Is *every* driver supposed to
> > > call it in .remove()? Could/should this be done in the core
> > > somehow instead of depending on every driver?
> >
> > Good question, I was just thinking yesterday we may want to call it
> > from the core, but I don't think it's strictly necessary nor always
> > sufficient (we may reload FW without re-probing).
> >
> > We have a device which supports different number of VFs based on the FW
> > loaded. Some legacy FWs does not inform the driver how many VFs it can
> > support, because it supports max. So the flow in our driver is this:
> >
> > load_fw(dev);
> > ...
> > max_vfs = ask_fw_for_max_vfs(dev);
> > if (max_vfs >= 0)
> > return pci_sriov_set_totalvfs(dev, max_vfs);
> > else /* FW didn't tell us, assume max */
> > return pci_sriov_reset_totalvfs(dev);
> >
> > We also reset the max on device remove, but that's not strictly
> > necessary.
> >
> > Other users of pci_sriov_set_totalvfs() always know the value to set
> > the total to (either always get it from FW or it's a constant).
> >
> > If you prefer we can work out the correct max for those legacy cases in
> > the driver as well, although it seemed cleaner to just ask the core,
> > since it already has total_VFs value handy :)
> >
> > > I'm also having a hard time connecting your user-space command example
> > > with the rest of this. Maybe it will make more sense to me tomorrow
> > > after some coffee.
> >
> > OpenStack assumes it will always be able to set sriov_numvfs to
> > sriov_totalvfs, see this 'if':
> >
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
>
> Thanks for educating me. I think there are two issues here that we
> can separate. I extracted the patch below for the first.
>
> The second is the question of resetting driver_max_VFs. I think we
> currently have a general issue in the core:
>
> - load PF driver 1
> - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
> - unload PF driver 1
> - load PF driver 2
>
> Now driver_max_VFs is still stuck at the lower value set by driver 1.
> I don't think that's the way this should work.
>
> I guess this is partly a consequence of setting driver_max_VFs in
> sriov_init(), which is called before driver attach and should only
> depend on hardware characteristics, so it is related to the patch
> below. But I think we should fix it in general, not just for
> netronome.
>
>
> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Fri May 25 08:18:34 2018 -0500
>
> PCI/IOV: Allow PF drivers to limit total_VFs to 0
>
> Some SR-IOV PF drivers implement .sriov_configure(), which allows
> user-space to enable VFs by writing the desired number of VFs to the sysfs
> "sriov_numvfs" file (see sriov_numvfs_store()).
>
> The PCI core limits the number of VFs to the TotalVFs advertised by the
> device in its SR-IOV capability. The PF driver can limit the number of VFs
> to even fewer (it may have pre-allocated data structures or knowledge of
> device limitations) by calling pci_sriov_set_totalvfs(), but previously it
> could not limit the VFs to 0.
>
> Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
> by the PF driver, even if the limit is 0.
>
> This sequence:
>
> pci_sriov_set_totalvfs(dev, 0);
> x = pci_sriov_get_totalvfs(dev);
>
> previously set "x" to TotalVFs from the SR-IOV capability. Now it will set
> "x" to 0.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 192b82898a38..d0d73dbbd5ca 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> iov->nres = nres;
> iov->ctrl = ctrl;
> iov->total_VFs = total;
> + iov->driver_max_VFs = total;
> pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> iov->pgsz = pgsz;
> iov->self = dev;
> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> if (!dev->is_physfn)
> return 0;
>
> - if (dev->sriov->driver_max_VFs)
> - return dev->sriov->driver_max_VFs;
> -
> - return dev->sriov->total_VFs;
> + return dev->sriov->driver_max_VFs;
> }
> EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>
^ permalink raw reply
* Re: [PATCH net-next 6/7] net: bridge: Notify about bridge VLANs
From: Petr Machata @ 2018-05-25 17:00 UTC (permalink / raw)
To: Vivien Didelot
Cc: netdev, devel, bridge, jiri, idosch, davem, razvan.stefanescu,
gregkh, stephen, andrew, f.fainelli, nikolay
In-Reply-To: <877enrzp42.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>
Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:
>> + } else {
>> + err = br_switchdev_port_obj_add(dev, v->vid, flags);
>> + if (err && err != -EOPNOTSUPP)
>> + goto out;
>> }
>
> Except that br_switchdev_port_obj_add taking vid and flags arguments
> seems confusing to me, the change looks good:
I'm not sure what you're aiming at. Both VID and flags are sent with the
notification, so they need to be passed on to the function somehow. Do
you have a counterproposal for the API?
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH net-next 1/7] net: bridge: Extract boilerplate around switchdev_port_obj_*()
From: Petr Machata @ 2018-05-25 16:56 UTC (permalink / raw)
To: Vivien Didelot
Cc: netdev, devel, bridge, jiri, idosch, davem, razvan.stefanescu,
gregkh, stephen, andrew, f.fainelli, nikolay
In-Reply-To: <87d0xjzpf2.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>
Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:
> Hi Petr,
>
> Petr Machata <petrm@mellanox.com> writes:
>
>> -static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>> - u16 vid, u16 flags)
>> +static int br_switchdev_port_obj_add(struct net_device *dev, u16 vid, u16 flags)
>> {
>> struct switchdev_obj_port_vlan v = {
>> .obj.orig_dev = dev,
>> @@ -89,12 +88,29 @@ static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
>> .vid_begin = vid,
>> .vid_end = vid,
>> };
>> - int err;
>>
>> + return switchdev_port_obj_add(dev, &v.obj);
>> +}
>> +
>> +static int br_switchdev_port_obj_del(struct net_device *dev, u16 vid)
>> +{
>> + struct switchdev_obj_port_vlan v = {
>> + .obj.orig_dev = dev,
>> + .obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
>> + .vid_begin = vid,
>> + .vid_end = vid,
>> + };
>> +
>> + return switchdev_port_obj_del(dev, &v.obj);
>> +}
>
> Shouldn't they be br_switchdev_port_vlan_add (or similar) implemented in
> net/bridge/br_switchdev.c instead, since they are VLAN specific?
(You mean switchdev-specific?)
This logic was in br_vlan.c before as well, so it's natural to think
about the functions as helpers of VLAN module. I can move to
br_switchdev.c if you think that's the better place.
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Eugene Syromiatnikov @ 2018-05-25 16:50 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jesper Dangaard Brouer, netdev, linux-kernel, linux-doc,
Kees Cook, Kai-Heng Feng, Daniel Borkmann, Alexei Starovoitov,
Jonathan Corbet, Jiri Olsa
In-Reply-To: <20180524233449.ga664pzexrkzepfv@ast-mbp>
On Thu, May 24, 2018 at 04:34:51PM -0700, Alexei Starovoitov wrote:
> On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 23 May 2018 15:02:45 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> > > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > > system boot/init stages these sysctls are not yet configured.
> > > > A concrete example is systemd, that has implemented loading of BPF
> > > > programs.
> > > >
> > > > Thus, to allow controlling these setting at early boot, this patch set
> > > > adds the ability to change the default setting of these sysctl knobs
> > > > as well as option to override them via a boot-time kernel parameter
> > > > (in order to avoid rebuilding kernel each time a need of changing these
> > > > defaults arises).
> > > >
> > > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.
> > >
> > > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > > so disabling unpriv during boot time makes no difference to systemd.
> > > what is the actual reason to present time?
systemd also runs a lot of code, some of which is unprivileged.
> > > - say in the future systemd wants to use so_reuseport+bpf for faster
> > > networking. With unpriv disable during boot, it will force systemd
> > > to do such networking from root, which will lower its security barrier.
No, it will force systemd not to use SO_REUSEPORT BPF.
> > > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > > Flipping it during the boot or right after or any time after
> > > is the same thing. Why add such boot flag then?
Well, that one was for completeness.
> > > - jit_harden can be turned on by systemd. so turning it during the boot
> > > will make systemd progs to be constant blinded.
> > > Constant blinding protects kernel from unprivileged JIT spraying.
> > > Are you worried that systemd will attack the kernel with JIT spraying?
I'm worried that systemd can be exploited for a JIT spraying attack.
Another thing I'm concerned with is that the generated code is different,
which introduces additional complication during debugging.
> > I think you are missing that, we want the ability to change these
> > defaults in-order to avoid depending on /etc/sysctl.conf settings, and
> > that the these sysctl.conf setting happen too late.
>
> What does it mean 'happens too late' ?
> Too late for what?
> sysctl.conf has plenty of system critical knobs like
> kernel.perf_event_paranoid, kernel.core_pattern, etc
> The behavior of the host is drastically different after sysctl config
> is applied.
>
> > For example with jit_harden, there will be a difference between the
> > loaded BPF program that got loaded at boot-time with systemd (no
> > constant blinding) and when someone reloads that systemd service after
> > /etc/sysctl.conf have been evaluated and setting bpf_jit_harden (now
> > slower due to constant blinding). This is inconsistent behavior.
>
> net.core.bpf_jit_harden can be flipped back and forth at run-time,
> so bpf progs before and after will be either blinded or not.
> I don't see any inconsistency.
That can't be the reason to maintain that inconsistency.
^ permalink raw reply
* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Daniel Borkmann @ 2018-05-25 16:39 UTC (permalink / raw)
To: Alexei Starovoitov, Mathieu Xhonneux; +Cc: netdev, ys114321
In-Reply-To: <20180525162312.dadn2wtcp4bkf2la@ast-mbp>
On 05/25/2018 06:23 PM, Alexei Starovoitov wrote:
> On Fri, May 25, 2018 at 12:20:36PM +0100, Mathieu Xhonneux wrote:
>> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
>> some UAPI headers in tools/.
>>
>> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
>> /usr/local/include -idirafter
>> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
>> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>> -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - | \
>> llc -march=bpf -mcpu=generic -filetype=obj -o
>> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
>> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>> ^~~~~~~~~~~~~~~~~~~~
>> 1 error generated.
>> make: Leaving directory
>> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>>
>> Reported-by: Y Song <ys114321@gmail.com>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> ---
>> .../selftests/bpf/include/uapi/linux/seg6.h | 55 +++++++++++++++
>> .../selftests/bpf/include/uapi/linux/seg6_local.h | 80 ++++++++++++++++++++++
>> 2 files changed, 135 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>>
>> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>
> hmm. why to selftest?
> Shouldn't they be added to tools/include/uapi/linux/ instead?
Yes, should definitely go there to tools include infrastructure.
^ permalink raw reply
* Re: [PATCH 1/7] core, dma-direct: add a flag 32-bit dma limits
From: Greg Kroah-Hartman @ 2018-05-25 16:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Thomas Gleixner, Ingo Molnar, Tony Luck, Fenghua Yu, x86, iommu,
linux-kernel, linux-ia64, netdev
In-Reply-To: <20180525162307.GB28900@lst.de>
On Fri, May 25, 2018 at 06:23:07PM +0200, Christoph Hellwig wrote:
> On Fri, May 25, 2018 at 04:50:12PM +0200, Greg Kroah-Hartman wrote:
> > On Fri, May 25, 2018 at 04:35:06PM +0200, Christoph Hellwig wrote:
> > > Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32-bits
> > > even if the device itself supports more. Add a single bit flag to
> > > struct device (to be moved into the dma extension once we around it)
> >
> > "once we around it"? I don't understand, sorry.
>
> Should be "once we get around it", which in proper grammar should
> probably be "once we get to it". Anyway, the point is that right
> now struct device is bloated with a lot of fields for dma/iommu
> purposes and we need to clean this up. It's been on my TODO list
> for a while.
Ah, makes sense, that's fine with me, I'd love to see that get cleaned
up.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH net-next 0/7] net: bridge: Notify about bridge VLANs
From: Vivien Didelot @ 2018-05-25 16:31 UTC (permalink / raw)
To: Florian Fainelli, Petr Machata, netdev, devel, bridge
Cc: jiri, idosch, davem, razvan.stefanescu, gregkh, stephen, andrew,
nikolay
In-Reply-To: <1ea85ef2-0feb-dec0-ae25-68f4b42543b2@gmail.com>
Hi Florian,
Florian Fainelli <f.fainelli@gmail.com> writes:
> Andrew, Vivien, if the following hunks get applied are we possibly
> breaking mv88e6xxx? This is the use case that is really missing IMHO at
> the moment in DSA: we cannot control the VLAN membership and attributes
> of the CPU port(s), so either we make it always tagged in every VLAN
> (not great), or we introduce the ability to target the CPU port which is
> what Petr's patches + mine do.
Your change looks good to me. mv88e6xxx programs the DSA and CPU ports'
membership as "unmodified" (i.e. "as-is", which is a Marvell feature),
so that shouldn't change the current behavior.
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH v4 bpf-next 1/6] bpf: Define cgroup_bpf_enabled for CONFIG_CGROUP_BPF=n
From: Alexei Starovoitov @ 2018-05-25 16:29 UTC (permalink / raw)
To: Andrey Ignatov; +Cc: netdev, davem, kafai, ast, daniel, kernel-team
In-Reply-To: <677e2ddff0a1ff3d19ceb897e68f86e0246526a7.1527263217.git.rdna@fb.com>
On Fri, May 25, 2018 at 08:55:22AM -0700, Andrey Ignatov wrote:
> Static key is used to enable/disable cgroup-bpf related code paths at
> run time.
>
> Though it's not defined when cgroup-bpf is disabled at compile time,
> i.e. CONFIG_CGROUP_BPF=n, and if some code wants to use it, it has to do
> this:
>
> #ifdef CONFIG_CGROUP_BPF
> if (cgroup_bpf_enabled) {
> /* ... some work ... */
> }
> #endif
>
> This code can be simplified by setting cgroup_bpf_enabled to 0 for
> CONFIG_CGROUP_BPF=n case:
>
> if (cgroup_bpf_enabled) {
> /* ... some work ... */
> }
>
> And it aligns well with existing BPF_CGROUP_RUN_PROG_* macros that
> defined for both states of CONFIG_CGROUP_BPF.
>
> Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
^ permalink raw reply
* Re: [PATCH] [RFC] bpf: tracing: new helper bpf_get_current_cgroup_ino
From: Y Song @ 2018-05-25 16:28 UTC (permalink / raw)
To: Alban Crequy
Cc: Alexei Starovoitov, netdev, LKML, Linux Containers, cgroups,
Tejun Heo, Iago López Galeiras
In-Reply-To: <CADZs7q4xd1CwGULvYe2-Y2aYpwhiiw3upF=mAK0ve_-jrk1yFg@mail.gmail.com>
On Fri, May 25, 2018 at 8:21 AM, Alban Crequy <alban@kinvolk.io> wrote:
> On Wed, May 23, 2018 at 4:34 AM Y Song <ys114321@gmail.com> wrote:
>
>> I did a quick prototyping and the above interface seems working fine.
>
> Thanks! I gave your kernel patch & userspace program a try and it works for
> me on cgroup-v2.
>
> Also, I found out how to get my containers to use both cgroup-v1 and
> cgroup-v2 (by enabling systemd's hybrid cgroup mode and docker's
> '--exec-opt native.cgroupdriver=systemd' option). So I should be able to
> use the BPF helper function without having to add support for all the
> cgroup-v1 hierarchies.
Great. Will submit a formal patch soon.
>
>> The kernel change:
>> ===============
>
>> [yhs@localhost bpf-next]$ git diff
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bbe2ca5..669b7383fddb 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1976,7 +1976,8 @@ union bpf_attr {
>> FN(fib_lookup), \
>> FN(sock_hash_update), \
>> FN(msg_redirect_hash), \
>> - FN(sk_redirect_hash),
>> + FN(sk_redirect_hash), \
>> + FN(get_current_cgroup_id),
>
>> /* integer value in 'imm' field of BPF_CALL instruction selects which
> helper
>> * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbff27e4..e11e3298f911 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -493,6 +493,21 @@ static const struct bpf_func_proto
>> bpf_current_task_under_cgroup_proto = {
>> .arg2_type = ARG_ANYTHING,
>> };
>
>> +BPF_CALL_0(bpf_get_current_cgroup_id)
>> +{
>> + struct cgroup *cgrp = task_dfl_cgroup(current);
>> + if (!cgrp)
>> + return -EINVAL;
>> +
>> + return cgrp->kn->id.id;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>> + .func = bpf_get_current_cgroup_id,
>> + .gpl_only = false,
>> + .ret_type = RET_INTEGER,
>> +};
>> +
>> BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
>> const void *, unsafe_ptr)
>> {
>> @@ -563,6 +578,8 @@ tracing_func_proto(enum bpf_func_id func_id, const
>> struct bpf_prog *prog)
>> return &bpf_get_prandom_u32_proto;
>> case BPF_FUNC_probe_read_str:
>> return &bpf_probe_read_str_proto;
>> + case BPF_FUNC_get_current_cgroup_id:
>> + return &bpf_get_current_cgroup_id_proto;
>> default:
>> return NULL;
>> }
>
>> The following program can be used to print out a cgroup id given a cgroup
> path.
>> [yhs@localhost cg]$ cat get_cgroup_id.c
>> #define _GNU_SOURCE
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>
>> int main(int argc, char **argv)
>> {
>> int dirfd, err, flags, mount_id, fhsize;
>> struct file_handle *fhp;
>> char *pathname;
>
>> if (argc != 2) {
>> printf("usage: %s <cgroup_path>\n", argv[0]);
>> return 1;
>> }
>
>> pathname = argv[1];
>> dirfd = AT_FDCWD;
>> flags = 0;
>
>> fhsize = sizeof(*fhp);
>> fhp = malloc(fhsize);
>> if (!fhp)
>> return 1;
>
>> err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>> if (err >= 0) {
>> printf("error\n");
>> return 1;
>> }
>
>> fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
>> fhp = realloc(fhp, fhsize);
>> if (!fhp)
>> return 1;
>
>> err = name_to_handle_at(dirfd, pathname, fhp, &mount_id, flags);
>> if (err < 0)
>> perror("name_to_handle_at");
>> else {
>> int i;
>
>> printf("dir = %s, mount_id = %d\n", pathname, mount_id);
>> printf("handle_bytes = %d, handle_type = %d\n", fhp->handle_bytes,
>> fhp->handle_type);
>> if (fhp->handle_bytes != 8)
>> return 1;
>
>> printf("cgroup_id = 0x%llx\n", *(unsigned long long
> *)fhp->f_handle);
>> }
>
>> return 0;
>> }
>> [yhs@localhost cg]$
>
>> Given a cgroup path, the user can get cgroup_id and use it in their bpf
>> program for filtering purpose.
>
>> I run a simple program t.c
>> int main() { while(1) sleep(1); return 0; }
>> in the cgroup v2 directory /home/yhs/tmp/yhs
>> none on /home/yhs/tmp type cgroup2 (rw,relatime,seclabel)
>
>> $ ./get_cgroup_id /home/yhs/tmp/yhs
>> dir = /home/yhs/tmp/yhs, mount_id = 124
>> handle_bytes = 8, handle_type = 1
>> cgroup_id = 0x1000006b2
>
>> // the below command to get cgroup_id from the kernel for the
>> // process compiled with t.c and ran under /home/yhs/tmp/yhs:
>> $ sudo ./trace.py -p 4067 '__x64_sys_nanosleep "cgid = %llx", $cgid'
>> PID TID COMM FUNC -
>> 4067 4067 a.out __x64_sys_nanosleep cgid = 1000006b2
>> 4067 4067 a.out __x64_sys_nanosleep cgid = 1000006b2
>> 4067 4067 a.out __x64_sys_nanosleep cgid = 1000006b2
>> ^C[yhs@localhost tools]$
>
>> The kernel and user space cgid matches. Will provide a
>> formal patch later.
>
>
>
>
>> On Mon, May 21, 2018 at 5:24 PM, Y Song <ys114321@gmail.com> wrote:
>> > On Mon, May 21, 2018 at 9:26 AM, Alexei Starovoitov
>> > <alexei.starovoitov@gmail.com> wrote:
>> >> On Sun, May 13, 2018 at 07:33:18PM +0200, Alban Crequy wrote:
>> >>>
>> >>> +BPF_CALL_2(bpf_get_current_cgroup_ino, u32, hierarchy, u64, flags)
>> >>> +{
>> >>> + // TODO: pick the correct hierarchy instead of the mem
> controller
>> >>> + struct cgroup *cgrp = task_cgroup(current, memory_cgrp_id);
>> >>> +
>> >>> + if (unlikely(!cgrp))
>> >>> + return -EINVAL;
>> >>> + if (unlikely(hierarchy))
>> >>> + return -EINVAL;
>> >>> + if (unlikely(flags))
>> >>> + return -EINVAL;
>> >>> +
>> >>> + return cgrp->kn->id.ino;
>> >>
>> >> ino only is not enough to identify cgroup. It needs generation number
> too.
>> >> I don't quite see how hierarchy and flags can be used in the future.
>> >> Also why limit it to memcg?
>> >>
>> >> How about something like this instead:
>> >>
>> >> BPF_CALL_2(bpf_get_current_cgroup_id)
>> >> {
>> >> struct cgroup *cgrp = task_dfl_cgroup(current);
>> >>
>> >> return cgrp->kn->id.id;
>> >> }
>> >> The user space can use fhandle api to get the same 64-bit id.
>> >
>> > I think this should work. This will also be useful to bcc as user
>> > space can encode desired id
>> > in the bpf program and compared that id to the current cgroup id, so we
> can have
>> > cgroup level tracing (esp. stat collection) support. To cope with
>> > cgroup hierarchy, user can use
>> > cgroup-array based approach or explicitly compare against multiple
> cgroup id's.
^ permalink raw reply
* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Alexei Starovoitov @ 2018-05-25 16:23 UTC (permalink / raw)
To: Mathieu Xhonneux; +Cc: netdev, ys114321, daniel
In-Reply-To: <20180525112036.4768-1-m.xhonneux@gmail.com>
On Fri, May 25, 2018 at 12:20:36PM +0100, Mathieu Xhonneux wrote:
> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
> some UAPI headers in tools/.
>
> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
> /usr/local/include -idirafter
> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
> -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - | \
> llc -march=bpf -mcpu=generic -filetype=obj -o
> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
> ^~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> make: Leaving directory
> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>
> Reported-by: Y Song <ys114321@gmail.com>
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> ---
> .../selftests/bpf/include/uapi/linux/seg6.h | 55 +++++++++++++++
> .../selftests/bpf/include/uapi/linux/seg6_local.h | 80 ++++++++++++++++++++++
> 2 files changed, 135 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>
> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
hmm. why to selftest?
Shouldn't they be added to tools/include/uapi/linux/ instead?
^ permalink raw reply
* Re: [PATCH 1/7] core, dma-direct: add a flag 32-bit dma limits
From: Christoph Hellwig @ 2018-05-25 16:23 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Fenghua Yu, Tony Luck, linux-ia64-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Ingo Molnar,
Thomas Gleixner, Christoph Hellwig
In-Reply-To: <20180525145012.GA3863-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
On Fri, May 25, 2018 at 04:50:12PM +0200, Greg Kroah-Hartman wrote:
> On Fri, May 25, 2018 at 04:35:06PM +0200, Christoph Hellwig wrote:
> > Various PCI bridges (VIA PCI, Xilinx PCIe) limit DMA to only 32-bits
> > even if the device itself supports more. Add a single bit flag to
> > struct device (to be moved into the dma extension once we around it)
>
> "once we around it"? I don't understand, sorry.
Should be "once we get around it", which in proper grammar should
probably be "once we get to it". Anyway, the point is that right
now struct device is bloated with a lot of fields for dma/iommu
purposes and we need to clean this up. It's been on my TODO list
for a while.
^ permalink raw reply
* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Y Song @ 2018-05-25 16:22 UTC (permalink / raw)
To: Mathieu Xhonneux; +Cc: netdev, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <CAH3MdRXW_nPi9Hup_a4vMRZx0S=eTuVxO8XSpdDTFr0zzXWVHg@mail.gmail.com>
On Fri, May 25, 2018 at 9:16 AM, Y Song <ys114321@gmail.com> wrote:
> On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonneux@gmail.com> wrote:
>> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
>> some UAPI headers in tools/.
>>
>> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
>> /usr/local/include -idirafter
>> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
>> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
>> -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - | \
>> llc -march=bpf -mcpu=generic -filetype=obj -o
>> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
>> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
>> ^~~~~~~~~~~~~~~~~~~~
>> 1 error generated.
>> make: Leaving directory
>> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>>
>> Reported-by: Y Song <ys114321@gmail.com>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> ---
>> .../selftests/bpf/include/uapi/linux/seg6.h | 55 +++++++++++++++
>> .../selftests/bpf/include/uapi/linux/seg6_local.h | 80 ++++++++++++++++++++++
>> 2 files changed, 135 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
>> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>
> Thanks for fixing the issue.
>
> Acked-by: Y Song <ys114321@gmail.com>
Although it fixed the issue, the file is placed in
tools/testing/selftests/bpf/include/uapi/linux
directory. Considering the file is really coming from
linux/include/uapi/linux directory, should it
be placed in tools/include/uapi/linux directory instead?
^ permalink raw reply
* Re: [PATCH bpf-next] selftests/bpf: missing headers test_lwt_seg6local
From: Song Liu @ 2018-05-25 16:20 UTC (permalink / raw)
To: Mathieu Xhonneux; +Cc: netdev, ys114321, daniel, alexei.starovoitov
In-Reply-To: <20180525112036.4768-1-m.xhonneux@gmail.com>
On Fri, May 25, 2018 at 4:20 AM, Mathieu Xhonneux <m.xhonneux@gmail.com> wrote:
> Previous patch "selftests/bpf: test for seg6local End.BPF action" lacks
> some UAPI headers in tools/.
>
> clang -I. -I./include/uapi -I../../../include/uapi -idirafter
> /usr/local/include -idirafter
> /data/users/yhs/work/llvm/build/install/lib/clang/7.0.0/include
> -idirafter /usr/include -Wno-compare-distinct-pointer-types \
> -O2 -target bpf -emit-llvm -c test_lwt_seg6local.c -o - | \
> llc -march=bpf -mcpu=generic -filetype=obj -o
> [...]/net-next/tools/testing/selftests/bpf/test_lwt_seg6local.o
> test_lwt_seg6local.c:4:10: fatal error: 'linux/seg6_local.h' file not found
> ^~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> make: Leaving directory
> `/data/users/yhs/work/net-next/tools/testing/selftests/bpf'
>
> Reported-by: Y Song <ys114321@gmail.com>
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> ---
> .../selftests/bpf/include/uapi/linux/seg6.h | 55 +++++++++++++++
> .../selftests/bpf/include/uapi/linux/seg6_local.h | 80 ++++++++++++++++++++++
> 2 files changed, 135 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> create mode 100644 tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
>
> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> new file mode 100644
> index 000000000000..286e8d6a8e98
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/include/uapi/linux/seg6.h
> @@ -0,0 +1,55 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * SR-IPv6 implementation
> + *
> + * Author:
> + * David Lebrun <david.lebrun@uclouvain.be>
> + *
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_LINUX_SEG6_H
> +#define _UAPI_LINUX_SEG6_H
> +
> +#include <linux/types.h>
> +#include <linux/in6.h> /* For struct in6_addr. */
> +
> +/*
> + * SRH
> + */
> +struct ipv6_sr_hdr {
> + __u8 nexthdr;
> + __u8 hdrlen;
> + __u8 type;
> + __u8 segments_left;
> + __u8 first_segment; /* Represents the last_entry field of SRH */
> + __u8 flags;
> + __u16 tag;
> +
> + struct in6_addr segments[0];
> +};
> +
> +#define SR6_FLAG1_PROTECTED (1 << 6)
> +#define SR6_FLAG1_OAM (1 << 5)
> +#define SR6_FLAG1_ALERT (1 << 4)
> +#define SR6_FLAG1_HMAC (1 << 3)
> +
> +#define SR6_TLV_INGRESS 1
> +#define SR6_TLV_EGRESS 2
> +#define SR6_TLV_OPAQUE 3
> +#define SR6_TLV_PADDING 4
> +#define SR6_TLV_HMAC 5
> +
> +#define sr_has_hmac(srh) ((srh)->flags & SR6_FLAG1_HMAC)
> +
> +struct sr6_tlv {
> + __u8 type;
> + __u8 len;
> + __u8 data[0];
> +};
> +
> +#endif
> diff --git a/tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h b/tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
> new file mode 100644
> index 000000000000..edc138bdc56d
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/include/uapi/linux/seg6_local.h
> @@ -0,0 +1,80 @@
> +/*
> + * SR-IPv6 implementation
> + *
> + * Author:
> + * David Lebrun <david.lebrun@uclouvain.be>
> + *
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_LINUX_SEG6_LOCAL_H
> +#define _UAPI_LINUX_SEG6_LOCAL_H
> +
> +#include <linux/seg6.h>
> +
> +enum {
> + SEG6_LOCAL_UNSPEC,
> + SEG6_LOCAL_ACTION,
> + SEG6_LOCAL_SRH,
> + SEG6_LOCAL_TABLE,
> + SEG6_LOCAL_NH4,
> + SEG6_LOCAL_NH6,
> + SEG6_LOCAL_IIF,
> + SEG6_LOCAL_OIF,
> + SEG6_LOCAL_BPF,
> + __SEG6_LOCAL_MAX,
> +};
> +#define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
> +
> +enum {
> + SEG6_LOCAL_ACTION_UNSPEC = 0,
> + /* node segment */
> + SEG6_LOCAL_ACTION_END = 1,
> + /* adjacency segment (IPv6 cross-connect) */
> + SEG6_LOCAL_ACTION_END_X = 2,
> + /* lookup of next seg NH in table */
> + SEG6_LOCAL_ACTION_END_T = 3,
> + /* decap and L2 cross-connect */
> + SEG6_LOCAL_ACTION_END_DX2 = 4,
> + /* decap and IPv6 cross-connect */
> + SEG6_LOCAL_ACTION_END_DX6 = 5,
> + /* decap and IPv4 cross-connect */
> + SEG6_LOCAL_ACTION_END_DX4 = 6,
> + /* decap and lookup of DA in v6 table */
> + SEG6_LOCAL_ACTION_END_DT6 = 7,
> + /* decap and lookup of DA in v4 table */
> + SEG6_LOCAL_ACTION_END_DT4 = 8,
> + /* binding segment with insertion */
> + SEG6_LOCAL_ACTION_END_B6 = 9,
> + /* binding segment with encapsulation */
> + SEG6_LOCAL_ACTION_END_B6_ENCAP = 10,
> + /* binding segment with MPLS encap */
> + SEG6_LOCAL_ACTION_END_BM = 11,
> + /* lookup last seg in table */
> + SEG6_LOCAL_ACTION_END_S = 12,
> + /* forward to SR-unaware VNF with static proxy */
> + SEG6_LOCAL_ACTION_END_AS = 13,
> + /* forward to SR-unaware VNF with masquerading */
> + SEG6_LOCAL_ACTION_END_AM = 14,
> + /* custom BPF action */
> + SEG6_LOCAL_ACTION_END_BPF = 15,
> +
> + __SEG6_LOCAL_ACTION_MAX,
> +};
> +
> +#define SEG6_LOCAL_ACTION_MAX (__SEG6_LOCAL_ACTION_MAX - 1)
> +
> +enum {
> + SEG6_LOCAL_BPF_PROG_UNSPEC,
> + SEG6_LOCAL_BPF_PROG,
> + SEG6_LOCAL_BPF_PROG_NAME,
> + __SEG6_LOCAL_BPF_PROG_MAX,
> +};
> +
> +#define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1)
> +
> +#endif
> --
> 2.16.1
>
Acked-by: Song Liu <songliubraving@fb.com>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox