* Re: [PATCH] net: Fix device name resolving crash in default_device_exit()
From: David Ahern @ 2018-06-21 15:28 UTC (permalink / raw)
To: Kirill Tkhai, netdev
Cc: davem, daniel, jakub.kicinski, ast, linux, john.fastabend, brouer
In-Reply-To: <84f7019b-c1cb-9851-2ece-aa2e16d8d297@virtuozzo.com>
On 6/21/18 4:03 AM, Kirill Tkhai wrote:
>> This patch does not remove the BUG, so does not really solve the
>> problem. ie., it is fairly trivial to write a script (32k dev%d named
>> devices in init_net) that triggers it again, so your commit subject and
>> commit log are not correct with the references to 'fixing the problem'.
>
> 1) I don't agree with you, and I don't think removing the BUG() is a good idea.
> This function is called from a place where it must not fail. But it can
> fail, and the name problem is not the only reason that can happen.
> We can't continue with further pernet_operations if a problem occurs
> in default_device_exit(), and we can't remove the BUG() until this function
> returns void. But we are not going to make it return void. So
> we can't remove the BUG().
You missed my point: that the function can still fail means you are not
"fixing" the problem, only delaying it.
>
> 2) If the script is trivial, can't you just post it here to show
> what type of devices you mean? Is there a real problem, or is this
> purely theoretical?
Current code:
# ip li sh dev eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
# ip netns add fubar
# ip li set eth2 netns fubar
# ip li add eth2 type dummy
# ip li add dev4 type dummy
# ip netns del fubar
--> BUG
kernel:[78079.127748] default_device_exit: failed to move eth2 to init_net: -17
With your patch:
# ip li sh dev eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
# ip netns add fubar
# ip li set eth2 netns fubar
# ip li add eth2 type dummy
# for n in $(seq 0 $((32*1024))); do
echo "li add dev${n} type dummy"
done > ip.batch
# ip -batch ip.batch
# ip netns del fubar
--> BUG
kernel:[ 25.800024] default_device_exit: failed to move eth2 to init_net: -17
>
> All virtual devices I see have rtnl_link_ops, so they are just destroyed
> in default_device_exit_batch(). As for physical devices, it's difficult
> to imagine a node with 32k physical devices, and anyone who tried to deploy
> that many would probably hit problems in other places too.
Nothing says it has to be a physical device. It is only checking for a name.
>
>> The change does provide more variability in naming and reduces the
>> likelihood of not being able to push a device back to init_net.
>
> No, it provides. With the patch one may move a real device to a container
> and allow anything to be done with the device, including changing the device
> index. Then the destruction of the container does not result in a kernel
> panic just because two devices have the same index.
>
> Kirill
>
^ permalink raw reply
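The "only delaying it" argument in the message above can be illustrated with a minimal userspace sketch. It assumes that a fallback name is picked from a "dev%d" pattern whose candidate indices come from a fixed window of 32768 entries, matching the 32k figure used in the reproduction; the kernel's real allocator and data structures are not reproduced here. The names name_table, take_name, alloc_dev_name, MAX_INDEX and NAME_LEN are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_INDEX 32768  /* assumed size of the dev%d search window */
#define NAME_LEN  16     /* room for "dev" plus a 5-digit index */

/* Hypothetical stand-in for the set of "dev<N>" names already in use. */
struct name_table {
    bool taken[MAX_INDEX];
};

/* Mark dev<idx> as in use; indices outside the window are ignored. */
void take_name(struct name_table *t, int idx)
{
    assert(t != NULL);
    if (idx >= 0 && idx < MAX_INDEX)
        t->taken[idx] = true;
}

/*
 * Pick the lowest free dev<N> name and write it to out.
 * Returns the chosen index, or -1 once every index in the window is taken.
 */
int alloc_dev_name(struct name_table *t, char out[NAME_LEN])
{
    assert(t != NULL);
    for (int i = 0; i < MAX_INDEX; i++) {
        if (!t->taken[i]) {
            t->taken[i] = true;
            snprintf(out, NAME_LEN, "dev%d", i);
            return i;
        }
    }
    return -1; /* nothing left: the caller still has to handle failure */
}
```

With a single conflicting name the pattern search succeeds, which is the case the patch improves. Once every index in the window is occupied, the allocation fails again, so a caller that treats failure as fatal is still reachable.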
* Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver
From: Arnd Bergmann @ 2018-06-21 15:31 UTC (permalink / raw)
To: Ilias Apalodimas
Cc: Ivan Vecera, Florian Fainelli, Andrew Lunn, Networking,
Grygorii Strashko, ivan.khoronzhuk, Sekhar Nori,
Jiří Pírko, Francois Ozog, yogeshs, spatton,
Jose.Abreu
In-Reply-To: <20180621124552.GA15208@apalos>
On Thu, Jun 21, 2018 at 2:45 PM, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
> On Thu, Jun 21, 2018 at 02:19:55PM +0200, Ivan Vecera wrote:
> The driver is currently widely used and that's the reason we tried to avoid
> rewriting it. The current driver uses a DTS option to distinguish between two
> existing modes. This patch just adds a third one. So to my understanding we
> have the following options:
> 1. The driver already uses DTS to configure the hardware mode. Although this is
> technically wrong, we can add a third mode on DTS called 'switchdev;', get rid
> of the .config option and keep the configuration method common (although not
> optimal).
> 2. Keep the .config option which overrides the 2 existing modes.
> 3. Introduce a devlink option. If this is applied for all 3 modes, it will break
> backwards compatibility, so it's not an option. If it's introduced for
> configuring 'switchdev' mode only, we fall into the same pitfall as option 2),
> we have something that overrides our current config, slightly better though
> since it's not a compile time option.
> 4. rewrite the driver
As I understand it, switchdev support can also be added without
becoming incompatible with the existing behavior. This is how I would
suggest adding it in a way that keeps the existing DT binding and
user view while adding switchdev:
* In non-"dual-emac" mode, show only one network device that is
configured as a transparent switch as today. Any users that today
add TI's third-party ioctl interface with a non-upstreamable patch
can keep using this mode and try to forward-port that patch.
* In "dual-emac" mode (as selected through DT), the hardware is
configured to be viewed as two separate network devices as before,
regardless of kernel configuration. Users can add the two devices
to a bridge device as before, and enabling switchdev support in
the kernel configuration (based on your patch series) would change
nothing else than using hardware support in the background to
reconfigure the HW VLAN settings.
This does not require using devlink, adding a third mode, or changing
the DT binding or the user-visible behavior when switchdev is enabled,
but should get all the features you want.
> If it was a brand new driver, i'd definitely pick 4. Since it's a pre-existing
> driver though i can't rule out the rest of the options.
I think the suggestion was to have a new driver with a new binding
so that the DT could choose between the two drivers, one with
somewhat obscure behavior and the other with proper behavior.
However, from what I can tell, the only requirement to get a somewhat
reasonable behavior is that you enable "dual-emac" mode in DT
to get switchdev support. It would be trivial to add a new compatible
value that only allows that mode along with supporting switchdev,
but I don't think that's necessary here.
Writing a new driver might also be a good idea (depending on the
quality of the existing one, I haven't looked in detail), but again
I would see no reason for the new driver to be incompatible with
the existing binding, so a gradual cleanup seems like a better
approach.
Arnd
^ permalink raw reply
* [PATCH v2 4/5] ceph: use timespec64 for r_mtime
From: Arnd Bergmann @ 2018-06-21 15:46 UTC (permalink / raw)
To: Ilya Dryomov, Sage Weil, Yan, Zheng
Cc: y2038, Yan Zheng, Arnd Bergmann, Alex Elder, Jens Axboe,
David S. Miller, Chengguang Xu, ceph-devel, linux-block,
linux-kernel, netdev
The request mtime field is used all over ceph, and is currently
represented as a 'timespec' structure in Linux. This changes it to
timespec64 to allow times beyond 2038, modifying all users at the
same time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v2: undo an unneeded change pointed out by Yan Zheng.
Resending only this patch for now, let me know if you
would like to see the entire series reposted instead.
---
drivers/block/rbd.c | 2 +-
fs/ceph/addr.c | 12 ++++++------
fs/ceph/file.c | 8 ++++----
include/linux/ceph/osd_client.h | 6 +++---
net/ceph/osd_client.c | 8 ++++----
5 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fa0729c1e776..356936333cd9 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1452,7 +1452,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
struct ceph_osd_request *osd_req = obj_request->osd_req;
osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
- ktime_get_real_ts(&osd_req->r_mtime);
+ ktime_get_real_ts64(&osd_req->r_mtime);
osd_req->r_data_offset = obj_request->ex.oe_off;
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 292b3d72d725..d44d51e69e76 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -574,7 +574,7 @@ static u64 get_writepages_data_length(struct inode *inode,
*/
static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
{
- struct timespec ts;
+ struct timespec64 ts;
struct inode *inode;
struct ceph_inode_info *ci;
struct ceph_fs_client *fsc;
@@ -625,7 +625,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
set_page_writeback(page);
- ts = timespec64_to_timespec(inode->i_mtime);
+ ts = inode->i_mtime;
err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
&ci->i_layout, snapc, page_off, len,
ceph_wbc.truncate_seq,
@@ -1134,7 +1134,7 @@ static int ceph_writepages_start(struct address_space *mapping,
pages = NULL;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
rc = ceph_osdc_start_request(&fsc->client->osdc, req, true);
BUG_ON(rc);
req = NULL;
@@ -1734,7 +1734,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
goto out;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!err)
err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1776,7 +1776,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
goto out_put;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!err)
err = ceph_osdc_wait_request(&fsc->client->osdc, req);
@@ -1937,7 +1937,7 @@ static int __ceph_pool_perm_get(struct ceph_inode_info *ci,
0, false, true);
err = ceph_osdc_start_request(&fsc->client->osdc, rd_req, false);
- wr_req->r_mtime = timespec64_to_timespec(ci->vfs_inode.i_mtime);
+ wr_req->r_mtime = ci->vfs_inode.i_mtime;
err2 = ceph_osdc_start_request(&fsc->client->osdc, wr_req, false);
if (!err)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index ad0bed99b1d5..2f3a30ca94bf 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -721,7 +721,7 @@ struct ceph_aio_request {
struct list_head osd_reqs;
unsigned num_reqs;
atomic_t pending_reqs;
- struct timespec mtime;
+ struct timespec64 mtime;
struct ceph_cap_flush *prealloc_cf;
};
@@ -923,7 +923,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
int num_pages = 0;
int flags;
int ret;
- struct timespec mtime = timespec64_to_timespec(current_time(inode));
+ struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(iter);
loff_t pos = iocb->ki_pos;
bool write = iov_iter_rw(iter) == WRITE;
@@ -1131,7 +1131,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
int flags;
int ret;
bool check_caps = false;
- struct timespec mtime = timespec64_to_timespec(current_time(inode));
+ struct timespec64 mtime = current_time(inode);
size_t count = iov_iter_count(from);
if (ceph_snap(file_inode(file)) != CEPH_NOSNAP)
@@ -1663,7 +1663,7 @@ static int ceph_zero_partial_object(struct inode *inode,
goto out;
}
- req->r_mtime = timespec64_to_timespec(inode->i_mtime);
+ req->r_mtime = inode->i_mtime;
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!ret) {
ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 0d6ee04b4c41..2e6611c1e9a0 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -199,7 +199,7 @@ struct ceph_osd_request {
/* set by submitter */
u64 r_snapid; /* for reads, CEPH_NOSNAP o/w */
struct ceph_snap_context *r_snapc; /* for writes */
- struct timespec r_mtime; /* ditto */
+ struct timespec64 r_mtime; /* ditto */
u64 r_data_offset; /* ditto */
bool r_linger; /* don't resend on failure */
@@ -253,7 +253,7 @@ struct ceph_osd_linger_request {
struct ceph_osd_request_target t;
u32 map_dne_bound;
- struct timespec mtime;
+ struct timespec64 mtime;
struct kref kref;
struct mutex lock;
@@ -508,7 +508,7 @@ extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
struct ceph_snap_context *sc,
u64 off, u64 len,
u32 truncate_seq, u64 truncate_size,
- struct timespec *mtime,
+ struct timespec64 *mtime,
struct page **pages, int nr_pages);
/* watch/notify */
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index a00c74f1154e..a87a021ca9d0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1978,7 +1978,7 @@ static void encode_request_partial(struct ceph_osd_request *req,
p += sizeof(struct ceph_blkin_trace_info);
ceph_encode_32(&p, 0); /* client_inc, always 0 */
- ceph_encode_timespec(p, &req->r_mtime);
+ ceph_encode_timespec64(p, &req->r_mtime);
p += sizeof(struct ceph_timespec);
encode_oloc(&p, end, &req->r_t.target_oloc);
@@ -4512,7 +4512,7 @@ ceph_osdc_watch(struct ceph_osd_client *osdc,
ceph_oid_copy(&lreq->t.base_oid, oid);
ceph_oloc_copy(&lreq->t.base_oloc, oloc);
lreq->t.flags = CEPH_OSD_FLAG_WRITE;
- ktime_get_real_ts(&lreq->mtime);
+ ktime_get_real_ts64(&lreq->mtime);
lreq->reg_req = alloc_linger_request(lreq);
if (!lreq->reg_req) {
@@ -4570,7 +4570,7 @@ int ceph_osdc_unwatch(struct ceph_osd_client *osdc,
ceph_oid_copy(&req->r_base_oid, &lreq->t.base_oid);
ceph_oloc_copy(&req->r_base_oloc, &lreq->t.base_oloc);
req->r_flags = CEPH_OSD_FLAG_WRITE;
- ktime_get_real_ts(&req->r_mtime);
+ ktime_get_real_ts64(&req->r_mtime);
osd_req_op_watch_init(req, 0, lreq->linger_id,
CEPH_OSD_WATCH_OP_UNWATCH);
@@ -5136,7 +5136,7 @@ int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
struct ceph_snap_context *snapc,
u64 off, u64 len,
u32 truncate_seq, u64 truncate_size,
- struct timespec *mtime,
+ struct timespec64 *mtime,
struct page **pages, int num_pages)
{
struct ceph_osd_request *req;
--
2.9.0
^ permalink raw reply related
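The motivation for the timespec to timespec64 conversion in the patch above can be shown with a small userspace sketch. It assumes a timespec whose seconds field is 32 bits wide, as on 32-bit architectures; struct ts32, struct ts64, fits_in_ts32 and ts64_to_ts32 are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: a timespec with 32-bit seconds and a
 * timespec64-style struct with 64-bit seconds. */
struct ts32 { int32_t tv_sec; int32_t tv_nsec; };
struct ts64 { int64_t tv_sec; int64_t tv_nsec; };

/* True if the 64-bit seconds value is representable in 32 bits. */
bool fits_in_ts32(struct ts64 t)
{
    return t.tv_sec >= INT32_MIN && t.tv_sec <= INT32_MAX;
}

/* Narrowing conversion; the caller must check fits_in_ts32() first. */
struct ts32 ts64_to_ts32(struct ts64 t)
{
    struct ts32 out;

    assert(fits_in_ts32(t));
    out.tv_sec = (int32_t)t.tv_sec;
    out.tv_nsec = (int32_t)t.tv_nsec;
    return out;
}
```

The last representable second is 2147483647 (03:14:07 UTC on 19 January 2038); one second later no longer fits. Carrying the 64-bit value end to end, as the patch does for r_mtime, avoids the narrowing step altogether.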
* Re: [PATCH v4 net-next] net:sched: add action inheritdsfield to skbedit
From: Fu, Qiaobin @ 2018-06-21 15:46 UTC (permalink / raw)
To: davem@davemloft.net
Cc: Marcelo Ricardo Leitner, Davide Caratti, Michel Machado,
netdev@vger.kernel.org, jhs@mojatatu.com,
xiyou.wangcong@gmail.com
In-Reply-To: <20180620184027.GA3446@localhost.localdomain>
The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.
v5:
*Update the drop counter for TC_ACT_SHOT.
v4:
*Not allow setting flags other than the expected ones.
*Allow dumping the pure flags.
v3:
*Use optional flags, so that it won't break old versions of tc.
*Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags.
v2:
*Fix the style issue
*Move the code from skbmod to skbedit
Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
Note that the motivation for this patch is found in the following discussion:
https://www.spinics.net/lists/netdev/msg501061.html
---
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index fbcfe27a4e6c..6de6071ebed6 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -30,6 +30,7 @@
#define SKBEDIT_F_MARK 0x4
#define SKBEDIT_F_PTYPE 0x8
#define SKBEDIT_F_MASK 0x10
+#define SKBEDIT_F_INHERITDSFIELD 0x20
struct tc_skbedit {
tc_gen;
@@ -45,6 +46,7 @@ enum {
TCA_SKBEDIT_PAD,
TCA_SKBEDIT_PTYPE,
TCA_SKBEDIT_MASK,
+ TCA_SKBEDIT_FLAGS,
__TCA_SKBEDIT_MAX
};
#define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 6138d1d71900..dfaf5d8028dd 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -23,6 +23,9 @@
#include <linux/rtnetlink.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/dsfield.h>
#include <linux/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_skbedit.h>
@@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
if (d->flags & SKBEDIT_F_PRIORITY)
skb->priority = d->priority;
+ if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
+ int wlen = skb_network_offset(skb);
+
+ switch (tc_skb_protocol(skb)) {
+ case htons(ETH_P_IP):
+ wlen += sizeof(struct iphdr);
+ if (!pskb_may_pull(skb, wlen))
+ goto err;
+ skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+ break;
+
+ case htons(ETH_P_IPV6):
+ wlen += sizeof(struct ipv6hdr);
+ if (!pskb_may_pull(skb, wlen))
+ goto err;
+ skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+ break;
+ }
+ }
if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
skb->dev->real_num_tx_queues > d->queue_mapping)
skb_set_queue_mapping(skb, d->queue_mapping);
@@ -53,6 +75,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
spin_unlock(&d->tcf_lock);
return d->tcf_action;
+
+err:
+ d->tcf_qstats.drops++;
+ spin_unlock(&d->tcf_lock);
+ return TC_ACT_SHOT;
}
static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
@@ -62,6 +89,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
[TCA_SKBEDIT_MARK] = { .len = sizeof(u32) },
[TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) },
[TCA_SKBEDIT_MASK] = { .len = sizeof(u32) },
+ [TCA_SKBEDIT_FLAGS] = { .len = sizeof(u64) },
};
static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -114,6 +142,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
mask = nla_data(tb[TCA_SKBEDIT_MASK]);
}
+ if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
+ u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+
+ if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
+ flags |= SKBEDIT_F_INHERITDSFIELD;
+ }
+
parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
exists = tcf_idr_check(tn, parm->index, a, bind);
@@ -178,6 +213,7 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
.action = d->tcf_action,
};
struct tcf_t t;
+ u64 pure_flags = 0;
if (nla_put(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt))
goto nla_put_failure;
@@ -196,6 +232,11 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
if ((d->flags & SKBEDIT_F_MASK) &&
nla_put_u32(skb, TCA_SKBEDIT_MASK, d->mask))
goto nla_put_failure;
+ if (d->flags & SKBEDIT_F_INHERITDSFIELD)
+ pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+ if (pure_flags != 0 &&
+ nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
+ goto nla_put_failure;
tcf_tm_dump(&t, &d->tcf_tm);
if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))
^ permalink raw reply related
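The v4 changelog item about not allowing flags other than the expected ones corresponds to the TCA_SKBEDIT_FLAGS handling in tcf_skbedit_init() above. A minimal userspace sketch of that filtering follows, assuming the action's flags are a 32-bit word; merge_pure_flags is a hypothetical helper, and only the two flag values visible in the tc_skbedit.h hunk are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in the tc_skbedit.h hunk of the patch. */
#define SKBEDIT_F_MASK            0x10
#define SKBEDIT_F_INHERITDSFIELD  0x20

/*
 * Hypothetical helper: merge the optional 64-bit "pure flags" attribute
 * into the action's flags, letting through only the bit this attribute
 * is expected to carry. A NULL pointer models an absent attribute.
 */
uint32_t merge_pure_flags(uint32_t flags, const uint64_t *pure_flags)
{
    if (pure_flags == NULL)
        return flags;
    if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
        flags |= SKBEDIT_F_INHERITDSFIELD;
    return flags;
}
```

Testing the one known bit explicitly, rather than OR-ing the whole attribute into flags, means that unknown bits and bits owned by other attributes (such as SKBEDIT_F_MASK) cannot be set through this path.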
* [PATCH v5 net-next] net:sched: add action inheritdsfield to skbedit
From: Fu, Qiaobin @ 2018-06-21 15:50 UTC (permalink / raw)
To: davem@davemloft.net
Cc: Marcelo Ricardo Leitner, Davide Caratti, Michel Machado,
netdev@vger.kernel.org, jhs@mojatatu.com,
xiyou.wangcong@gmail.com
In-Reply-To: <8BAB2602-EBB4-4A8D-BBF4-5399CB486175@bu.edu>
The new action inheritdsfield copies the field DS of
IPv4 and IPv6 packets into skb->priority. This enables
later classification of packets based on the DS field.
v5:
*Update the drop counter for TC_ACT_SHOT
v4:
*Not allow setting flags other than the expected ones.
*Allow dumping the pure flags.
v3:
*Use optional flags, so that it won't break old versions of tc.
*Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags.
v2:
*Fix the style issue
*Move the code from skbmod to skbedit
Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
Note that the motivation for this patch is found in the following discussion:
https://www.spinics.net/lists/netdev/msg501061.html
---
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index fbcfe27a4e6c..6de6071ebed6 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -30,6 +30,7 @@
#define SKBEDIT_F_MARK 0x4
#define SKBEDIT_F_PTYPE 0x8
#define SKBEDIT_F_MASK 0x10
+#define SKBEDIT_F_INHERITDSFIELD 0x20
struct tc_skbedit {
tc_gen;
@@ -45,6 +46,7 @@ enum {
TCA_SKBEDIT_PAD,
TCA_SKBEDIT_PTYPE,
TCA_SKBEDIT_MASK,
+ TCA_SKBEDIT_FLAGS,
__TCA_SKBEDIT_MAX
};
#define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 6138d1d71900..dfaf5d8028dd 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -23,6 +23,9 @@
#include <linux/rtnetlink.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/dsfield.h>
#include <linux/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_skbedit.h>
@@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
if (d->flags & SKBEDIT_F_PRIORITY)
skb->priority = d->priority;
+ if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
+ int wlen = skb_network_offset(skb);
+
+ switch (tc_skb_protocol(skb)) {
+ case htons(ETH_P_IP):
+ wlen += sizeof(struct iphdr);
+ if (!pskb_may_pull(skb, wlen))
+ goto err;
+ skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+ break;
+
+ case htons(ETH_P_IPV6):
+ wlen += sizeof(struct ipv6hdr);
+ if (!pskb_may_pull(skb, wlen))
+ goto err;
+ skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+ break;
+ }
+ }
if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
skb->dev->real_num_tx_queues > d->queue_mapping)
skb_set_queue_mapping(skb, d->queue_mapping);
@@ -53,6 +75,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
spin_unlock(&d->tcf_lock);
return d->tcf_action;
+
+err:
+ d->tcf_qstats.drops++;
+ spin_unlock(&d->tcf_lock);
+ return TC_ACT_SHOT;
}
static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
@@ -62,6 +89,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
[TCA_SKBEDIT_MARK] = { .len = sizeof(u32) },
[TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) },
[TCA_SKBEDIT_MASK] = { .len = sizeof(u32) },
+ [TCA_SKBEDIT_FLAGS] = { .len = sizeof(u64) },
};
static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -114,6 +142,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
mask = nla_data(tb[TCA_SKBEDIT_MASK]);
}
+ if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
+ u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+
+ if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
+ flags |= SKBEDIT_F_INHERITDSFIELD;
+ }
+
parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
exists = tcf_idr_check(tn, parm->index, a, bind);
@@ -178,6 +213,7 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
.action = d->tcf_action,
};
struct tcf_t t;
+ u64 pure_flags = 0;
if (nla_put(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt))
goto nla_put_failure;
@@ -196,6 +232,11 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
if ((d->flags & SKBEDIT_F_MASK) &&
nla_put_u32(skb, TCA_SKBEDIT_MASK, d->mask))
goto nla_put_failure;
+ if (d->flags & SKBEDIT_F_INHERITDSFIELD)
+ pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+ if (pure_flags != 0 &&
+ nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
+ goto nla_put_failure;
tcf_tm_dump(&t, &d->tcf_tm);
if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))
^ permalink raw reply related
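The ">> 2" in the inheritdsfield hunk above drops the two low-order ECN bits of the DS field and leaves the 6-bit DSCP value. A minimal userspace sketch of that extraction from raw header bytes follows; ipv4_dsfield, ipv6_dsfield and dscp_from_dsfield are hypothetical helpers written for illustration, not the kernel's accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DS field of an IPv4 header: the second byte (the former TOS byte). */
uint8_t ipv4_dsfield(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL && len >= 2);
    return hdr[1];
}

/* DS field (traffic class) of an IPv6 header: the 8 bits that follow
 * the 4-bit version, straddling the first two bytes. */
uint8_t ipv6_dsfield(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL && len >= 2);
    return (uint8_t)((hdr[0] << 4) | (hdr[1] >> 4));
}

/* Drop the two low-order ECN bits, leaving the 6-bit DSCP value. */
uint8_t dscp_from_dsfield(uint8_t dsfield)
{
    return dsfield >> 2;
}
```

For an IPv4 header starting with the bytes 0x45 0xB8, the DS field is 0xB8 and the resulting value is 46, the Expedited Forwarding code point. An IPv6 header starting with 0x6B 0x80 carries the same traffic class and yields the same value.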
* Re: [PATCH v1 1/1] VSOCK: fix loopback on big-endian systems
From: Stefan Hajnoczi @ 2018-06-21 16:07 UTC (permalink / raw)
To: Claudio Imbrenda
Cc: davem, jhansen, cavery, borntraeger, fiuczy, linux-kernel, netdev
In-Reply-To: <1529502711-8028-1-git-send-email-imbrenda@linux.vnet.ibm.com>
On Wed, Jun 20, 2018 at 03:51:51PM +0200, Claudio Imbrenda wrote:
> The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
> used, and in fact in virtio_transport_common.c only 64 bit accessors are
> used. Using 32 bit accessors for 64 bit values breaks big endian systems.
>
> This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.
>
> Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport")
>
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> ---
> net/vmw_vsock/virtio_transport.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
^ permalink raw reply
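The breakage described in the quoted patch can be reproduced in userspace by emulating what a big-endian CPU does with the bytes of a 64-bit little-endian field. The sketch assumes that a 32-bit accessor truncates the natively loaded value to 32 bits and then byte-swaps it; decode_le64, load_native_be, swap32_bytes and le32_accessor_on_be are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Correct: decode a 64-bit little-endian field from its stored bytes. */
uint64_t decode_le64(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* What a big-endian CPU sees when it loads the same 8 bytes natively. */
uint64_t load_native_be(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32_bytes(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Emulates a 32-bit little-endian accessor applied to the 64-bit field
 * on a big-endian CPU: truncate the native value, then byte-swap. */
uint32_t le32_accessor_on_be(const uint8_t b[8])
{
    return swap32_bytes((uint32_t)load_native_be(b));
}
```

For a CID of 2, stored as the bytes 02 00 00 00 00 00 00 00, the 64-bit decode returns 2 while the emulated 32-bit accessor returns 0, because it ends up reading the upper half of the field. On a little-endian CPU the truncation happens to keep the lower half, which is why the bug only shows up on big-endian systems.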
* Re: [PATCH v5 net-next] net:sched: add action inheritdsfield to skbedit
From: Davide Caratti @ 2018-06-21 16:13 UTC (permalink / raw)
To: Fu, Qiaobin, davem@davemloft.net
Cc: Marcelo Ricardo Leitner, Michel Machado, netdev@vger.kernel.org,
jhs@mojatatu.com, xiyou.wangcong@gmail.com
In-Reply-To: <B84B92F9-B872-4430-B7E2-FBF23E543632@bu.edu>
On Thu, 2018-06-21 at 15:50 +0000, Fu, Qiaobin wrote:
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
>
> v5:
> *Update the drop counter for TC_ACT_SHOT
Acked-by: Davide Caratti <dcaratti@redhat.com>
^ permalink raw reply
* Re: [PATCH net 1/1] net/smc: coordinate wait queues for nonblocking connect
From: kbuild test robot @ 2018-06-21 16:26 UTC (permalink / raw)
To: Ursula Braun
Cc: kbuild-all, davem, netdev, linux-s390, schwidefsky,
heiko.carstens, raspl, ubraun, xiyou.wangcong, hch
In-Reply-To: <20180620080737.50323-1-ubraun@linux.ibm.com>
Hi Ursula,
I love your patch! Perhaps something to improve:
[auto build test WARNING on net/master]
url: https://github.com/0day-ci/linux/commits/Ursula-Braun/net-smc-coordinate-wait-queues-for-nonblocking-connect/20180620-180901
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
>> net/smc/af_smc.c:1301:49: sparse: incorrect type in assignment (different address spaces) @@ expected struct socket_wq [noderef] <asn:4>*sk_wq @@ got [noderef] <asn:4>*sk_wq @@
net/smc/af_smc.c:1301:49: expected struct socket_wq [noderef] <asn:4>*sk_wq
net/smc/af_smc.c:1301:49: got struct socket_wq *smcwq
net/smc/smc_cdc.h:143:24: sparse: expression using sizeof(void)
net/smc/smc_cdc.h:146:16: sparse: expression using sizeof(void)
net/smc/smc_cdc.h:143:24: sparse: expression using sizeof(void)
net/smc/smc_cdc.h:146:16: sparse: expression using sizeof(void)
>> net/smc/af_smc.c:1667:20: sparse: incorrect type in assignment (different address spaces) @@ expected struct socket_wq *smcwq @@ got struct socket_wq struct socket_wq *smcwq @@
net/smc/af_smc.c:1667:20: expected struct socket_wq *smcwq
net/smc/af_smc.c:1667:20: got struct socket_wq [noderef] <asn:4>*sk_wq
net/smc/af_smc.c:1668:29: sparse: expression using sizeof(void)
net/smc/af_smc.c:1669:29: sparse: expression using sizeof(void)
vim +1301 net/smc/af_smc.c
1277
1278 static __poll_t smc_poll_mask(struct socket *sock, __poll_t events)
1279 {
1280 struct sock *sk = sock->sk;
1281 __poll_t mask = 0;
1282 struct smc_sock *smc;
1283 int rc;
1284
1285 if (!sk)
1286 return EPOLLNVAL;
1287
1288 smc = smc_sk(sock->sk);
1289 sock_hold(sk);
1290 if ((sk->sk_state == SMC_INIT) || smc->use_fallback) {
1291 /* delegate to CLC child sock */
1292 mask = smc->clcsock->ops->poll_mask(smc->clcsock, events);
1293 sk->sk_err = smc->clcsock->sk->sk_err;
1294 if (sk->sk_err) {
1295 mask |= EPOLLERR;
1296 } else {
1297 /* if non-blocking connect finished ... */
1298 if (sk->sk_state == SMC_INIT &&
1299 mask & EPOLLOUT &&
1300 smc->clcsock->sk->sk_state != TCP_CLOSE) {
> 1301 sock->sk->sk_wq = smc->smcwq;
1302 lock_sock(sk);
1303 rc = __smc_connect(smc);
1304 release_sock(sk);
1305 if (rc < 0)
1306 mask |= EPOLLERR;
1307 /* success cases including fallback */
1308 mask |= EPOLLOUT | EPOLLWRNORM;
1309 }
1310 }
1311 } else {
1312 if (sk->sk_err)
1313 mask |= EPOLLERR;
1314 if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
1315 (sk->sk_state == SMC_CLOSED))
1316 mask |= EPOLLHUP;
1317 if (sk->sk_state == SMC_LISTEN) {
1318 /* woken up by sk_data_ready in smc_listen_work() */
1319 mask = smc_accept_poll(sk);
1320 } else {
1321 if (atomic_read(&smc->conn.sndbuf_space) ||
1322 sk->sk_shutdown & SEND_SHUTDOWN) {
1323 mask |= EPOLLOUT | EPOLLWRNORM;
1324 } else {
1325 sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
1326 set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
1327 }
1328 if (atomic_read(&smc->conn.bytes_to_rcv))
1329 mask |= EPOLLIN | EPOLLRDNORM;
1330 if (sk->sk_shutdown & RCV_SHUTDOWN)
1331 mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
1332 if (sk->sk_state == SMC_APPCLOSEWAIT1)
1333 mask |= EPOLLIN;
1334 }
1335 if (smc->conn.urg_state == SMC_URG_VALID)
1336 mask |= EPOLLPRI;
1337
1338 }
1339 sock_put(sk);
1340
1341 return mask;
1342 }
1343
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [GIT] Networking
From: Ingo Molnar @ 2018-06-21 16:33 UTC (permalink / raw)
To: Matteo Croce
Cc: David S . Miller, alexei.starovoitov, sfr, torvalds, akpm, netdev,
linux-kernel, tglx
In-Reply-To: <CAGnkfhxGAYZNhJp7eyg+_j3LY31w7muFqerhQp7jGqQ02iFxkg@mail.gmail.com>
* Matteo Croce <mcroce@redhat.com> wrote:
> Hi Ingo,
>
> are you compiling a 32 bit kernel on an x86_64 host?
Yes.
> then I tried to compile an i386 kernel on an x86_64 host and I get the
> same error:
>
> $ make -j8 ARCH=i386
> ...
> LD vmlinux.o
> ld: i386:x86-64 architecture of input file
> `net/bpfilter/bpfilter_umh.o' is incompatible with i386 output
Correct.
> Any idea how to fix it without building it twice, for host and target?
No idea, sorry ...
Thanks,
Ingo
^ permalink raw reply
* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Anders Roxell @ 2018-06-21 16:56 UTC (permalink / raw)
To: shannon.nelson; +Cc: Networking, David Miller
In-Reply-To: <6134e116-13c5-ee9d-e539-35679efcd665@oracle.com>
On Thu, 21 Jun 2018 at 02:32, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>
> On 6/20/2018 4:18 PM, Anders Roxell wrote:
> > On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
> >>
> >> On 6/20/2018 12:09 PM, Anders Roxell wrote:
> >>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
> >>>>
> >>>> A couple of bad behaviors in the ipsec selftest were pointed out
> >>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
> >>>>
> >>>> Shannon Nelson (2):
> >>>> selftests: rtnetlink: hide complaint from terminated monitor
> >>>> selftests: rtnetlink: use a local IP address for IPsec tests
> >>>>
> >>>> tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
> >>>> 1 file changed, 7 insertions(+), 4 deletions(-)
> >>>>
> >>>> --
> >>>> 2.7.4
> >>>>
> >>>
> >>> Hi Shannon,
> >>>
> >>> With these patches applied and my config patch.
> >>>
> >>> I still get this error when I run the ipsec test:
> >>>
> >>> FAIL: can't add fou port 7777, skipping test
> >>> RTNETLINK answers: Operation not supported
> >>> FAIL: can't add macsec interface, skipping test
> >>> RTNETLINK answers: Protocol not supported
> >>> RTNETLINK answers: No such process
> >>> RTNETLINK answers: No such process
> >>> FAIL: ipsec
> >>
> >> One of the odd things I noticed about this script is that there really
> >> aren't any diagnosis messages, just PASS or FAIL. I followed this
> >> custom when I added the ipsec tests, but I think this is something that
> >> should change so we can get some idea of what breaks.
> >>
> >> I'm curious about the "RTNETLINK answers" messages and where they might
> >> be coming from, especially "RTNETLINK answers: Protocol not supported".
> >
> > I added: "set -x" in the beginning of the rtnetlink.sh script.
> > + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
> > transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
> > 0x3132333435
> > 363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
> > RTNETLINK answers: Protocol not supported
>
> Okay, so ip didn't like this command...
>
> >> What are the XFRM and AES settings in your kernel config - what is the
> >> output from
> >> egrep -i "xfrm|_aes" .config
> >
> > CONFIG_XFRM=y
> > CONFIG_XFRM_ALGO=y
> > CONFIG_XFRM_USER=y
> > CONFIG_INET_XFRM_MODE_TUNNEL=y
> > CONFIG_INET6_XFRM_MODE_TRANSPORT=y
> > CONFIG_INET6_XFRM_MODE_TUNNEL=y
> > CONFIG_INET6_XFRM_MODE_BEET=y
> > CONFIG_CRYPTO_AES=y
>
> And this is probably why - there seem to be a few config variables
> missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why
> the ip command fails above.
>
> Here's what I have in my config:
> CONFIG_XFRM=y
> CONFIG_XFRM_OFFLOAD=y
> CONFIG_XFRM_ALGO=m
> CONFIG_XFRM_USER=m
> # CONFIG_XFRM_SUB_POLICY is not set
> # CONFIG_XFRM_MIGRATE is not set
> CONFIG_XFRM_STATISTICS=y
> CONFIG_XFRM_IPCOMP=m
> CONFIG_INET_XFRM_TUNNEL=m
> CONFIG_INET_XFRM_MODE_TRANSPORT=m
> CONFIG_INET_XFRM_MODE_TUNNEL=m
> CONFIG_INET_XFRM_MODE_BEET=m
> CONFIG_INET6_XFRM_TUNNEL=m
> CONFIG_INET6_XFRM_MODE_TRANSPORT=m
> CONFIG_INET6_XFRM_MODE_TUNNEL=m
> CONFIG_INET6_XFRM_MODE_BEET=m
> CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
> CONFIG_SECURITY_NETWORK_XFRM=y
> CONFIG_CRYPTO_AES=y
> # CONFIG_CRYPTO_AES_TI is not set
> CONFIG_CRYPTO_AES_X86_64=m
> CONFIG_CRYPTO_AES_NI_INTEL=m
> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
> CONFIG_CRYPTO_DEV_PADLOCK_AES=m
>
> Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your
> config
Yes you can.
> and trying again?
same issue with CONFIG_INET_XFRM_MODE_TRANSPORT=y
Cheers,
Anders
^ permalink raw reply
* [PATCH] selftests: bpf: notification about privilege required to run test_kmod.sh testing script
From: Jeffrin Jose T @ 2018-06-21 17:00 UTC (permalink / raw)
To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Jeffrin Jose T
The test_kmod.sh script requires root privileges for the successful
execution of the test.
This patch is to notify the user about the privilege the script
demands for the successful execution of the test.
Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>
---
tools/testing/selftests/bpf/test_kmod.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_kmod.sh b/tools/testing/selftests/bpf/test_kmod.sh
index 35669ccd4d23..378ccc512ad3 100755
--- a/tools/testing/selftests/bpf/test_kmod.sh
+++ b/tools/testing/selftests/bpf/test_kmod.sh
@@ -1,6 +1,15 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+msg="skip all tests:"
+if [ "$(id -u)" != "0" ]; then
+ echo $msg please run this as root >&2
+ exit $ksft_skip
+fi
+
SRC_TREE=../../../../
test_run()
--
2.17.0
^ permalink raw reply related
* [PATCH bpf] tools/bpf: fix test_sockmap failure
From: Yonghong Song @ 2018-06-21 17:02 UTC (permalink / raw)
To: ast, daniel, netdev; +Cc: kernel-team
On one of our production test machines, when running
the bpf selftest test_sockmap, I got the following error:
# sudo ./test_sockmap
libbpf: failed to create map (name: 'sock_map'): Operation not permitted
libbpf: failed to load object 'test_sockmap_kern.o'
libbpf: Can't get the 0th fd from program sk_skb1: only -1 instances
......
load_bpf_file: (-1) Operation not permitted
ERROR: (-1) load bpf failed
The error is due to an rlimit that is not big enough:
struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
The test already includes "bpf_rlimit.h", which sets current
and max rlimit to RLIM_INFINITY. Let us just use it.
Signed-off-by: Yonghong Song <yhs@fb.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 05c8cb7..9e78df2 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1413,18 +1413,12 @@ static int test_suite(void)
int main(int argc, char **argv)
{
- struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
int iov_count = 1, length = 1024, rate = 1;
struct sockmap_options options = {0};
int opt, longindex, err, cg_fd = 0;
char *bpf_file = BPF_SOCKMAP_FILENAME;
int test = PING_PONG;
- if (setrlimit(RLIMIT_MEMLOCK, &r)) {
- perror("setrlimit(RLIMIT_MEMLOCK)");
- return 1;
- }
-
if (argc < 2)
return test_suite();
--
2.9.5
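As a rough illustration of the behaviour the commit message attributes to "bpf_rlimit.h" (current and max rlimit both set to RLIM_INFINITY), a minimal user-space sketch might look like the following. The helper names memlock_unlimited() and raise_memlock_limit() are made up for this example and are not taken from the selftests; the real header may be structured differently.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: build the limit pair described in the commit
 * message, i.e. both the soft and the hard limit are unlimited.
 */
static struct rlimit memlock_unlimited(void)
{
	struct rlimit r = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	return r;
}

/* Hypothetical helper: try to lift RLIMIT_MEMLOCK for this process.
 * Returns 0 on success and -1 on failure (raising the hard limit
 * normally needs privileges, so this can fail for ordinary users).
 */
static int raise_memlock_limit(void)
{
	struct rlimit r = memlock_unlimited();

	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
		perror("setrlimit(RLIMIT_MEMLOCK)");
		return -1;
	}

	return 0;
}
```

The removed code instead capped the soft limit at 10 MiB, which per the report above was too small on the affected machine.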
^ permalink raw reply related
* Re: [PATCH v2 bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: Martin KaFai Lau @ 2018-06-21 17:09 UTC (permalink / raw)
To: dsahern; +Cc: netdev, borkmann, ast, davem, David Ahern
In-Reply-To: <20180621030011.7441-1-dsahern@kernel.org>
On Wed, Jun 20, 2018 at 08:00:11PM -0700, dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
>
> For ACLs implemented using either FIB rules or FIB entries, the BPF
> program needs the FIB lookup status to be able to drop the packet.
> Since the bpf_fib_lookup API has not reached a released kernel yet,
> change the return code to contain an encoding of the FIB lookup
> result and return the nexthop device index in the params struct.
>
> In addition, inform the BPF program of any post FIB lookup reason as
> to why the packet needs to go up the stack.
>
> The fib result for unicast routes must have an egress device, so remove
> the check that it is non-NULL.
Acked-by: Martin KaFai Lau <kafai@fb.com>
>
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
> v2
> - drop BPF_FIB_LKUP_RET_NO_NHDEV; check in dev in fib result not needed
> - enhance documentation of BPF_FIB_LKUP_RET_ codes
>
> include/uapi/linux/bpf.h | 28 ++++++++++++++----
> net/core/filter.c | 72 ++++++++++++++++++++++++++++++----------------
> samples/bpf/xdp_fwd_kern.c | 8 +++---
> 3 files changed, 74 insertions(+), 34 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 59b19b6a40d7..b7db3261c62d 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1857,7 +1857,8 @@ union bpf_attr {
> * is resolved), the nexthop address is returned in ipv4_dst
> * or ipv6_dst based on family, smac is set to mac address of
> * egress device, dmac is set to nexthop mac address, rt_metric
> - * is set to metric from route (IPv4/IPv6 only).
> + * is set to metric from route (IPv4/IPv6 only), and ifindex
> + * is set to the device index of the nexthop from the FIB lookup.
> *
> * *plen* argument is the size of the passed in struct.
> * *flags* argument can be a combination of one or more of the
> @@ -1873,9 +1874,10 @@ union bpf_attr {
> * *ctx* is either **struct xdp_md** for XDP programs or
> * **struct sk_buff** tc cls_act programs.
> * Return
> - * Egress device index on success, 0 if packet needs to continue
> - * up the stack for further processing or a negative error in case
> - * of failure.
> + * * < 0 if any input argument is invalid
> + * * 0 on success (packet is forwarded, nexthop neighbor exists)
> + * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
> + * * packet is not forwarded or needs assist from full stack
> *
> * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
> * Description
> @@ -2612,6 +2614,18 @@ struct bpf_raw_tracepoint_args {
> #define BPF_FIB_LOOKUP_DIRECT BIT(0)
> #define BPF_FIB_LOOKUP_OUTPUT BIT(1)
>
> +enum {
> + BPF_FIB_LKUP_RET_SUCCESS, /* lookup successful */
> + BPF_FIB_LKUP_RET_BLACKHOLE, /* dest is blackholed; can be dropped */
> + BPF_FIB_LKUP_RET_UNREACHABLE, /* dest is unreachable; can be dropped */
> + BPF_FIB_LKUP_RET_PROHIBIT, /* dest not allowed; can be dropped */
> + BPF_FIB_LKUP_RET_NOT_FWDED, /* packet is not forwarded */
> + BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
> + BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */
> + BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
> + BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */
> +};
> +
> struct bpf_fib_lookup {
> /* input: network family for lookup (AF_INET, AF_INET6)
> * output: network family of egress nexthop
> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
>
> /* total length of packet from network header - used for MTU check */
> __u16 tot_len;
> - __u32 ifindex; /* L3 device index for lookup */
> +
> + /* input: L3 device index for lookup
> + * output: device index from FIB lookup
> + */
> + __u32 ifindex;
>
> union {
> /* inputs to lookup */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e7f12e9f598c..f8dd8aa89de4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4073,8 +4073,9 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
> memcpy(params->smac, dev->dev_addr, ETH_ALEN);
> params->h_vlan_TCI = 0;
> params->h_vlan_proto = 0;
> + params->ifindex = dev->ifindex;
>
> - return dev->ifindex;
> + return 0;
> }
> #endif
>
> @@ -4098,7 +4099,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> /* verify forwarding is enabled on this interface */
> in_dev = __in_dev_get_rcu(dev);
> if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
> - return 0;
> + return BPF_FIB_LKUP_RET_FWD_DISABLED;
>
> if (flags & BPF_FIB_LOOKUP_OUTPUT) {
> fl4.flowi4_iif = 1;
> @@ -4123,7 +4124,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>
> tb = fib_get_table(net, tbid);
> if (unlikely(!tb))
> - return 0;
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
>
> err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
> } else {
> @@ -4135,8 +4136,20 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
> }
>
> - if (err || res.type != RTN_UNICAST)
> - return 0;
> + if (err) {
> + /* map fib lookup errors to RTN_ type */
> + if (err == -EINVAL)
> + return BPF_FIB_LKUP_RET_BLACKHOLE;
> + if (err == -EHOSTUNREACH)
> + return BPF_FIB_LKUP_RET_UNREACHABLE;
> + if (err == -EACCES)
> + return BPF_FIB_LKUP_RET_PROHIBIT;
> +
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
> + }
> +
> + if (res.type != RTN_UNICAST)
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
>
> if (res.fi->fib_nhs > 1)
> fib_select_path(net, &res, &fl4, NULL);
> @@ -4144,19 +4157,16 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> if (check_mtu) {
> mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
> if (params->tot_len > mtu)
> - return 0;
> + return BPF_FIB_LKUP_RET_FRAG_NEEDED;
> }
>
> nh = &res.fi->fib_nh[res.nh_sel];
>
> /* do not handle lwt encaps right now */
> if (nh->nh_lwtstate)
> - return 0;
> + return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>
> dev = nh->nh_dev;
> - if (unlikely(!dev))
> - return 0;
> -
> if (nh->nh_gw)
> params->ipv4_dst = nh->nh_gw;
>
> @@ -4166,10 +4176,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> * rcu_read_lock_bh is not needed here
> */
> neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
> - if (neigh)
> - return bpf_fib_set_fwd_params(params, neigh, dev);
> + if (!neigh)
> + return BPF_FIB_LKUP_RET_NO_NEIGH;
>
> - return 0;
> + return bpf_fib_set_fwd_params(params, neigh, dev);
> }
> #endif
>
> @@ -4190,7 +4200,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>
> /* link local addresses are never forwarded */
> if (rt6_need_strict(dst) || rt6_need_strict(src))
> - return 0;
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
>
> dev = dev_get_by_index_rcu(net, params->ifindex);
> if (unlikely(!dev))
> @@ -4198,7 +4208,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>
> idev = __in6_dev_get_safely(dev);
> if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
> - return 0;
> + return BPF_FIB_LKUP_RET_FWD_DISABLED;
>
> if (flags & BPF_FIB_LOOKUP_OUTPUT) {
> fl6.flowi6_iif = 1;
> @@ -4225,7 +4235,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>
> tb = ipv6_stub->fib6_get_table(net, tbid);
> if (unlikely(!tb))
> - return 0;
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
>
> f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
> } else {
> @@ -4238,11 +4248,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> }
>
> if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
> - return 0;
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
> +
> + if (unlikely(f6i->fib6_flags & RTF_REJECT)) {
> + switch (f6i->fib6_type) {
> + case RTN_BLACKHOLE:
> + return BPF_FIB_LKUP_RET_BLACKHOLE;
> + case RTN_UNREACHABLE:
> + return BPF_FIB_LKUP_RET_UNREACHABLE;
> + case RTN_PROHIBIT:
> + return BPF_FIB_LKUP_RET_PROHIBIT;
> + default:
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
> + }
> + }
>
> - if (unlikely(f6i->fib6_flags & RTF_REJECT ||
> - f6i->fib6_type != RTN_UNICAST))
> - return 0;
> + if (f6i->fib6_type != RTN_UNICAST)
> + return BPF_FIB_LKUP_RET_NOT_FWDED;
>
> if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
> f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
> @@ -4252,11 +4274,11 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> if (check_mtu) {
> mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
> if (params->tot_len > mtu)
> - return 0;
> + return BPF_FIB_LKUP_RET_FRAG_NEEDED;
> }
>
> if (f6i->fib6_nh.nh_lwtstate)
> - return 0;
> + return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>
> if (f6i->fib6_flags & RTF_GATEWAY)
> *dst = f6i->fib6_nh.nh_gw;
> @@ -4270,10 +4292,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> */
> neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
> ndisc_hashfn, dst, dev);
> - if (neigh)
> - return bpf_fib_set_fwd_params(params, neigh, dev);
> + if (!neigh)
> + return BPF_FIB_LKUP_RET_NO_NEIGH;
>
> - return 0;
> + return bpf_fib_set_fwd_params(params, neigh, dev);
> }
> #endif
>
> diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
> index 6673cdb9f55c..a7e94e7ff87d 100644
> --- a/samples/bpf/xdp_fwd_kern.c
> +++ b/samples/bpf/xdp_fwd_kern.c
> @@ -48,9 +48,9 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
> struct ethhdr *eth = data;
> struct ipv6hdr *ip6h;
> struct iphdr *iph;
> - int out_index;
> u16 h_proto;
> u64 nh_off;
> + int rc;
>
> nh_off = sizeof(*eth);
> if (data + nh_off > data_end)
> @@ -101,7 +101,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>
> fib_params.ifindex = ctx->ingress_ifindex;
>
> - out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
> + rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
>
> /* verify egress index has xdp support
> * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
> @@ -109,7 +109,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
> * NOTE: without verification that egress index supports XDP
> * forwarding packets are dropped.
> */
> - if (out_index > 0) {
> + if (rc == 0) {
> if (h_proto == htons(ETH_P_IP))
> ip_decrease_ttl(iph);
> else if (h_proto == htons(ETH_P_IPV6))
> @@ -117,7 +117,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>
> memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
> memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
> - return bpf_redirect_map(&tx_port, out_index, 0);
> + return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
> }
>
> return XDP_PASS;
> --
> 2.11.0
>
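For readers following along, here is a small stand-alone sketch of how a caller might act on the new return convention. The BPF_FIB_LKUP_RET_* values are copied from the quoted patch; the fib_verdict enum and fib_rc_to_verdict() are hypothetical names for this example only, and the choice to drop on blackhole/unreachable/prohibit is an assumption based on the "can be dropped" comments in the enum. The sample in the patch itself simply returns XDP_PASS for any non-zero result.

```c
#include <assert.h>

/* Return codes as listed in the quoted patch to include/uapi/linux/bpf.h. */
enum {
	BPF_FIB_LKUP_RET_SUCCESS,
	BPF_FIB_LKUP_RET_BLACKHOLE,
	BPF_FIB_LKUP_RET_UNREACHABLE,
	BPF_FIB_LKUP_RET_PROHIBIT,
	BPF_FIB_LKUP_RET_NOT_FWDED,
	BPF_FIB_LKUP_RET_FWD_DISABLED,
	BPF_FIB_LKUP_RET_UNSUPP_LWT,
	BPF_FIB_LKUP_RET_NO_NEIGH,
	BPF_FIB_LKUP_RET_FRAG_NEEDED,
};

/* Hypothetical verdicts, standing in for redirect / drop / pass. */
enum fib_verdict {
	FIB_VERDICT_FORWARD,
	FIB_VERDICT_DROP,
	FIB_VERDICT_PASS,
};

/* Map a bpf_fib_lookup() return value to a verdict (illustrative policy). */
static enum fib_verdict fib_rc_to_verdict(int rc)
{
	/* < 0: an input argument was invalid, leave it to the stack */
	if (rc < 0)
		return FIB_VERDICT_PASS;

	switch (rc) {
	case BPF_FIB_LKUP_RET_SUCCESS:
		/* nexthop resolved; egress index is in params->ifindex */
		return FIB_VERDICT_FORWARD;
	case BPF_FIB_LKUP_RET_BLACKHOLE:
	case BPF_FIB_LKUP_RET_UNREACHABLE:
	case BPF_FIB_LKUP_RET_PROHIBIT:
		/* marked "can be dropped" in the enum comments */
		return FIB_VERDICT_DROP;
	default:
		/* everything else needs assistance from the full stack */
		return FIB_VERDICT_PASS;
	}
}
```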
^ permalink raw reply
* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Andrew Lunn @ 2018-06-21 17:11 UTC (permalink / raw)
To: Vadim Pasternak; +Cc: davem, netdev, jiri
In-Reply-To: <1529594883-20619-4-git-send-email-vadimp@mellanox.com>
> New internal API reads the temperature from all the modules, which are
> equipped with the thermal sensor and exposes temperature according to
> the worst measure. All individual temperature values are normalized to
> pre-defined range.
Hi Vadim
Could you explain this normalization process? Why not just
expose each sensor's temperature in millidegrees C, which is the norm
for HWMON?
Andrew
^ permalink raw reply
* [PATCH] net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static
From: Colin King @ 2018-06-21 17:16 UTC (permalink / raw)
To: David S . Miller, Florian Fainelli, Ivan Khoronzhuk, linux-omap,
netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
The function cpdma_desc_pool_create is local to the source and does not
need to be in global scope, so make it static.
Cleans up sparse warning:
warning: symbol 'cpdma_desc_pool_create' was not declared. Should it
be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index cdbddf16dd29..4f1267477aa4 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -205,7 +205,7 @@ static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr)
* devices (e.g. cpsw switches) use plain old memory. Descriptor pools
* abstract out these details
*/
-int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
+static int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
{
struct cpdma_params *cpdma_params = &ctlr->params;
struct cpdma_desc_pool *pool;
--
2.17.0
^ permalink raw reply related
* Re: [PATCH] net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static
From: Grygorii Strashko @ 2018-06-21 17:22 UTC (permalink / raw)
To: Colin King, David S . Miller, Florian Fainelli, Ivan Khoronzhuk,
linux-omap, netdev, netdev
Cc: kernel-janitors, linux-kernel
In-Reply-To: <20180621171645.29734-1-colin.king@canonical.com>
Please add netdev@vger.kernel.org in the future.
On 06/21/2018 12:16 PM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> The function cpdma_desc_pool_create is local to the source and does not
> need to be in global scope, so make it static.
>
> Cleans up sparse warning:
> warning: symbol 'cpdma_desc_pool_create' was not declared. Should it
> be static?
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index cdbddf16dd29..4f1267477aa4 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -205,7 +205,7 @@ static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr)
> * devices (e.g. cpsw switches) use plain old memory. Descriptor pools
> * abstract out these details
> */
> -int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
> +static int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
> {
> struct cpdma_params *cpdma_params = &ctlr->params;
> struct cpdma_desc_pool *pool;
>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
--
regards,
-grygorii
^ permalink raw reply
* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Shannon Nelson @ 2018-06-21 17:25 UTC (permalink / raw)
To: Anders Roxell; +Cc: Networking, David Miller
In-Reply-To: <CADYN=9+DSfu+UN3d8Te71F91ZWxCFUS0RBJLdzO4M0hZUonPiA@mail.gmail.com>
On 6/21/2018 9:56 AM, Anders Roxell wrote:
> On Thu, 21 Jun 2018 at 02:32, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>
>> On 6/20/2018 4:18 PM, Anders Roxell wrote:
>>> On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>
>>>> On 6/20/2018 12:09 PM, Anders Roxell wrote:
>>>>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>>>
>>>>>> A couple of bad behaviors in the ipsec selftest were pointed out
>>>>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
>>>>>>
>>>>>> Shannon Nelson (2):
>>>>>> selftests: rtnetlink: hide complaint from terminated monitor
>>>>>> selftests: rtnetlink: use a local IP address for IPsec tests
>>>>>>
>>>>>> tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
>>>>>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>>>
>>>>> Hi Shannon,
>>>>>
>>>>> With these patches applied and my config patch.
>>>>>
>>>>> I still get this error when I run the ipsec test:
>>>>>
>>>>> FAIL: can't add fou port 7777, skipping test
>>>>> RTNETLINK answers: Operation not supported
>>>>> FAIL: can't add macsec interface, skipping test
>>>>> RTNETLINK answers: Protocol not supported
>>>>> RTNETLINK answers: No such process
>>>>> RTNETLINK answers: No such process
>>>>> FAIL: ipsec
>>>>
>>>> One of the odd things I noticed about this script is that there really
>>>> aren't any diagnosis messages, just PASS or FAIL. I followed this
>>>> custom when I added the ipsec tests, but I think this is something that
>>>> should change so we can get some idea of what breaks.
>>>>
>>>> I'm curious about the "RTNETLINK answers" messages and where they might
>>>> be coming from, especially "RTNETLINK answers: Protocol not supported".
>>>
>>> I added: "set -x" in the beginning of the rtnetlink.sh script.
>>> + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
>>> transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
>>> 0x3132333435
>>> 363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
>>> RTNETLINK answers: Protocol not supported
>>
>> Okay, so ip didn't like this command...
>>
>>>> What are the XFRM and AES settings in your kernel config - what is the
>>>> output from
>>>> egrep -i "xfrm|_aes" .config
>>>
>>> CONFIG_XFRM=y
>>> CONFIG_XFRM_ALGO=y
>>> CONFIG_XFRM_USER=y
>>> CONFIG_INET_XFRM_MODE_TUNNEL=y
>>> CONFIG_INET6_XFRM_MODE_TRANSPORT=y
>>> CONFIG_INET6_XFRM_MODE_TUNNEL=y
>>> CONFIG_INET6_XFRM_MODE_BEET=y
>>> CONFIG_CRYPTO_AES=y
>>
>> And this is probably why - there seem to be a few config variables
>> missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why
>> the ip command fails above.
>>
>> Here's what I have in my config:
>> CONFIG_XFRM=y
>> CONFIG_XFRM_OFFLOAD=y
>> CONFIG_XFRM_ALGO=m
>> CONFIG_XFRM_USER=m
>> # CONFIG_XFRM_SUB_POLICY is not set
>> # CONFIG_XFRM_MIGRATE is not set
>> CONFIG_XFRM_STATISTICS=y
>> CONFIG_XFRM_IPCOMP=m
>> CONFIG_INET_XFRM_TUNNEL=m
>> CONFIG_INET_XFRM_MODE_TRANSPORT=m
>> CONFIG_INET_XFRM_MODE_TUNNEL=m
>> CONFIG_INET_XFRM_MODE_BEET=m
>> CONFIG_INET6_XFRM_TUNNEL=m
>> CONFIG_INET6_XFRM_MODE_TRANSPORT=m
>> CONFIG_INET6_XFRM_MODE_TUNNEL=m
>> CONFIG_INET6_XFRM_MODE_BEET=m
>> CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
>> CONFIG_SECURITY_NETWORK_XFRM=y
>> CONFIG_CRYPTO_AES=y
>> # CONFIG_CRYPTO_AES_TI is not set
>> CONFIG_CRYPTO_AES_X86_64=m
>> CONFIG_CRYPTO_AES_NI_INTEL=m
>> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
>> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
>> CONFIG_CRYPTO_DEV_PADLOCK_AES=m
>>
>> Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your
>> config
>
> Yes you can.
>
>> and trying again?
>
> same issue with CONFIG_INET_XFRM_MODE_TRANSPORT=y
Interesting. I took only CONFIG_INET_XFRM_MODE_TRANSPORT out of my
config and was able to see the "Protocol not supported" message. I'm
not familiar enough with the crypto algorithm setup, but I suspect
there's a combination of the other missing CONFIGs that are needed along
with CONFIG_INET_XFRM_MODE_TRANSPORT.
My knee-jerk reaction voice wants to say this is the test working as
expected, pointing out to us that the kernel config is not up to what it
should be. However, perhaps a better answer is that the test should be
reworked to just skip the rest if it can't set up the expected test
environment, as is done in the macsec case.
So the remaining question then is: should the test be marked as failed,
as in the macsec test if it can't set up its interface, or just skipped?
sln
>
> Cheers,
> Anders
>
^ permalink raw reply
* Re: [PATCH rdma-next 0/2] RoCE ICRC counter
From: Jason Gunthorpe @ 2018-06-21 17:43 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Mark Bloch,
Talat Batheesh, Saeed Mahameed, linux-netdev
In-Reply-To: <20180621123756.32645-1-leon@kernel.org>
On Thu, Jun 21, 2018 at 03:37:54PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Hi,
>
> This series exposes RoCE ICRC counter through existing RDMA hw_counters
> sysfs interface.
>
> First patch has all HW definitions in mlx5_ifc.h file and second patch is
> actual counter implementation.
The RDMA parts are OK, can you please send me the commit for the mlx5
patch when applied?
Thanks,
Jason
^ permalink raw reply
* Re: [PATCH mlx5-next 1/2] net/mlx5: Add RoCE RX ICRC encapsulated counter
From: Leon Romanovsky @ 2018-06-21 17:53 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: RDMA mailing list, Mark Bloch, Talat Batheesh, Saeed Mahameed,
linux-netdev
In-Reply-To: <20180621123756.32645-2-leon@kernel.org>
On Thu, Jun 21, 2018 at 03:37:55PM +0300, Leon Romanovsky wrote:
> From: Talat Batheesh <talatb@mellanox.com>
>
> Add capability bit in PCAM register and RoCE ICRC error counter
> to PPCNT register.
>
> Signed-off-by: Talat Batheesh <talatb@mellanox.com>
> Reviewed-by: Mark Bloch <markb@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
> include/linux/mlx5/mlx5_ifc.h | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> index b4302ccb63a6..9e8682489951 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -1687,7 +1687,11 @@ struct mlx5_ifc_eth_extended_cntrs_grp_data_layout_bits {
>
> u8 rx_buffer_full_low[0x20];
>
> - u8 reserved_at_1c0[0x600];
> + u8 rx_icrc_encapsulated_high[0x20];
> +
> + u8 rx_icrc_encapsulated_low[0x20];
> +
> + u8 reserved_at_3c0[0x5c0];
reserved_at_3c0 should be reserved_at_200, fixed and applied to mlx5-next.
Commit 0af5107cd0640ee3424e337b492e4b11b450ce28 in mlx5-next.
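The arithmetic behind the rename can be sketched as follows, assuming the usual mlx5_ifc.h convention that reserved_at_<N> encodes the bit offset of the field from the start of the structure. The helper names are made up for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: bit offset of the reserved area after nfields
 * new fields of field_bits each are placed at old_offset.
 */
static unsigned int reserved_offset_after(unsigned int old_offset,
					  unsigned int nfields,
					  unsigned int field_bits)
{
	return old_offset + nfields * field_bits;
}

/* Hypothetical helper: size of the reserved area once the new fields
 * have been carved out of it.
 */
static unsigned int reserved_bits_after(unsigned int old_bits,
					unsigned int nfields,
					unsigned int field_bits)
{
	/* the new fields must fit inside the old reserved area */
	assert(nfields * field_bits <= old_bits);
	return old_bits - nfields * field_bits;
}
```

With the values from the patch (reserved_at_1c0[0x600] and two new 0x20-bit fields), the offset becomes 0x1c0 + 2 * 0x20 = 0x200 and the size becomes 0x600 - 0x40 = 0x5c0, hence reserved_at_200[0x5c0].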
Thanks
^ permalink raw reply
* [PATCH net v2] cls_flower: fix use after free in flower S/W path
From: Paolo Abeni @ 2018-06-21 18:02 UTC (permalink / raw)
To: netdev
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Marcelo Ricardo Leitner,
Paul Blakey
If flower filter is created without the skip_sw flag, fl_mask_put()
can race with fl_classify() and we can destroy the mask rhashtable
while a lookup operation is accessing it.
BUG: unable to handle kernel paging request at 00000000000911d1
PGD 0 P4D 0
SMP PTI
CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
RIP: 0010:rht_bucket_nested+0x20/0x60
Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
FS: 0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
Call Trace:
fl_lookup+0x134/0x140 [cls_flower]
fl_classify+0xf3/0x180 [cls_flower]
tcf_classify+0x78/0x150
__netif_receive_skb_core+0x69e/0xa50
netif_receive_skb_internal+0x42/0xf0
tun_get_user+0xdd5/0xfd0 [tun]
tun_sendmsg+0x52/0x70 [tun]
handle_tx+0x2b3/0x5f0 [vhost_net]
vhost_worker+0xab/0x100 [vhost]
kthread+0xf8/0x130
ret_from_fork+0x35/0x40
Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
CR2: 00000000000911d1
Fix the above by waiting for an RCU grace period before destroying the
rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
must run in process context, as pointed out by Cong Wang.
v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/sched/cls_flower.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2b5be42a9f1c..9e8b26a80fb3 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -66,7 +66,7 @@ struct fl_flow_mask {
struct rhashtable_params filter_ht_params;
struct flow_dissector dissector;
struct list_head filters;
- struct rcu_head rcu;
+ struct rcu_work rwork;
struct list_head list;
};
@@ -203,6 +203,20 @@ static int fl_init(struct tcf_proto *tp)
return rhashtable_init(&head->ht, &mask_ht_params);
}
+static void fl_mask_free(struct fl_flow_mask *mask)
+{
+ rhashtable_destroy(&mask->ht);
+ kfree(mask);
+}
+
+static void fl_mask_free_work(struct work_struct *work)
+{
+ struct fl_flow_mask *mask = container_of(to_rcu_work(work),
+ struct fl_flow_mask, rwork);
+
+ fl_mask_free(mask);
+}
+
static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
bool async)
{
@@ -210,12 +224,11 @@ static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
return false;
rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
- rhashtable_destroy(&mask->ht);
list_del_rcu(&mask->list);
if (async)
- kfree_rcu(mask, rcu);
+ tcf_queue_work(&mask->rwork, fl_mask_free_work);
else
- kfree(mask);
+ fl_mask_free(mask);
return true;
}
--
2.17.1
^ permalink raw reply related
* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Vadim Pasternak @ 2018-06-21 18:14 UTC (permalink / raw)
To: Andrew Lunn; +Cc: davem@davemloft.net, netdev@vger.kernel.org, jiri@resnulli.us
In-Reply-To: <20180621171120.GA6830@lunn.ch>
> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 8:11 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
>
> > New internal API reads the temperature from all the modules, which are
> > equipped with the thermal sensor and exposes temperature according to
> > the worst measure. All individual temperature values are normalized to
> > pre-defined range.
>
> Hi Vadim
>
> Could you explain this normalization process? Why not just expose each
> sensor's temperature in millidegrees C, which is the norm for HWMON?
Hi Andrew,
The temperature of each individual module can be obtained
through ethtool.
The worst temperature is needed for the system cooling
control decision.
Up to 64 SFP/QSFP modules can be connected to the system.
Some of them could be copper modules, which don't provide
temperature measurements.
Some of them could be optical modules providing untrusted
temperature measurements, which could impact thermal
control of the system.
Also, optical modules could come from different vendors, and
it is a real situation that, e.g., one module has warning and
critical thresholds of 75C and 85C, while another has 70C and 80C.
In such a case the first module at 72C is better off than the
second module at 71C.
The deltas between warning and critical thresholds could
differ as well: 5C, 10C, etc.
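As a rough illustration of that comparison (a sketch only, not the code
from this series), each reading can be scored against the module's own
thresholds and the highest score taken as the "worst". The struct, the
function names and the per-mille scale below are assumptions made for
this example.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-module reading; all values in millidegrees C. */
struct module_temp {
	bool has_sensor;	/* false for e.g. copper modules */
	int temp;		/* current reading */
	int warn;		/* module's own warning threshold */
	int crit;		/* module's own critical threshold */
};

/*
 * Position of the reading relative to the module's own thresholds,
 * in per-mille: 0 at warning, 1000 at critical, negative below warning.
 * The caller must ensure crit > warn.
 */
static int module_temp_score(const struct module_temp *m)
{
	long long num = (long long)(m->temp - m->warn) * 1000;

	return (int)(num / (m->crit - m->warn));
}

/*
 * Return the index of the module with the highest score, or -1 if no
 * module has a sensor with sane thresholds. *score is set on success.
 */
static int worst_module(const struct module_temp *mods, size_t n, int *score)
{
	int worst = -1;
	int worst_score = INT_MIN;
	size_t i;

	for (i = 0; i < n; i++) {
		int s;

		/* Skip modules that cannot contribute a usable value. */
		if (!mods[i].has_sensor || mods[i].crit <= mods[i].warn)
			continue;
		s = module_temp_score(&mods[i]);
		if (worst < 0 || s > worst_score) {
			worst = (int)i;
			worst_score = s;
		}
	}
	if (worst >= 0 && score)
		*score = worst_score;
	return worst;
}
```

With the two modules above, 72C against 75C/85C scores -300 and 71C
against 70C/80C scores 100, so the second module is the worst one even
though its nominal temperature is lower.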
So the nominal temperature is not what matters here; we need to know
the "worst" value for the thermal control decision.
Thanks,
Vadim.
>
> Andrew
* Re: [PATCH net v2] cls_flower: fix use after free in flower S/W path
From: Jiri Pirko @ 2018-06-21 18:16 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, Jamal Hadi Salim, Cong Wang, Marcelo Ricardo Leitner,
Paul Blakey
In-Reply-To: <fd96de4e9dc358e3982922ae681fdb1b9d8ae72a.1529603970.git.pabeni@redhat.com>
Thu, Jun 21, 2018 at 08:02:16PM CEST, pabeni@redhat.com wrote:
>If a flower filter is created without the skip_sw flag, fl_mask_put()
>can race with fl_classify() and we can destroy the mask rhashtable
>while a lookup operation is accessing it.
>
> BUG: unable to handle kernel paging request at 00000000000911d1
> PGD 0 P4D 0
> SMP PTI
> CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
> Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
> RIP: 0010:rht_bucket_nested+0x20/0x60
> Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
> RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
> RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
> RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
> RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
> R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
> R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
> FS: 0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
> Call Trace:
> fl_lookup+0x134/0x140 [cls_flower]
> fl_classify+0xf3/0x180 [cls_flower]
> tcf_classify+0x78/0x150
> __netif_receive_skb_core+0x69e/0xa50
> netif_receive_skb_internal+0x42/0xf0
> tun_get_user+0xdd5/0xfd0 [tun]
> tun_sendmsg+0x52/0x70 [tun]
> handle_tx+0x2b3/0x5f0 [vhost_net]
> vhost_worker+0xab/0x100 [vhost]
> kthread+0xf8/0x130
> ret_from_fork+0x35/0x40
> Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
> CR2: 00000000000911d1
>
>Fix the above by waiting for an RCU grace period before destroying the
>rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
>must run in process context, as pointed out by Cong Wang.
>
>v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
>
>Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-21 18:20 UTC (permalink / raw)
To: Cornelia Huck
Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180621165913.7e3f4faa.cohuck@redhat.com>
On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> On Wed, 20 Jun 2018 22:48:58 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> > > In any case, I'm not sure anymore why we'd want the extra uuid.
> >
> > It's mostly so we can have e.g. multiple devices with same MAC
> > (which some people seem to want in order to then use
> > them with different containers).
> >
> > But it is also handy for when you assign a PF, since then you
> > can't set the MAC.
> >
>
> OK, so what about the following:
>
> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> that we have a new uuid field in the virtio-net config space
> - in QEMU, add a property for virtio-net that allows specifying a uuid,
> offer VIRTIO_NET_F_STANDBY_UUID if set
> - when configuring, set the property to the group UUID of the vfio-pci
> device
> - in the guest, use the uuid from the virtio-net device's config space
> if applicable; else, fall back to matching by MAC as done today
>
> That should work for all virtio transports.
True. I'm a bit unhappy that it's virtio net specific though
since down the road I expect we'll have a very similar feature
for scsi (and maybe others).
But we do not have a way to have fields that are portable
both across devices and transports, and I think it would
be a useful addition. How would this work though? Any idea?
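For reference, the guest-side matching rule from the quoted list could
look roughly like the sketch below. It is only an illustration under
assumptions: the bit position, the struct layouts and the function name
are made up for this example, and nothing here comes from an existing
driver or from the spec.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit position; the thread does not assign one. */
#define HYP_VIRTIO_NET_F_STANDBY_UUID	(1ULL << 0)

/* Hypothetical view of the standby virtio-net device. */
struct standby_dev {
	uint64_t features;	/* negotiated feature bits */
	uint8_t mac[6];
	uint8_t uuid[16];	/* valid only if the UUID feature bit is set */
};

/* Hypothetical view of a candidate primary (e.g. vfio-pci) device. */
struct primary_dev {
	uint8_t mac[6];
	uint8_t uuid[16];	/* group UUID */
};

/*
 * Match by UUID when the feature bit was negotiated; otherwise fall
 * back to matching by MAC, as is done today.
 */
static bool standby_matches_primary(const struct standby_dev *sb,
				    const struct primary_dev *p)
{
	if (sb->features & HYP_VIRTIO_NET_F_STANDBY_UUID)
		return memcmp(sb->uuid, p->uuid, sizeof(sb->uuid)) == 0;
	return memcmp(sb->mac, p->mac, sizeof(sb->mac)) == 0;
}
```

Once the bit is negotiated the MAC is ignored entirely, which is what
would let several devices share a MAC and still pair up correctly.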
--
MST