* Re: libbpf distro packaging
From: Andrii Nakryiko @ 2019-08-13 18:26 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Jiri Olsa, Julia Kartseva, labbott@redhat.com, acme@kernel.org,
debian-kernel@lists.debian.org, netdev@vger.kernel.org,
Andrii Nakryiko, Andrey Ignatov, Alexei Starovoitov,
Yonghong Song, jolsa@kernel.org
In-Reply-To: <eb5cf65a-1aa0-fde4-e726-41a736cb7314@iogearbox.net>
On Tue, Aug 13, 2019 at 7:14 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/13/19 2:24 PM, Jiri Olsa wrote:
> > On Mon, Aug 12, 2019 at 07:04:12PM +0000, Julia Kartseva wrote:
> >> I would like to bring up libbpf publishing discussion started at [1].
> >> The present state of things is that libbpf is built from kernel tree, e.g. [2]
> >> For Debian and [3] for Fedora whereas the better way would be having a
> >> package built from github mirror. The advantages of the latter:
> >> - Consistent, ABI matching versioning across distros
> >> - The mirror has integration tests
> >> - No need in kernel tree to build a package
> >> - Changes can be merged directly to github w/o waiting them to be merged
> >> through bpf-next -> net-next -> main
> >> There is a PR introducing a libbpf.spec which can be used as a starting point: [4]
> >> Any comments regarding the spec itself can be posted there.
> >> In the future it may be used as a source of truth.
> >> Please consider switching libbpf packaging to the github mirror instead
> >> of the kernel tree.
> >> Thanks
> >>
> >> [1] https://lists.iovisor.org/g/iovisor-dev/message/1521
> >> [2] https://packages.debian.org/sid/libbpf4.19
> >> [3] http://rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/l/libbpf-5.3.0-0.rc2.git0.1.fc31.x86_64.html
> >> [4] https://github.com/libbpf/libbpf/pull/64
> >
> > hi,
> > Fedora has libbpf as kernel-tools subpackage, so I think
> > we'd need to create new package and deprecate the current
> >
> > but I like the ABI stability by using github .. how's actually
> > the sync (in both directions) with kernel sources going on?
>
> The upstream kernel's tools/lib/bpf/ is always source of truth. Meaning, changes need
> to make it upstream first and they are later synced into the GH stand-alone repo.
As I mentioned in reply to Jiri, kernel's tools/lib/bpf are the source
of truth for the sources of libbpf itself, but Github has some extra
stuff necessary to make libbpf work/build in isolation from kernel.
Plus some administrative stuff (e.g., sync script).
So if this spec is geared towards Github layout and for use with
Github projection of libbpf, maybe it makes more sense to keep it in
Github only? Is that spec going to be useful in kernel sources? Or
will it just create more confusion on why it's there?
Plus it will make it easier to synchronize version bumping/tagging of
new release on Github.
>
> Thanks,
> Daniel
^ permalink raw reply
* Re: libbpf distro packaging
From: Andrii Nakryiko @ 2019-08-13 18:23 UTC (permalink / raw)
To: Jiri Olsa
Cc: Julia Kartseva, labbott@redhat.com, acme@kernel.org,
debian-kernel@lists.debian.org, netdev@vger.kernel.org,
Andrii Nakryiko, Andrey Ignatov, Alexei Starovoitov,
Yonghong Song, jolsa@kernel.org
In-Reply-To: <20190813122420.GB9349@krava>
On Tue, Aug 13, 2019 at 5:26 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Mon, Aug 12, 2019 at 07:04:12PM +0000, Julia Kartseva wrote:
> > I would like to bring up libbpf publishing discussion started at [1].
> > The present state of things is that libbpf is built from kernel tree, e.g. [2]
> > For Debian and [3] for Fedora whereas the better way would be having a
> > package built from github mirror. The advantages of the latter:
> > - Consistent, ABI matching versioning across distros
> > - The mirror has integration tests
> > - No need in kernel tree to build a package
> > - Changes can be merged directly to github w/o waiting them to be merged
> > through bpf-next -> net-next -> main
> > There is a PR introducing a libbpf.spec which can be used as a starting point: [4]
> > Any comments regarding the spec itself can be posted there.
> > In the future it may be used as a source of truth.
> > Please consider switching libbpf packaging to the github mirror instead
> > of the kernel tree.
> > Thanks
> >
> > [1] https://lists.iovisor.org/g/iovisor-dev/message/1521
> > [2] https://packages.debian.org/sid/libbpf4.19
> > [3] http://rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/l/libbpf-5.3.0-0.rc2.git0.1.fc31.x86_64.html
> > [4] https://github.com/libbpf/libbpf/pull/64
>
> hi,
> Fedora has libbpf as kernel-tools subpackage, so I think
> we'd need to create new package and deprecate the current
>
> but I like the ABI stability by using github .. how's actually
> the sync (in both directions) with kernel sources going on?
Sync is always in one direction, from kernel sources into Github repo.
Right now it's triggered by a human (usually me), but we are using a
script that automates entire process (see
https://github.com/libbpf/libbpf/blob/master/scripts/sync-kernel.sh).
It cherry-pick relevant commits from kernel, transforms them to match
Github's file layout and re-applies those changes to Github repo.
There is never a sync from Github back to kernel, but Github repo
contains some extra stuff that's not in kernel. E.g., the script I
mentioned, plus Github's Makefile is different, because it can't rely
on kernel's kbuild setup.
>
> thanks,
> jirka
^ permalink raw reply
* [PATCH net-next 5/5] rds: check for excessive looping in rds_send_xmit
From: Gerd Rausch @ 2019-08-13 18:21 UTC (permalink / raw)
To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
From: Andy Grover <andy.grover@oracle.com>
Date: Thu, 13 Jan 2011 11:40:31 -0800
Original commit from 2011 updated to include a change by
Yuval Shaia <yuval.shaia@oracle.com>
that adds a new statistic counter "send_stuck_rm"
to capture the messages looping exessively
in the send path.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---
net/rds/rds.h | 2 +-
net/rds/send.c | 12 ++++++++++++
net/rds/stats.c | 1 +
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/net/rds/rds.h b/net/rds/rds.h
index f0066d168499..ad605fd61655 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -717,7 +717,7 @@ struct rds_statistics {
uint64_t s_cong_send_blocked;
uint64_t s_recv_bytes_added_to_socket;
uint64_t s_recv_bytes_removed_from_socket;
-
+ uint64_t s_send_stuck_rm;
};
/* af_rds.c */
diff --git a/net/rds/send.c b/net/rds/send.c
index 031b1e97a466..9ce552abf9e9 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -145,6 +145,7 @@ int rds_send_xmit(struct rds_conn_path *cp)
LIST_HEAD(to_be_dropped);
int batch_count;
unsigned long send_gen = 0;
+ int same_rm = 0;
restart:
batch_count = 0;
@@ -200,6 +201,17 @@ int rds_send_xmit(struct rds_conn_path *cp)
rm = cp->cp_xmit_rm;
+ if (!rm) {
+ same_rm = 0;
+ } else {
+ same_rm++;
+ if (same_rm >= 4096) {
+ rds_stats_inc(s_send_stuck_rm);
+ ret = -EAGAIN;
+ break;
+ }
+ }
+
/*
* If between sending messages, we can send a pending congestion
* map update.
diff --git a/net/rds/stats.c b/net/rds/stats.c
index 6bbab4d74c4f..9e87da43c004 100644
--- a/net/rds/stats.c
+++ b/net/rds/stats.c
@@ -78,6 +78,7 @@ static const char *const rds_stat_names[] = {
"cong_send_blocked",
"recv_bytes_added_to_sock",
"recv_bytes_freed_fromsock",
+ "send_stuck_rm",
};
void rds_stats_info_copy(struct rds_info_iterator *iter,
--
2.22.0
^ permalink raw reply related
* [PATCH net-next 4/5] net/rds: Add a few missing rds_stat_names entries
From: Gerd Rausch @ 2019-08-13 18:21 UTC (permalink / raw)
To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
Date: Thu, 11 Jul 2019 12:15:50 -0700
In a previous commit, fields were added to "struct rds_statistics"
but array "rds_stat_names" was not updated accordingly.
Please note the inconsistent naming of the string representations
that is done in the name of compatibility
with the Oracle internal code-base.
s_recv_bytes_added_to_socket -> "recv_bytes_added_to_sock"
s_recv_bytes_removed_from_socket -> "recv_bytes_freed_fromsock"
Fixes: 192a798f5299 ("RDS: add stat for socket recv memory usage")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---
net/rds/stats.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/rds/stats.c b/net/rds/stats.c
index 73be187d389e..6bbab4d74c4f 100644
--- a/net/rds/stats.c
+++ b/net/rds/stats.c
@@ -76,6 +76,8 @@ static const char *const rds_stat_names[] = {
"cong_update_received",
"cong_send_error",
"cong_send_blocked",
+ "recv_bytes_added_to_sock",
+ "recv_bytes_freed_fromsock",
};
void rds_stats_info_copy(struct rds_info_iterator *iter,
--
2.22.0
^ permalink raw reply related
* [PATCH net-next 3/5] RDS: don't use GFP_ATOMIC for sk_alloc in rds_create
From: Gerd Rausch @ 2019-08-13 18:21 UTC (permalink / raw)
To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
From: Chris Mason <chris.mason@oracle.com>
Date: Fri, 3 Feb 2012 11:08:51 -0500
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
---
net/rds/af_rds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 2b969f99ef13..7228892046cf 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -705,7 +705,7 @@ static int rds_create(struct net *net, struct socket *sock, int protocol,
if (sock->type != SOCK_SEQPACKET || protocol)
return -ESOCKTNOSUPPORT;
- sk = sk_alloc(net, AF_RDS, GFP_ATOMIC, &rds_proto, kern);
+ sk = sk_alloc(net, AF_RDS, GFP_KERNEL, &rds_proto, kern);
if (!sk)
return -ENOMEM;
--
2.22.0
^ permalink raw reply related
* [PATCH net-next 1/5] RDS: Re-add pf/sol access via sysctl
From: Gerd Rausch @ 2019-08-13 18:20 UTC (permalink / raw)
To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
From: Andy Grover <andy.grover@oracle.com>
Date: Tue, 24 Nov 2009 15:35:51 -0800
Although RDS has an official PF_RDS value now, existing software
expects to look for rds sysctls to determine it. We need to maintain
these for now, for backwards compatibility.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---
net/rds/sysctl.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c
index e381bbcd9cc1..9760292a0af4 100644
--- a/net/rds/sysctl.c
+++ b/net/rds/sysctl.c
@@ -49,6 +49,13 @@ unsigned int rds_sysctl_max_unacked_bytes = (16 << 20);
unsigned int rds_sysctl_ping_enable = 1;
+/*
+ * We have official values, but must maintain the sysctl interface for existing
+ * software that expects to find these values here.
+ */
+static int rds_sysctl_pf_rds = PF_RDS;
+static int rds_sysctl_sol_rds = SOL_RDS;
+
static struct ctl_table rds_sysctl_rds_table[] = {
{
.procname = "reconnect_min_delay_ms",
@@ -68,6 +75,20 @@ static struct ctl_table rds_sysctl_rds_table[] = {
.extra1 = &rds_sysctl_reconnect_min_jiffies,
.extra2 = &rds_sysctl_reconnect_max,
},
+ {
+ .procname = "pf_rds",
+ .data = &rds_sysctl_pf_rds,
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = &proc_dointvec,
+ },
+ {
+ .procname = "sol_rds",
+ .data = &rds_sysctl_sol_rds,
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = &proc_dointvec,
+ },
{
.procname = "max_unacked_packets",
.data = &rds_sysctl_max_unacked_packets,
--
2.22.0
^ permalink raw reply related
* [PATCH net-next 0/5] net/rds: Fixes from internal Oracle repo
From: Gerd Rausch @ 2019-08-13 18:20 UTC (permalink / raw)
To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
This is the first set of (mostly old) patches from our internal repository
in an effort to synchronize what Oracle had been using internally
with what is shipped with the Linux kernel.
Andy Grover (2):
RDS: Re-add pf/sol access via sysctl
rds: check for excessive looping in rds_send_xmit
Chris Mason (2):
RDS: limit the number of times we loop in rds_send_xmit
RDS: don't use GFP_ATOMIC for sk_alloc in rds_create
Gerd Rausch (1):
net/rds: Add a few missing rds_stat_names entries
net/rds/af_rds.c | 2 +-
net/rds/ib_recv.c | 12 +++++++++++-
net/rds/rds.h | 2 +-
net/rds/send.c | 12 ++++++++++++
net/rds/stats.c | 3 +++
net/rds/sysctl.c | 21 +++++++++++++++++++++
6 files changed, 49 insertions(+), 3 deletions(-)
--
2.22.0
^ permalink raw reply
* [PATCH net-next 2/5] RDS: limit the number of times we loop in rds_send_xmit
From: Gerd Rausch @ 2019-08-13 18:20 UTC (permalink / raw)
To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
From: Chris Mason <chris.mason@oracle.com>
Date: Fri, 3 Feb 2012 11:07:54 -0500
This will kick the RDS worker thread if we have been looping
too long.
Original commit from 2012 updated to include a change by
Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
that triggers "must_wake" if "rds_ib_recv_refill_one" fails.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---
net/rds/ib_recv.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 3cae88cbdaa0..1a8a4a760b84 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -385,6 +385,7 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
unsigned int posted = 0;
int ret = 0;
bool can_wait = !!(gfp & __GFP_DIRECT_RECLAIM);
+ bool must_wake = false;
u32 pos;
/* the goal here is to just make sure that someone, somewhere
@@ -405,6 +406,7 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
recv = &ic->i_recvs[pos];
ret = rds_ib_recv_refill_one(conn, recv, gfp);
if (ret) {
+ must_wake = true;
break;
}
@@ -423,6 +425,11 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
}
posted++;
+
+ if ((posted > 128 && need_resched()) || posted > 8192) {
+ must_wake = true;
+ break;
+ }
}
/* We're doing flow control - update the window. */
@@ -445,10 +452,13 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
* if we should requeue.
*/
if (rds_conn_up(conn) &&
- ((can_wait && rds_ib_ring_low(&ic->i_recv_ring)) ||
+ (must_wake ||
+ (can_wait && rds_ib_ring_low(&ic->i_recv_ring)) ||
rds_ib_ring_empty(&ic->i_recv_ring))) {
queue_delayed_work(rds_wq, &conn->c_recv_w, 1);
}
+ if (can_wait)
+ cond_resched();
}
/*
--
2.22.0
^ permalink raw reply related
* [PATCH 3/3] net: qca: update MODULE_AUTHOR() email address
From: Stefan Wahren @ 2019-08-13 18:17 UTC (permalink / raw)
To: Jean Delvare, Guenter Roeck, David S. Miller, Srinivas Kandagatla,
Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
NXP Linux Team
Cc: linux-hwmon, linux-arm-kernel, linux-kernel, netdev,
Stefan Wahren
In-Reply-To: <1565720249-6549-1-git-send-email-wahrenst@gmx.net>
I2SE has been acquired by in-tech. So the email address listed in
MODULE_AUTHOR() will be disabled in the near future. I only have access
to QCA7000 boards at in-tech, so use my new company address.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
---
drivers/net/ethernet/qualcomm/qca_7k_common.c | 2 +-
drivers/net/ethernet/qualcomm/qca_spi.c | 2 +-
drivers/net/ethernet/qualcomm/qca_uart.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.c b/drivers/net/ethernet/qualcomm/qca_7k_common.c
index 6b511f0..87d023d 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k_common.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c
@@ -162,5 +162,5 @@ EXPORT_SYMBOL_GPL(qcafrm_fsm_decode);
MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 common");
MODULE_AUTHOR("Qualcomm Atheros Communications");
-MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
+MODULE_AUTHOR("Stefan Wahren <stefan.wahren@in-tech.com>");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c
index b28360b..d9d3554 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -1034,6 +1034,6 @@ module_spi_driver(qca_spi_driver);
MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 SPI Driver");
MODULE_AUTHOR("Qualcomm Atheros Communications");
-MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
+MODULE_AUTHOR("Stefan Wahren <stefan.wahren@in-tech.com>");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(QCASPI_DRV_VERSION);
diff --git a/drivers/net/ethernet/qualcomm/qca_uart.c b/drivers/net/ethernet/qualcomm/qca_uart.c
index 5906168..ab4da8e 100644
--- a/drivers/net/ethernet/qualcomm/qca_uart.c
+++ b/drivers/net/ethernet/qualcomm/qca_uart.c
@@ -418,6 +418,6 @@ module_serdev_device_driver(qca_uart_driver);
MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 UART Driver");
MODULE_AUTHOR("Qualcomm Atheros Communications");
-MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
+MODULE_AUTHOR("Stefan Wahren <stefan.wahren@in-tech.com>");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(QCAUART_DRV_VERSION);
--
2.7.4
^ permalink raw reply related
* [PATCH 2/3] hwmon: raspberrypi: update MODULE_AUTHOR() email address
From: Stefan Wahren @ 2019-08-13 18:17 UTC (permalink / raw)
To: Jean Delvare, Guenter Roeck, David S. Miller, Srinivas Kandagatla,
Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
NXP Linux Team
Cc: linux-hwmon, linux-arm-kernel, linux-kernel, netdev,
Stefan Wahren
In-Reply-To: <1565720249-6549-1-git-send-email-wahrenst@gmx.net>
The email address listed in MODULE_AUTHOR() will be disabled in the
near future. Replace it with my private one.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
---
drivers/hwmon/raspberrypi-hwmon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
index efe4bb1..d3a64a3 100644
--- a/drivers/hwmon/raspberrypi-hwmon.c
+++ b/drivers/hwmon/raspberrypi-hwmon.c
@@ -146,7 +146,7 @@ static struct platform_driver rpi_hwmon_driver = {
};
module_platform_driver(rpi_hwmon_driver);
-MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
+MODULE_AUTHOR("Stefan Wahren <wahrenst@gmx.net>");
MODULE_DESCRIPTION("Raspberry Pi voltage sensor driver");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("platform:raspberrypi-hwmon");
--
2.7.4
^ permalink raw reply related
* [PATCH 1/3] nvmem: mxs-ocotp: update MODULE_AUTHOR() email address
From: Stefan Wahren @ 2019-08-13 18:17 UTC (permalink / raw)
To: Jean Delvare, Guenter Roeck, David S. Miller, Srinivas Kandagatla,
Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
NXP Linux Team
Cc: linux-hwmon, linux-arm-kernel, linux-kernel, netdev,
Stefan Wahren
The email address listed in MODULE_AUTHOR() will be disabled in the
near future. Replace it with my private one.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
---
drivers/nvmem/mxs-ocotp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvmem/mxs-ocotp.c b/drivers/nvmem/mxs-ocotp.c
index c34d9fe..8e4898d 100644
--- a/drivers/nvmem/mxs-ocotp.c
+++ b/drivers/nvmem/mxs-ocotp.c
@@ -200,6 +200,6 @@ static struct platform_driver mxs_ocotp_driver = {
};
module_platform_driver(mxs_ocotp_driver);
-MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
+MODULE_AUTHOR("Stefan Wahren <wahrenst@gmx.net");
MODULE_DESCRIPTION("driver for OCOTP in i.MX23/i.MX28");
MODULE_LICENSE("GPL v2");
--
2.7.4
^ permalink raw reply related
* Re: [RESEND][PATCH v3 bpf-next] btf: expose BTF info through sysfs
From: Andrii Nakryiko @ 2019-08-13 18:08 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov, Kernel Team
In-Reply-To: <c4103c58-941c-da3a-9abd-eecbcd256f1d@iogearbox.net>
On Tue, Aug 13, 2019 at 7:20 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/12/19 8:39 PM, Andrii Nakryiko wrote:
> > Make .BTF section allocated and expose its contents through sysfs.
> >
> > /sys/kernel/btf directory is created to contain all the BTFs present
> > inside kernel. Currently there is only kernel's main BTF, represented as
> > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> >
> > Current approach relies on a few pieces coming together:
> > 1. pahole is used to take almost final vmlinux image (modulo .BTF and
> > kallsyms) and generate .BTF section by converting DWARF info into
> > BTF. This section is not allocated and not mapped to any segment,
> > though, so is not yet accessible from inside kernel at runtime.
> > 2. objcopy dumps .BTF contents into binary file and subsequently
> > convert binary file into linkable object file with automatically
> > generated symbols _binary__btf_kernel_bin_start and
> > _binary__btf_kernel_bin_end, pointing to start and end, respectively,
> > of BTF raw data.
> > 3. final vmlinux image is generated by linking this object file (and
> > kallsyms, if necessary). sysfs_btf.c then creates
> > /sys/kernel/btf/kernel file and exposes embedded BTF contents through
> > it. This allows, e.g., libbpf and bpftool access BTF info at
> > well-known location, without resorting to searching for vmlinux image
> > on disk (location of which is not standardized and vmlinux image
> > might not be even available in some scenarios, e.g., inside qemu
> > during testing).
>
> Small question: given modules will be covered later, would it not be more
> obvious to name it /sys/kernel/btf/vmlinux instead?
vmlinux totally makes sense, not sure why I didn't think about that initially...
I'll follow up with a rename.
>
> > Alternative approach using .incbin assembler directive to embed BTF
> > contents directly was attempted but didn't work, because sysfs_proc.o is
> > not re-compiled during link-vmlinux.sh stage. This is required, though,
> > to update embedded BTF data (initially empty data is embedded, then
> > pahole generates BTF info and we need to regenerate sysfs_btf.o with
> > updated contents, but it's too late at that point).
> >
> > If BTF couldn't be generated due to missing or too old pahole,
> > sysfs_btf.c handles that gracefully by detecting that
> > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> > /sys/kernel/btf at all.
> >
> > v2->v3:
> > - added Documentation/ABI/testing/sysfs-kernel-btf (Greg K-H);
> > - created proper kobject (btf_kobj) for btf directory (Greg K-H);
> > - undo v2 change of reusing vmlinux, as it causes extra kallsyms pass
> > due to initially missing __binary__btf_kernel_bin_{start/end} symbols;
> >
> > v1->v2:
> > - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> >
> > Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>
> In any case, this is great progress, applied thanks!
^ permalink raw reply
* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Richard Cochran @ 2019-08-13 18:06 UTC (permalink / raw)
To: Felipe Balbi
Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190813174821.GC3207@localhost>
On Tue, Aug 13, 2019 at 10:48:21AM -0700, Richard Cochran wrote:
> > + if (copy_from_user(&req.extts, (void __user *)arg,
> > + sizeof(req.extts))) {
> > + err = -EFAULT;
> > + break;
> > + }
> > + if (req.extts.flags || req.extts.rsv[0]
> > + || req.extts.rsv[1]) {
> > + err = -EINVAL;
>
> Since the code is mostly the same as in the PTP_EXTTS_REQUEST case,
> maybe just double up the case statements (like in the other) and add
> an extra test for (cmd == PTP_EXTTS_REQUEST2) for this if-block.
Thinking about the drivers, in the case of the legacy ioctls, let's
also be sure to clear the flags and reserved fields before passing
them to the drivers.
Thanks,
Richard
^ permalink raw reply
* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Richard Cochran @ 2019-08-13 17:49 UTC (permalink / raw)
To: Felipe Balbi
Cc: Andrew Lunn, netdev, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, H . Peter Anvin, x86, linux-kernel,
Christopher S . Hall
In-Reply-To: <87wofhy05t.fsf@gmail.com>
On Tue, Aug 13, 2019 at 10:50:06AM +0300, Felipe Balbi wrote:
> If we do that we make it difficult for those reading specification and
> trying to find the matching driver.
+1
Thanks,
Richard
^ permalink raw reply
* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Richard Cochran @ 2019-08-13 17:48 UTC (permalink / raw)
To: Felipe Balbi
Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <87tvalxzzi.fsf@gmail.com>
On Tue, Aug 13, 2019 at 10:53:53AM +0300, Felipe Balbi wrote:
> before I send a new series built on top of this change, I thought I'd
> check with you if I'm on the right path. Below you can find my current
> take at the new IOCTLs. I maintained the same exact structures so that
> there's no maintenance burden. Also introduce a new IOCTL for every
> single one of the previously existing ones even though not all of them
> needed changes. The reason for that was just to make it easier for
> libary authors to update their library by a simple sed script adding '2'
> to the end of the IOCTL macro.
Sounds good. I have a few comments, below...
> Let me know if you want anything to be changed or had a different idea
> about any of this. Also, if you prefer that I finish the entire series
> before you review, no worries either ;-)
>
> Cheers, patch follows:
>
> From bc2aa511d4c2e2228590fb29604c6c33b56527ad Mon Sep 17 00:00:00 2001
> From: Felipe Balbi <felipe.balbi@linux.intel.com>
> Date: Tue, 13 Aug 2019 10:32:35 +0300
> Subject: [PATCH] PTP: introduce new versions of IOCTLs
>
> The current version of the IOCTL have a small problem which prevents
> us from extending the API by making use of reserved fields. In these
> new IOCTLs, we are now making sure that flags and rsv fields are zero
> which will allow us to extend the API in the future.
>
> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
> ---
> drivers/ptp/ptp_chardev.c | 105 +++++++++++++++++++++++++++++++++
> include/uapi/linux/ptp_clock.h | 12 ++++
> 2 files changed, 117 insertions(+)
>
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 18ffe449efdf..94775073527b 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -126,6 +126,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> switch (cmd) {
>
> case PTP_CLOCK_GETCAPS:
> + case PTP_CLOCK_GETCAPS2:
> memset(&caps, 0, sizeof(caps));
> caps.max_adj = ptp->info->max_adj;
> caps.n_alarm = ptp->info->n_alarm;
> @@ -153,6 +154,28 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> err = ops->enable(ops, &req, enable);
> break;
>
> + case PTP_EXTTS_REQUEST2:
> + memset(&req, 0, sizeof(req));
This memset not needed, AFAICT. Oh wait, you want to keep drivers
from seeing stack data in the unused parts of the union. That is
fine, but please just do that unconditionally at the top of the
function.
> + if (copy_from_user(&req.extts, (void __user *)arg,
> + sizeof(req.extts))) {
> + err = -EFAULT;
> + break;
> + }
> + if (req.extts.flags || req.extts.rsv[0]
> + || req.extts.rsv[1]) {
> + err = -EINVAL;
Since the code is mostly the same as in the PTP_EXTTS_REQUEST case,
maybe just double up the case statements (like in the other) and add
an extra test for (cmd == PTP_EXTTS_REQUEST2) for this if-block.
> + break;
> + }
> +
> + if (req.extts.index >= ops->n_ext_ts) {
> + err = -EINVAL;
> + break;
> + }
> + req.type = PTP_CLK_REQ_EXTTS;
> + enable = req.extts.flags & PTP_ENABLE_FEATURE ? 1 : 0;
> + err = ops->enable(ops, &req, enable);
> + break;
> +
> case PTP_PEROUT_REQUEST:
> if (copy_from_user(&req.perout, (void __user *)arg,
> sizeof(req.perout))) {
> @@ -168,6 +191,28 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> err = ops->enable(ops, &req, enable);
> break;
>
> + case PTP_PEROUT_REQUEST2:
> + memset(&req, 0, sizeof(req));
> + if (copy_from_user(&req.perout, (void __user *)arg,
> + sizeof(req.perout))) {
> + err = -EFAULT;
> + break;
> + }
> + if (req.perout.flags || req.perout.rsv[0]
> + || req.perout.rsv[1] || req.perout.rsv[2]
> + || req.perout.rsv[3]) {
> + err = -EINVAL;
> + break;
> + }
Also this could share code with the legacy ioctl.
> + if (req.perout.index >= ops->n_per_out) {
> + err = -EINVAL;
> + break;
> + }
> + req.type = PTP_CLK_REQ_PEROUT;
> + enable = req.perout.period.sec || req.perout.period.nsec;
> + err = ops->enable(ops, &req, enable);
> + break;
> +
> case PTP_ENABLE_PPS:
> if (!capable(CAP_SYS_TIME))
> return -EPERM;
> @@ -176,7 +221,17 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> err = ops->enable(ops, &req, enable);
> break;
>
> + case PTP_ENABLE_PPS2:
> + if (!capable(CAP_SYS_TIME))
> + return -EPERM;
> + memset(&req, 0, sizeof(req));
Clearing 'req' unconditionally will make this case the same as the
legacy case.
> + req.type = PTP_CLK_REQ_PPS;
> + enable = arg ? 1 : 0;
> + err = ops->enable(ops, &req, enable);
> + break;
> +
> case PTP_SYS_OFFSET_PRECISE:
> + case PTP_SYS_OFFSET_PRECISE2:
> if (!ptp->info->getcrosststamp) {
> err = -EOPNOTSUPP;
> break;
> @@ -201,6 +256,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> break;
>
> case PTP_SYS_OFFSET_EXTENDED:
> + case PTP_SYS_OFFSET_EXTENDED2:
> if (!ptp->info->gettimex64) {
> err = -EOPNOTSUPP;
> break;
> @@ -232,6 +288,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> break;
>
> case PTP_SYS_OFFSET:
> + case PTP_SYS_OFFSET2:
> sysoff = memdup_user((void __user *)arg, sizeof(*sysoff));
> if (IS_ERR(sysoff)) {
> err = PTR_ERR(sysoff);
> @@ -284,6 +341,31 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> err = -EFAULT;
> break;
>
> + case PTP_PIN_GETFUNC2:
> + memset(&pd, 0, sizeof(pd));
This memset is pointless because of the following copy_from_user().
> + if (copy_from_user(&pd, (void __user *)arg, sizeof(pd))) {
> + err = -EFAULT;
> + break;
> + }
> + if (pd.rsv[0] || pd.rsv[1] || pd.rsv[2]
> + || pd.rsv[3] || pd.rsv[4]) {
> + err = -EINVAL;
> + break;
> + }
Again maybe share the code?
> + pin_index = pd.index;
> + if (pin_index >= ops->n_pins) {
> + err = -EINVAL;
> + break;
> + }
> + pin_index = array_index_nospec(pin_index, ops->n_pins);
> + if (mutex_lock_interruptible(&ptp->pincfg_mux))
> + return -ERESTARTSYS;
> + pd = ops->pin_config[pin_index];
> + mutex_unlock(&ptp->pincfg_mux);
> + if (!err && copy_to_user((void __user *)arg, &pd, sizeof(pd)))
> + err = -EFAULT;
> + break;
> +
> case PTP_PIN_SETFUNC:
> if (copy_from_user(&pd, (void __user *)arg, sizeof(pd))) {
> err = -EFAULT;
> @@ -301,6 +383,29 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> mutex_unlock(&ptp->pincfg_mux);
> break;
>
> + case PTP_PIN_SETFUNC2:
> + memset(&pd, 0, sizeof(pd));
memset not needed here either.
> + if (copy_from_user(&pd, (void __user *)arg, sizeof(pd))) {
> + err = -EFAULT;
> + break;
> + }
> + if (pd.rsv[0] || pd.rsv[1] || pd.rsv[2]
> + || pd.rsv[3] || pd.rsv[4]) {
> + err = -EINVAL;
> + break;
> + }
also shareable.
Thanks,
Richard
> + pin_index = pd.index;
> + if (pin_index >= ops->n_pins) {
> + err = -EINVAL;
> + break;
> + }
> + pin_index = array_index_nospec(pin_index, ops->n_pins);
> + if (mutex_lock_interruptible(&ptp->pincfg_mux))
> + return -ERESTARTSYS;
> + err = ptp_set_pinfunc(ptp, pin_index, pd.func, pd.chan);
> + mutex_unlock(&ptp->pincfg_mux);
> + break;
> +
> default:
> err = -ENOTTY;
> break;
> diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
> index 1bc794ad957a..039cd62ec706 100644
> --- a/include/uapi/linux/ptp_clock.h
> +++ b/include/uapi/linux/ptp_clock.h
> @@ -149,6 +149,18 @@ struct ptp_pin_desc {
> #define PTP_SYS_OFFSET_EXTENDED \
> _IOWR(PTP_CLK_MAGIC, 9, struct ptp_sys_offset_extended)
>
> +#define PTP_CLOCK_GETCAPS2 _IOR(PTP_CLK_MAGIC, 10, struct ptp_clock_caps)
> +#define PTP_EXTTS_REQUEST2 _IOW(PTP_CLK_MAGIC, 11, struct ptp_extts_request)
> +#define PTP_PEROUT_REQUEST2 _IOW(PTP_CLK_MAGIC, 12, struct ptp_perout_request)
> +#define PTP_ENABLE_PPS2 _IOW(PTP_CLK_MAGIC, 13, int)
> +#define PTP_SYS_OFFSET2 _IOW(PTP_CLK_MAGIC, 14, struct ptp_sys_offset)
> +#define PTP_PIN_GETFUNC2 _IOWR(PTP_CLK_MAGIC, 15, struct ptp_pin_desc)
> +#define PTP_PIN_SETFUNC2 _IOW(PTP_CLK_MAGIC, 16, struct ptp_pin_desc)
> +#define PTP_SYS_OFFSET_PRECISE2 \
> + _IOWR(PTP_CLK_MAGIC, 17, struct ptp_sys_offset_precise)
> +#define PTP_SYS_OFFSET_EXTENDED2 \
> + _IOWR(PTP_CLK_MAGIC, 18, struct ptp_sys_offset_extended)
> +
> struct ptp_extts_event {
> struct ptp_clock_time t; /* Time event occured. */
> unsigned int index; /* Which channel produced the event. */
> --
> 2.22.0
>
>
>
> --
> balbi
^ permalink raw reply
* Re: [patch net-next v3 0/3] net: devlink: Finish network namespace support
From: Jakub Kicinski @ 2019-08-13 17:45 UTC (permalink / raw)
To: David Ahern; +Cc: Jiri Pirko, netdev, davem, stephen, mlxsw
In-Reply-To: <a9fa6f7f-7981-6077-106d-fa2abfc7397c@gmail.com>
On Mon, 12 Aug 2019 19:46:57 -0600, David Ahern wrote:
> On 8/12/19 7:11 PM, Jakub Kicinski wrote:
> > If the devlink instance just disappeared - that'd be a very very strange
> > thing. Only software objects disappear with the namespace.
> > Netdevices without ->rtnl_link_ops go back to init_net.
>
> netdevsim still has rtnl_link_ops:
>
> static struct rtnl_link_ops nsim_link_ops __read_mostly = {
> .kind = DRV_NAME,
> .validate = nsim_validate,
> };
The test harness is the only devlink instance which may
conceivably have link ops. And implementing the behaviour
you ask for would require core changes.
We are back to the precedent by test harness argument :(
^ permalink raw reply
* [PATCH net-next] page_pool: fix logic in __page_pool_get_cached
From: Jonathan Lemon @ 2019-08-13 17:45 UTC (permalink / raw)
To: netdev, davem
Cc: brouer, ilias.apalodimas, saeedm, ttoukan.linux, kernel-team
__page_pool_get_cached() will return NULL when the ring is
empty, even if there are pages present in the lookaside cache.
It is also possible to refill the cache, and then return a
NULL page.
Restructure the logic so eliminate both cases.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
net/core/page_pool.c | 39 ++++++++++++++++-----------------------
1 file changed, 16 insertions(+), 23 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 68510eb869ea..de09a74a39a4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -82,12 +82,9 @@ EXPORT_SYMBOL(page_pool_create);
static struct page *__page_pool_get_cached(struct page_pool *pool)
{
struct ptr_ring *r = &pool->ring;
+ bool refill = false;
struct page *page;
- /* Quicker fallback, avoid locks when ring is empty */
- if (__ptr_ring_empty(r))
- return NULL;
-
/* Test for safe-context, caller should provide this guarantee */
if (likely(in_serving_softirq())) {
if (likely(pool->alloc.count)) {
@@ -95,27 +92,23 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
page = pool->alloc.cache[--pool->alloc.count];
return page;
}
- /* Slower-path: Alloc array empty, time to refill
- *
- * Open-coded bulk ptr_ring consumer.
- *
- * Discussion: the ring consumer lock is not really
- * needed due to the softirq/NAPI protection, but
- * later need the ability to reclaim pages on the
- * ring. Thus, keeping the locks.
- */
- spin_lock(&r->consumer_lock);
- while ((page = __ptr_ring_consume(r))) {
- if (pool->alloc.count == PP_ALLOC_CACHE_REFILL)
- break;
- pool->alloc.cache[pool->alloc.count++] = page;
- }
- spin_unlock(&r->consumer_lock);
- return page;
+ refill = true;
}
- /* Slow-path: Get page from locked ring queue */
- page = ptr_ring_consume(&pool->ring);
+ /* Quicker fallback, avoid locks when ring is empty */
+ if (__ptr_ring_empty(r))
+ return NULL;
+
+ /* Slow-path: Get page from locked ring queue,
+ * refill alloc array if requested.
+ */
+ spin_lock(&r->consumer_lock);
+ page = __ptr_ring_consume(r);
+ if (refill)
+ pool->alloc.count = __ptr_ring_consume_batched(r,
+ pool->alloc.cache,
+ PP_ALLOC_CACHE_REFILL);
+ spin_unlock(&r->consumer_lock);
return page;
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH bpf-next 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Jonathan Lemon @ 2019-08-13 17:42 UTC (permalink / raw)
To: Ivan Khoronzhuk
Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel
In-Reply-To: <20190813102318.5521-3-ivan.khoronzhuk@linaro.org>
On 13 Aug 2019, at 3:23, Ivan Khoronzhuk wrote:
> For 64-bit there is no reason to use vmap/vunmap, so use page_address
> as it was initially. For 32 bits, in some apps, like in samples
> xdpsock_user.c when number of pgs in use is quite big, the kmap
> memory can be not enough, despite on this, kmap looks like is
> deprecated in such cases as it can block and should be used rather
> for dynamic mm.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Seems a bit overkill - if not high memory, kmap() falls back
to just page_address(), unlike vmap().
--
Jonathan
> ---
> net/xdp/xdp_umem.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index a0607969f8c0..907c9019fe21 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -14,7 +14,7 @@
> #include <linux/netdevice.h>
> #include <linux/rtnetlink.h>
> #include <linux/idr.h>
> -#include <linux/highmem.h>
> +#include <linux/vmalloc.h>
>
> #include "xdp_umem.h"
> #include "xsk_queue.h"
> @@ -167,10 +167,12 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
>
> static void xdp_umem_unmap_pages(struct xdp_umem *umem)
> {
> +#if BITS_PER_LONG == 32
> unsigned int i;
>
> for (i = 0; i < umem->npgs; i++)
> - kunmap(umem->pgs[i]);
> + vunmap(umem->pages[i].addr);
> +#endif
> }
>
> static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> @@ -378,8 +380,14 @@ static int xdp_umem_reg(struct xdp_umem *umem,
> struct xdp_umem_reg *mr)
> goto out_account;
> }
>
> - for (i = 0; i < umem->npgs; i++)
> - umem->pages[i].addr = kmap(umem->pgs[i]);
> + for (i = 0; i < umem->npgs; i++) {
> +#if BITS_PER_LONG == 32
> + umem->pages[i].addr = vmap(&umem->pgs[i], 1, VM_MAP,
> + PAGE_KERNEL);
> +#else
> + umem->pages[i].addr = page_address(umem->pgs[i]);
> +#endif
> + }
>
> return 0;
>
> --
> 2.17.1
^ permalink raw reply
* Re: [PATCH bpf-next 3/3] samples: bpf: syscal_nrs: use mmap2 if defined
From: Jonathan Lemon @ 2019-08-13 17:41 UTC (permalink / raw)
To: Ivan Khoronzhuk
Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel
In-Reply-To: <20190813102318.5521-4-ivan.khoronzhuk@linaro.org>
On 13 Aug 2019, at 3:23, Ivan Khoronzhuk wrote:
> For arm32 xdp sockets mmap2 is preferred, so use it if it's defined.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Doesn't this change the application API?
--
Jonathan
> ---
> samples/bpf/syscall_nrs.c | 5 +++++
> samples/bpf/tracex5_kern.c | 11 +++++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/samples/bpf/syscall_nrs.c b/samples/bpf/syscall_nrs.c
> index 516e255cbe8f..2dec94238350 100644
> --- a/samples/bpf/syscall_nrs.c
> +++ b/samples/bpf/syscall_nrs.c
> @@ -9,5 +9,10 @@ void syscall_defines(void)
> COMMENT("Linux system call numbers.");
> SYSNR(__NR_write);
> SYSNR(__NR_read);
> +#ifdef __NR_mmap2
> + SYSNR(__NR_mmap2);
> +#else
> SYSNR(__NR_mmap);
> +#endif
> +
> }
> diff --git a/samples/bpf/tracex5_kern.c b/samples/bpf/tracex5_kern.c
> index f57f4e1ea1ec..300350ad299a 100644
> --- a/samples/bpf/tracex5_kern.c
> +++ b/samples/bpf/tracex5_kern.c
> @@ -68,12 +68,23 @@ PROG(SYS__NR_read)(struct pt_regs *ctx)
> return 0;
> }
>
> +#ifdef __NR_mmap2
> +PROG(SYS__NR_mmap2)(struct pt_regs *ctx)
> +{
> + char fmt[] = "mmap2\n";
> +
> + bpf_trace_printk(fmt, sizeof(fmt));
> + return 0;
> +}
> +#else
> PROG(SYS__NR_mmap)(struct pt_regs *ctx)
> {
> char fmt[] = "mmap\n";
> +
> bpf_trace_printk(fmt, sizeof(fmt));
> return 0;
> }
> +#endif
>
> char _license[] SEC("license") = "GPL";
> u32 _version SEC("version") = LINUX_VERSION_CODE;
> --
> 2.17.1
^ permalink raw reply
* Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: David Miller @ 2019-08-13 17:40 UTC (permalink / raw)
To: jiri; +Cc: dsahern, netdev, dsahern
In-Reply-To: <20190813071445.GL2428@nanopsycho>
From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 13 Aug 2019 09:14:45 +0200
> Mon, Aug 12, 2019 at 05:28:02PM CEST, davem@davemloft.net wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Date: Mon, 12 Aug 2019 10:36:35 +0200
>>
>>> I understand it with real devices, but dummy testing device, who's
>>> purpose is just to test API. Why?
>>
>>Because you'll break all of the wonderful testing infrastructure
>>people like David have created.
>
> Are you referring to selftests? There is no such test there :(
> But if it would be, could implement the limitations
> properly (like using cgroups), change the tests and remove this
> code from netdevsim?
What about older kernels? Can't you see how illogical this is?
^ permalink raw reply
* Re: [PATCH bpf-next 1/3] libbpf: add asm/unistd.h to xsk to get __NR_mmap2
From: Jonathan Lemon @ 2019-08-13 17:36 UTC (permalink / raw)
To: Ivan Khoronzhuk
Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel
In-Reply-To: <20190813102318.5521-2-ivan.khoronzhuk@linaro.org>
On 13 Aug 2019, at 3:23, Ivan Khoronzhuk wrote:
> That's needed to get __NR_mmap2 when mmap2 syscall is used.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: general protection fault in tls_write_space
From: Jakub Kicinski @ 2019-08-13 17:27 UTC (permalink / raw)
To: John Fastabend
Cc: Hillf Danton, syzbot, aviadye, borisp, daniel, davejwatson, davem,
linux-kernel, netdev, oss-drivers, syzkaller-bugs, willemb
In-Reply-To: <5d52f09299e91_40c72adb748b25c0d3@john-XPS-13-9370.notmuch>
On Tue, 13 Aug 2019 10:17:06 -0700, John Fastabend wrote:
> > Followup of commit 95fa145479fb
> > ("bpf: sockmap/tls, close can race with map free")
> >
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -308,6 +308,9 @@ static void tls_sk_proto_close(struct so
> > if (free_ctx)
> > icsk->icsk_ulp_data = NULL;
> > sk->sk_prot = ctx->sk_proto;
> > + /* tls will go; restore sock callback before enabling bh */
> > + if (sk->sk_write_space == tls_write_space)
> > + sk->sk_write_space = ctx->sk_write_space;
> > write_unlock_bh(&sk->sk_callback_lock);
> > release_sock(sk);
> > if (ctx->tx_conf == TLS_SW)
>
> Hi Hillf,
>
> We need this patch (although slightly updated for bpf tree) do
> you want to send it? Otherwise I can. We should only set this if
> TX path was enabled otherwise we null it. Checking against
> tls_write_space seems best to me as well.
>
> Against bpf this patch should fix it.
>
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index ce6ef56a65ef..43252a801c3f 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -308,7 +308,8 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> if (free_ctx)
> icsk->icsk_ulp_data = NULL;
> sk->sk_prot = ctx->sk_proto;
> - sk->sk_write_space = ctx->sk_write_space;
> + if (sk->sk_write_space == tls_write_space)
> + sk->sk_write_space = ctx->sk_write_space;
> write_unlock_bh(&sk->sk_callback_lock);
> release_sock(sk);
> if (ctx->tx_conf == TLS_SW)
This is already in net since Friday:
commit 57c722e932cfb82e9820bbaae1b1f7222ea97b52
Author: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri Aug 9 18:36:23 2019 -0700
net/tls: swap sk_write_space on close
Now that we swap the original proto and clear the ULP pointer
on close we have to make sure no callback will try to access
the freed state. sk_write_space is not part of sk_prot, remember
to swap it.
Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 9cbbae606ced..ce6ef56a65ef 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -308,6 +308,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
if (free_ctx)
icsk->icsk_ulp_data = NULL;
sk->sk_prot = ctx->sk_proto;
+ sk->sk_write_space = ctx->sk_write_space;
write_unlock_bh(&sk->sk_callback_lock);
release_sock(sk);
if (ctx->tx_conf == TLS_SW)
^ permalink raw reply related
* Re: general protection fault in tls_write_space
From: John Fastabend @ 2019-08-13 17:17 UTC (permalink / raw)
To: Hillf Danton, syzbot
Cc: aviadye, borisp, daniel, davejwatson, davem, jakub.kicinski,
john.fastabend, linux-kernel, netdev, oss-drivers, syzkaller-bugs,
willemb
In-Reply-To: <20190810135900.2820-1-hdanton@sina.com>
Hillf Danton wrote:
>
> On Sat, 10 Aug 2019 01:23:06 -0700
> >
> > syzbot has found a reproducer for the following crash on:
> >
> > HEAD commit: ca497fb6 taprio: remove unused variable 'entry_list_policy'
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=109f3802600000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=dcdc9deefaec44785f32
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11c78cd2600000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
> >
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.3.0-rc3+ #125
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:tls_write_space+0x51/0x170 net/tls/tls_main.c:239
> > Code: c1 ea 03 80 3c 02 00 0f 85 26 01 00 00 49 8b 9c 24 b0 06 00 00 48 b8
> > 00 00 00 00 00 fc ff df 48 8d 7b 6a 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48
> > 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 df 00 00 00
> > RSP: 0018:ffff8880a98b74c8 EFLAGS: 00010202
> > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff860a27a2
> > RDX: 000000000000000d RSI: ffffffff862c86c1 RDI: 000000000000006a
> > RBP: ffff8880a98b74e0 R08: ffff8880a98a2240 R09: fffffbfff167c289
> > R10: fffffbfff167c288 R11: ffffffff8b3e1447 R12: ffff8880a4de41c0
> > R13: ffff8880a4de45b8 R14: 000000000000000a R15: 0000000000000000
> > FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000008c9d1000 CR4: 00000000001406f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > tcp_new_space net/ipv4/tcp_input.c:5151 [inline]
> > tcp_check_space+0x191/0x760 net/ipv4/tcp_input.c:5162
> > tcp_data_snd_check net/ipv4/tcp_input.c:5172 [inline]
> > tcp_rcv_state_process+0xe24/0x4e48 net/ipv4/tcp_input.c:6303
> > tcp_v6_do_rcv+0x7d7/0x12c0 net/ipv6/tcp_ipv6.c:1381
> > tcp_v6_rcv+0x31f1/0x3500 net/ipv6/tcp_ipv6.c:1588
> > ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> > ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> > NF_HOOK include/linux/netfilter.h:305 [inline]
> > NF_HOOK include/linux/netfilter.h:299 [inline]
> > ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> > dst_input include/net/dst.h:442 [inline]
> > ip6_rcv_finish+0x1de/0x2f0 net/ipv6/ip6_input.c:76
> > NF_HOOK include/linux/netfilter.h:305 [inline]
> > NF_HOOK include/linux/netfilter.h:299 [inline]
> > ipv6_rcv+0x10e/0x420 net/ipv6/ip6_input.c:272
> > __netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:5006
> > __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5120
> > process_backlog+0x206/0x750 net/core/dev.c:5951
> > napi_poll net/core/dev.c:6388 [inline]
> > net_rx_action+0x4d6/0x1080 net/core/dev.c:6456
> > __do_softirq+0x262/0x98c kernel/softirq.c:292
> > run_ksoftirqd kernel/softirq.c:603 [inline]
> > run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
> > smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
> > kthread+0x361/0x430 kernel/kthread.c:255
> > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> > Modules linked in:
> > ---[ end trace c21a83505707bb9d ]---
>
> Followup of commit 95fa145479fb
> ("bpf: sockmap/tls, close can race with map free")
>
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -308,6 +308,9 @@ static void tls_sk_proto_close(struct so
> if (free_ctx)
> icsk->icsk_ulp_data = NULL;
> sk->sk_prot = ctx->sk_proto;
> + /* tls will go; restore sock callback before enabling bh */
> + if (sk->sk_write_space == tls_write_space)
> + sk->sk_write_space = ctx->sk_write_space;
> write_unlock_bh(&sk->sk_callback_lock);
> release_sock(sk);
> if (ctx->tx_conf == TLS_SW)
Hi Hillf,
We need this patch (although slightly updated for bpf tree) do
you want to send it? Otherwise I can. We should only set this if
TX path was enabled otherwise we null it. Checking against
tls_write_space seems best to me as well.
Against bpf this patch should fix it.
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index ce6ef56a65ef..43252a801c3f 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -308,7 +308,8 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
if (free_ctx)
icsk->icsk_ulp_data = NULL;
sk->sk_prot = ctx->sk_proto;
- sk->sk_write_space = ctx->sk_write_space;
+ if (sk->sk_write_space == tls_write_space)
+ sk->sk_write_space = ctx->sk_write_space;
write_unlock_bh(&sk->sk_callback_lock);
release_sock(sk);
if (ctx->tx_conf == TLS_SW)
^ permalink raw reply related
* Re: [PATCH bpf-next v3 2/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-13 17:16 UTC (permalink / raw)
To: Yonghong Song
Cc: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org,
davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
Martin Lau
In-Reply-To: <173b3736-97af-cad5-9432-8e6422a89d05@fb.com>
On 08/13, Yonghong Song wrote:
>
>
> On 8/13/19 9:26 AM, Stanislav Fomichev wrote:
> > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > and call it from sk_clone_lock.
> >
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Cc: Yonghong Song <yhs@fb.com>
> > Acked-by: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> > include/net/bpf_sk_storage.h | 10 ++++
> > include/uapi/linux/bpf.h | 3 +
> > net/core/bpf_sk_storage.c | 103 ++++++++++++++++++++++++++++++++++-
> > net/core/sock.c | 9 ++-
> > 4 files changed, 119 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
> > index b9dcb02e756b..8e4f831d2e52 100644
> > --- a/include/net/bpf_sk_storage.h
> > +++ b/include/net/bpf_sk_storage.h
> > @@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
> > extern const struct bpf_func_proto bpf_sk_storage_get_proto;
> > extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
> > +#else
> > +static inline int bpf_sk_storage_clone(const struct sock *sk,
> > + struct sock *newsk)
> > +{
> > + return 0;
> > +}
> > +#endif
> > +
> > #endif /* _BPF_SK_STORAGE_H */
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..0ef594ac3899 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -337,6 +337,9 @@ enum bpf_attach_type {
> > #define BPF_F_RDONLY_PROG (1U << 7)
> > #define BPF_F_WRONLY_PROG (1U << 8)
> >
> > +/* Clone map from listener for newly accepted socket */
> > +#define BPF_F_CLONE (1U << 9)
> > +
> > /* flags for BPF_PROG_QUERY */
> > #define BPF_F_QUERY_EFFECTIVE (1U << 0)
> >
> > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > index 94c7f77ecb6b..1bc7de7e18ba 100644
> > --- a/net/core/bpf_sk_storage.c
> > +++ b/net/core/bpf_sk_storage.c
> > @@ -12,6 +12,9 @@
> >
> > static atomic_t cache_idx;
> >
> > +#define SK_STORAGE_CREATE_FLAG_MASK \
> > + (BPF_F_NO_PREALLOC | BPF_F_CLONE)
> > +
> > struct bucket {
> > struct hlist_head list;
> > raw_spinlock_t lock;
> > @@ -209,7 +212,6 @@ static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
> > kfree_rcu(sk_storage, rcu);
> > }
> >
> > -/* sk_storage->lock must be held and sk_storage->list cannot be empty */
> > static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
> > struct bpf_sk_storage_elem *selem)
> > {
> > @@ -509,7 +511,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
> > return 0;
> > }
> >
> > -/* Called by __sk_destruct() */
> > +/* Called by __sk_destruct() & bpf_sk_storage_clone() */
> > void bpf_sk_storage_free(struct sock *sk)
> > {
> > struct bpf_sk_storage_elem *selem;
> > @@ -557,6 +559,11 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
> >
> > smap = (struct bpf_sk_storage_map *)map;
> >
> > + /* Note that this map might be concurrently cloned from
> > + * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
> > + * RCU read section to finish before proceeding. New RCU
> > + * read sections should be prevented via bpf_map_inc_not_zero.
> > + */
> > synchronize_rcu();
> >
> > /* bpf prog and the userspace can no longer access this map
> > @@ -601,7 +608,9 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
> >
> > static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
> > {
> > - if (attr->map_flags != BPF_F_NO_PREALLOC || attr->max_entries ||
> > + if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
> > + !(attr->map_flags & BPF_F_NO_PREALLOC) ||
> > + attr->max_entries ||
> > attr->key_size != sizeof(int) || !attr->value_size ||
> > /* Enforce BTF for userspace sk dumping */
> > !attr->btf_key_type_id || !attr->btf_value_type_id)
> > @@ -739,6 +748,94 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
> > return err;
> > }
> >
> > +static struct bpf_sk_storage_elem *
> > +bpf_sk_storage_clone_elem(struct sock *newsk,
> > + struct bpf_sk_storage_map *smap,
> > + struct bpf_sk_storage_elem *selem)
> > +{
> > + struct bpf_sk_storage_elem *copy_selem;
> > +
> > + copy_selem = selem_alloc(smap, newsk, NULL, true);
> > + if (!copy_selem)
> > + return NULL;
> > +
> > + if (map_value_has_spin_lock(&smap->map))
> > + copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
> > + SDATA(selem)->data, true);
> > + else
> > + copy_map_value(&smap->map, SDATA(copy_selem)->data,
> > + SDATA(selem)->data);
> > +
> > + return copy_selem;
> > +}
> > +
> > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > +{
> > + struct bpf_sk_storage *new_sk_storage = NULL;
> > + struct bpf_sk_storage *sk_storage;
> > + struct bpf_sk_storage_elem *selem;
> > + int ret;
> > +
> > + RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > +
> > + rcu_read_lock();
> > + sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > +
> > + if (!sk_storage || hlist_empty(&sk_storage->list))
> > + goto out;
> > +
> > + hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > + struct bpf_sk_storage_elem *copy_selem;
> > + struct bpf_sk_storage_map *smap;
> > + struct bpf_map *map;
> > +
> > + smap = rcu_dereference(SDATA(selem)->smap);
> > + if (!(smap->map.map_flags & BPF_F_CLONE))
> > + continue;
> > +
> > + map = bpf_map_inc_not_zero(&smap->map, false);
> > + if (IS_ERR(map))
> > + continue;
> > +
> > + copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > + if (!copy_selem) {
> > + ret = -ENOMEM;
> > + bpf_map_put(map);
> > + goto err;
> > + }
> > +
> > + if (new_sk_storage) {
> > + selem_link_map(smap, copy_selem);
> > + __selem_link_sk(new_sk_storage, copy_selem);
> > + } else {
> > + ret = sk_storage_alloc(newsk, smap, copy_selem);
> > + if (ret) {
> > + kfree(copy_selem);
> > + atomic_sub(smap->elem_size,
> > + &newsk->sk_omem_alloc);
> > + bpf_map_put(map);
> > + goto err;
> > + }
> > +
> > + new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > + }
> > + bpf_map_put(map);
> > + }
> > +
> > +out:
> > + rcu_read_unlock();
> > + return 0;
> > +
> > +err:
> > + rcu_read_unlock();
> > +
> > + /* Don't free anything explicitly here, caller is responsible to
> > + * call bpf_sk_storage_free in case of an error.
> > + */
> > +
> > + return ret;
>
> A nit.
> If you set ret = 0 initially, you do not need the above two
> rcu_read_unlock(). One "return ret" should be enough.
> The comment can be changed to
> /* In case of an error, ... */
I thought about it, but decided to keep them separate because of
this comment. OTOH, adding "In case of an error," might help with
that. Can do another respin once Martin gets to review this
version (don't want to spam).
> > +}
> > +
> > BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > void *, value, u64, flags)
> > {
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index d57b0cc995a0..f5e801a9cea4 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
> > goto out;
> > }
> > RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> > -#ifdef CONFIG_BPF_SYSCALL
> > - RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > -#endif
> > +
> > + if (bpf_sk_storage_clone(sk, newsk)) {
> > + sk_free_unlock_clone(newsk);
> > + newsk = NULL;
> > + goto out;
> > + }
> >
> > newsk->sk_err = 0;
> > newsk->sk_err_soft = 0;
> >
^ permalink raw reply
* [PATCH net-next] net: dsa: mv88e6xxx: check for mode change in port_setup_mac
From: Marek Behún @ 2019-08-13 17:12 UTC (permalink / raw)
To: netdev; +Cc: Andrew Lunn, Vivien Didelot, Heiner Kallweit, Marek Behún
The mv88e6xxx_port_setup_mac checks if the requested MAC settings are
different from the current ones, and if not, does nothing (since chaning
them requires putting the link down).
In this check it only looks if the triplet [link, speed, duplex] is
being changed.
This patch adds support to also check if the mode parameter (of type
phy_interface_t) is requested to be changed. The current mode is
computed by the ->port_link_state() method, and if it is different from
PHY_INTERFACE_MODE_NA, we check for equality with the requested mode.
In the implementations of the mv88e6250_port_link_state() method we set
the current mode to PHY_INTERFACE_MODE_NA - so the code does not check
for mode change on 6250.
In the mv88e6352_port_link_state() method, we use the cached cmode of
the port to determine the mode as phy_interface_t (and if it is not
enough, eg. for RGMII, we also look at the port control register for
RX/TX timings).
Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
drivers/net/dsa/mv88e6xxx/port.c | 40 +++++++++++++++++++++++++++++++-
drivers/net/dsa/mv88e6xxx/port.h | 1 +
3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 818a83eb2dcb..9b3ad22a5b98 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -417,7 +417,9 @@ int mv88e6xxx_port_setup_mac(struct mv88e6xxx_chip *chip, int port, int link,
*/
if (state.link == link &&
state.speed == speed &&
- state.duplex == duplex)
+ state.duplex == duplex &&
+ (state.interface == mode ||
+ state.interface == PHY_INTERFACE_MODE_NA))
return 0;
/* Port's MAC control must not be changed unless the link is down */
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 04309ef0a1cc..304e5b118b08 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -590,6 +590,7 @@ int mv88e6250_port_link_state(struct mv88e6xxx_chip *chip, int port,
state->link = !!(reg & MV88E6250_PORT_STS_LINK);
state->an_enabled = 1;
state->an_complete = state->link;
+ state->interface = PHY_INTERFACE_MODE_NA;
return 0;
}
@@ -598,12 +599,49 @@ int mv88e6352_port_link_state(struct mv88e6xxx_chip *chip, int port,
struct phylink_link_state *state)
{
int err;
- u16 reg;
+ u16 reg, mac;
err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_STS, ®);
if (err)
return err;
+ switch (chip->ports[port].cmode) {
+ case MV88E6XXX_PORT_STS_CMODE_RGMII:
+ err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_MAC_CTL,
+ &mac);
+ if (err)
+ return err;
+
+ if ((mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_RXCLK) &&
+ (mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_TXCLK))
+ state->interface = PHY_INTERFACE_MODE_RGMII_ID;
+ else if (mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_RXCLK)
+ state->interface = PHY_INTERFACE_MODE_RGMII_RXID;
+ else if (mac & MV88E6XXX_PORT_MAC_CTL_RGMII_DELAY_TXCLK)
+ state->interface = PHY_INTERFACE_MODE_RGMII_TXID;
+ else
+ state->interface = PHY_INTERFACE_MODE_RGMII;
+ break;
+ case MV88E6XXX_PORT_STS_CMODE_1000BASE_X:
+ state->interface = PHY_INTERFACE_MODE_1000BASEX;
+ break;
+ case MV88E6XXX_PORT_STS_CMODE_SGMII:
+ state->interface = PHY_INTERFACE_MODE_SGMII;
+ break;
+ case MV88E6XXX_PORT_STS_CMODE_2500BASEX:
+ state->interface = PHY_INTERFACE_MODE_2500BASEX;
+ break;
+ case MV88E6XXX_PORT_STS_CMODE_XAUI:
+ state->interface = PHY_INTERFACE_MODE_XAUI;
+ break;
+ case MV88E6XXX_PORT_STS_CMODE_RXAUI:
+ state->interface = PHY_INTERFACE_MODE_RXAUI;
+ break;
+ default:
+ /* we do not support other cmode values here */
+ state->interface = PHY_INTERFACE_MODE_NA;
+ }
+
switch (reg & MV88E6XXX_PORT_STS_SPEED_MASK) {
case MV88E6XXX_PORT_STS_SPEED_10:
state->speed = SPEED_10;
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index ceec771f8bfc..1abf5ea033e2 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -42,6 +42,7 @@
#define MV88E6XXX_PORT_STS_TX_PAUSED 0x0020
#define MV88E6XXX_PORT_STS_FLOW_CTL 0x0010
#define MV88E6XXX_PORT_STS_CMODE_MASK 0x000f
+#define MV88E6XXX_PORT_STS_CMODE_RGMII 0x0007
#define MV88E6XXX_PORT_STS_CMODE_100BASE_X 0x0008
#define MV88E6XXX_PORT_STS_CMODE_1000BASE_X 0x0009
#define MV88E6XXX_PORT_STS_CMODE_SGMII 0x000a
--
2.21.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox