* [PATCH net-next 6/9] selftests: pmtu: Add pmtu_vti4_exception test
From: Stefano Brivio @ 2018-03-15 16:18 UTC
To: David S . Miller; +Cc: Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <cover.1521129192.git.sbrivio@redhat.com>
This test checks that PMTU exceptions are created only when
needed on IPv4 routes with vti and xfrm, and that they carry the
expected PMTU value.
We can't adopt the same approach as test_pmtu_vti6_exception()
here, because on IPv4 administrative MTU changes won't be
reflected directly on PMTU.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tools/testing/selftests/net/pmtu.sh | 84 ++++++++++++++++++++++++++++++++-----
1 file changed, 74 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 67b77f9108ee..336b8545c4bd 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -6,6 +6,14 @@
#
# Tests currently implemented:
#
+# - pmtu_vti4_exception
+# Set up vti tunnel on top of veth, with xfrm states and policies, in two
+# namespaces with matching endpoints. Check that route exception is not
+# created if link layer MTU is not exceeded, then exceed it and check that
+# exception is created with the expected PMTU. The approach described
+# below for IPv6 doesn't apply here, because, on IPv4, administrative MTU
+# changes alone won't affect PMTU
+#
# - pmtu_vti6_exception
# Set up vti6 tunnel on top of veth, with xfrm states and policies, in two
# namespaces with matching endpoints. Check that route exception is
@@ -22,7 +30,8 @@
# - pmtu_vti6_default_mtu
# Same as above, for IPv6
-tests="pmtu_vti6_exception pmtu_vti4_default_mtu pmtu_vti6_default_mtu"
+tests="pmtu_vti4_exception pmtu_vti6_exception
+ pmtu_vti4_default_mtu pmtu_vti6_default_mtu"
NS_A="ns-$(mktemp -u XXXXXX)"
NS_B="ns-$(mktemp -u XXXXXX)"
@@ -103,19 +112,33 @@ setup_vti6() {
}
setup_xfrm() {
- ${ns_a} ip -6 xfrm state add src ${veth6_a_addr} dst ${veth6_b_addr} spi 0x1000 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel || return 0
- ${ns_a} ip -6 xfrm state add src ${veth6_b_addr} dst ${veth6_a_addr} spi 0x1001 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
- ${ns_a} ip -6 xfrm policy add dir out mark 10 tmpl src ${veth6_a_addr} dst ${veth6_b_addr} proto esp mode tunnel
- ${ns_a} ip -6 xfrm policy add dir in mark 10 tmpl src ${veth6_b_addr} dst ${veth6_a_addr} proto esp mode tunnel
+ proto=${1}
+ veth_a_addr="${2}"
+ veth_b_addr="${3}"
- ${ns_b} ip -6 xfrm state add src ${veth6_a_addr} dst ${veth6_b_addr} spi 0x1000 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
- ${ns_b} ip -6 xfrm state add src ${veth6_b_addr} dst ${veth6_a_addr} spi 0x1001 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
- ${ns_b} ip -6 xfrm policy add dir out mark 10 tmpl src ${veth6_b_addr} dst ${veth6_a_addr} proto esp mode tunnel
- ${ns_b} ip -6 xfrm policy add dir in mark 10 tmpl src ${veth6_a_addr} dst ${veth6_b_addr} proto esp mode tunnel
+ ${ns_a} ip -${proto} xfrm state add src ${veth_a_addr} dst ${veth_b_addr} spi 0x1000 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel || return 0
+ ${ns_a} ip -${proto} xfrm state add src ${veth_b_addr} dst ${veth_a_addr} spi 0x1001 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
+ ${ns_a} ip -${proto} xfrm policy add dir out mark 10 tmpl src ${veth_a_addr} dst ${veth_b_addr} proto esp mode tunnel
+ ${ns_a} ip -${proto} xfrm policy add dir in mark 10 tmpl src ${veth_b_addr} dst ${veth_a_addr} proto esp mode tunnel
+
+ ${ns_b} ip -${proto} xfrm state add src ${veth_a_addr} dst ${veth_b_addr} spi 0x1000 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
+ ${ns_b} ip -${proto} xfrm state add src ${veth_b_addr} dst ${veth_a_addr} spi 0x1001 proto esp aead "rfc4106(gcm(aes))" 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f 128 mode tunnel
+ ${ns_b} ip -${proto} xfrm policy add dir out mark 10 tmpl src ${veth_b_addr} dst ${veth_a_addr} proto esp mode tunnel
+ ${ns_b} ip -${proto} xfrm policy add dir in mark 10 tmpl src ${veth_a_addr} dst ${veth_b_addr} proto esp mode tunnel
return 1
}
+setup_xfrm4() {
+ setup_xfrm 4 ${veth4_a_addr} ${veth4_b_addr}
+ return $?
+}
+
+setup_xfrm6() {
+ setup_xfrm 6 ${veth6_a_addr} ${veth6_b_addr}
+ return $?
+}
+
setup() {
[ "$(id -u)" -ne 0 ] && echo " need to run as root" && return 0
@@ -180,8 +203,49 @@ route_get_dst_pmtu_from_exception() {
mtu_parse "$(route_get_dst_exception "${ns_cmd}" ${dst})"
}
+test_pmtu_vti4_exception() {
+ setup namespaces veth vti4 xfrm4 && return 2
+
+ veth_mtu=1500
+ vti_mtu=$((veth_mtu - 20))
+
+ # SPI SN IV ICV pad length next header
+ esp_payload_rfc4106=$((vti_mtu - 4 - 4 - 8 - 16 - 1 - 1))
+ ping_payload=$((esp_payload_rfc4106 - 28))
+
+ mtu "${ns_a}" veth_a ${veth_mtu}
+ mtu "${ns_b}" veth_b ${veth_mtu}
+ mtu "${ns_a}" vti4_a ${vti_mtu}
+ mtu "${ns_b}" vti4_b ${vti_mtu}
+
+ # Send DF packet without exceeding link layer MTU, check that no
+ # exception is created
+ ${ns_a} ping -q -M want -i 0.1 -w 2 -s ${ping_payload} ${vti4_b_addr} > /dev/null
+ pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${vti4_b_addr})"
+ if [ "${pmtu}" != "" ]; then
+ echo " unexpected exception created with PMTU ${pmtu} for IP payload length ${esp_payload_rfc4106}"
+ return 0
+ fi
+
+ # Now exceed link layer MTU by one byte, check that exception is created
+ ${ns_a} ping -q -M want -i 0.1 -w 2 -s $((ping_payload + 1)) ${vti4_b_addr} > /dev/null
+ pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${vti4_b_addr})"
+ if [ "${pmtu}" = "" ]; then
+ echo " exception not created for IP payload length $((esp_payload_rfc4106 + 1))"
+ return 0
+ fi
+
+ # ...with the right PMTU value
+ if [ ${pmtu} -ne ${esp_payload_rfc4106} ]; then
+ echo " wrong PMTU ${pmtu} in exception, expected: ${esp_payload_rfc4106}"
+ return 0
+ fi
+
+ return 1
+}
+
test_pmtu_vti6_exception() {
- setup namespaces veth vti6 xfrm && return 2
+ setup namespaces veth vti6 xfrm6 && return 2
# Create route exception by exceeding link layer MTU
mtu "${ns_a}" veth_a 4000
--
2.15.1
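
The payload sizing used by test_pmtu_vti4_exception can be checked in isolation. The sketch below repeats the arithmetic from the patch with the same variable names. The esp_overhead variable is introduced here for readability only, and reading the 28 bytes subtracted for the ping payload as inner IPv4 header (20) plus ICMP echo header (8) is an assumption: the patch does not spell it out.

```shell
#!/bin/sh
# Sketch: reproduce the size arithmetic from test_pmtu_vti4_exception.
veth_mtu=1500
vti_mtu=$((veth_mtu - 20))

# ESP overhead with rfc4106(gcm(aes)), as listed in the patch comment:
# SPI (4) + SN (4) + IV (8) + ICV (16) + pad length (1) + next header (1)
esp_overhead=$((4 + 4 + 8 + 16 + 1 + 1))
esp_payload_rfc4106=$((vti_mtu - esp_overhead))

# Assumption: 28 = inner IPv4 header (20) + ICMP echo header (8)
ping_payload=$((esp_payload_rfc4106 - 28))

echo "vti_mtu=${vti_mtu} esp_payload=${esp_payload_rfc4106} ping_payload=${ping_payload}"
```

With a veth MTU of 1500 this gives a vti MTU of 1480, an ESP payload of 1446 and a ping payload of 1418, so the test expects no exception at 1418 bytes and an exception with PMTU 1446 at 1419 bytes.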
* [PATCH net-next 7/9] selftests: pmtu: Add pmtu_vti4_link_add_mtu test
From: Stefano Brivio @ 2018-03-15 16:18 UTC
To: David S . Miller; +Cc: Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <cover.1521129192.git.sbrivio@redhat.com>
This test checks that MTU given on vti link creation is actually
configured, and that tunnel is not created with an invalid MTU.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tools/testing/selftests/net/pmtu.sh | 45 ++++++++++++++++++++++++++++++++++++-
1 file changed, 44 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 336b8545c4bd..2a7ada49d0c0 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -29,9 +29,14 @@
#
# - pmtu_vti6_default_mtu
# Same as above, for IPv6
+#
+# - pmtu_vti4_link_add_mtu
+# Set up vti4 interface passing MTU value at link creation, check MTU is
+# configured, and that link is not created with invalid MTU values
tests="pmtu_vti4_exception pmtu_vti6_exception
- pmtu_vti4_default_mtu pmtu_vti6_default_mtu"
+ pmtu_vti4_default_mtu pmtu_vti6_default_mtu
+ pmtu_vti4_link_add_mtu"
NS_A="ns-$(mktemp -u XXXXXX)"
NS_B="ns-$(mktemp -u XXXXXX)"
@@ -306,6 +311,44 @@ test_pmtu_vti6_default_mtu() {
return 1
}
+test_pmtu_vti4_link_add_mtu() {
+ setup namespaces && return 2
+
+ ${ns_a} ip link add vti4_a type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
+ [ $? -ne 0 ] && echo " vti not supported" && return 2
+ ${ns_a} ip link del vti4_a
+
+ pass=1
+
+ min=68
+ max=$((65528 - 20))
+ # Check invalid values first
+ for v in $((min - 1)) $((max + 1)); do
+ ${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10 2>/dev/null
+ # This can fail, or MTU can be adjusted to a proper value
+ [ $? -ne 0 ] && continue
+ mtu="$(link_get_mtu "${ns_a}" vti4_a)"
+ if [ ${mtu} -lt ${min} -o ${mtu} -gt ${max} ]; then
+ echo " vti tunnel created with invalid MTU ${mtu}"
+ pass=0
+ fi
+ ${ns_a} ip link del vti4_a
+ done
+
+ # Now check valid values
+ for v in ${min} 1300 ${max}; do
+ ${ns_a} ip link add vti4_a mtu ${v} type vti local ${veth4_a_addr} remote ${veth4_b_addr} key 10
+ mtu="$(link_get_mtu "${ns_a}" vti4_a)"
+ ${ns_a} ip link del vti4_a
+ if [ "${mtu}" != "${v}" ]; then
+ echo " vti MTU ${mtu} doesn't match configured value ${v}"
+ pass=0
+ fi
+ done
+
+ return ${pass}
+}
+
trap cleanup EXIT
exitcode=0
--
2.15.1
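
The boundary values probed by test_pmtu_vti4_link_add_mtu can be illustrated without creating any interface. This is only a sketch: mtu_in_range is a hypothetical helper that is not part of pmtu.sh, and the bounds are copied from the patch (min=68, max=65528 - 20).

```shell
#!/bin/sh
# Sketch: classify candidate MTUs against the vti4 bounds used in the test.
# mtu_in_range is a hypothetical helper, not part of pmtu.sh.
# Returns 0 (shell success) when min <= mtu <= max.
# $1: candidate MTU, $2: minimum, $3: maximum
mtu_in_range() {
	[ "${1}" -ge "${2}" ] && [ "${1}" -le "${3}" ]
}

min=68
max=$((65528 - 20))

# Same candidates as the test: one below, one above, and three valid values
for v in $((min - 1)) ${min} 1300 ${max} $((max + 1)); do
	if mtu_in_range "${v}" "${min}" "${max}"; then
		echo "${v}: valid"
	else
		echo "${v}: invalid"
	fi
done
```

Note that this helper follows the usual shell convention (0 means success), whereas the test functions in the patch return 1 on pass.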
* [PATCH net-next 8/9] selftests: pmtu: Add pmtu_vti6_link_add_mtu test
From: Stefano Brivio @ 2018-03-15 16:18 UTC
To: David S . Miller; +Cc: Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <cover.1521129192.git.sbrivio@redhat.com>
Same as pmtu_vti4_link_add_mtu test, but for IPv6.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tools/testing/selftests/net/pmtu.sh | 43 ++++++++++++++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 2a7ada49d0c0..aad9f880c8ee 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -33,10 +33,13 @@
# - pmtu_vti4_link_add_mtu
# Set up vti4 interface passing MTU value at link creation, check MTU is
# configured, and that link is not created with invalid MTU values
+#
+# - pmtu_vti6_link_add_mtu
+# Same as above, for IPv6
tests="pmtu_vti4_exception pmtu_vti6_exception
pmtu_vti4_default_mtu pmtu_vti6_default_mtu
- pmtu_vti4_link_add_mtu"
+ pmtu_vti4_link_add_mtu pmtu_vti6_link_add_mtu"
NS_A="ns-$(mktemp -u XXXXXX)"
NS_B="ns-$(mktemp -u XXXXXX)"
@@ -349,6 +352,44 @@ test_pmtu_vti4_link_add_mtu() {
return ${pass}
}
+test_pmtu_vti6_link_add_mtu() {
+ setup namespaces && return 2
+
+ ${ns_a} ip link add vti6_a type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
+ [ $? -ne 0 ] && echo " vti6 not supported" && return 2
+ ${ns_a} ip link del vti6_a
+
+ pass=1
+
+ min=1280
+ max=$((65535 - 40))
+ # Check invalid values first
+ for v in $((min - 1)) $((max + 1)); do
+ ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10 2>/dev/null
+ # This can fail, or MTU can be adjusted to a proper value
+ [ $? -ne 0 ] && continue
+ mtu="$(link_get_mtu "${ns_a}" vti6_a)"
+ if [ ${mtu} -lt ${min} -o ${mtu} -gt ${max} ]; then
+ echo " vti6 tunnel created with invalid MTU ${v}"
+ pass=0
+ fi
+ ${ns_a} ip link del vti6_a
+ done
+
+ # Now check valid values
+ for v in 1280 1300 $((65535 - 40)); do
+ ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
+ mtu="$(link_get_mtu "${ns_a}" vti6_a)"
+ ${ns_a} ip link del vti6_a
+ if [ "${mtu}" != "${v}" ]; then
+ echo " vti6 MTU ${mtu} doesn't match configured value ${v}"
+ pass=0
+ fi
+ done
+
+ return ${pass}
+}
+
trap cleanup EXIT
exitcode=0
--
2.15.1
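
The test functions in this series return 1 on pass, 0 on failure and 2 when a feature is not supported, which is the opposite of the usual shell convention. A minimal sketch of how a caller might map those codes to a result follows. run_and_report and the demo_* functions are hypothetical: the actual runner loop in pmtu.sh is not part of these patches, so the mapping is inferred from the return statements visible in the diffs.

```shell
#!/bin/sh
# Sketch: map the return codes used by the test functions in this series
# to a result string. run_and_report is a hypothetical helper; the real
# runner loop in pmtu.sh is not shown in these patches.
run_and_report() {
	ret=0
	"$@" || ret=$?
	case ${ret} in
	0) echo "FAIL" ;;
	1) echo "PASS" ;;
	2) echo "SKIP" ;;
	*) echo "UNKNOWN (${ret})" ;;
	esac
}

# Stand-ins for real test functions (assumptions, for illustration only)
demo_pass() { return 1; }
demo_fail() { return 0; }
demo_skip() { return 2; }

for t in demo_pass demo_fail demo_skip; do
	printf '%s: %s\n' "${t}" "$(run_and_report "${t}")"
done
```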
* [PATCH net-next 9/9] selftests: pmtu: Add pmtu_vti6_link_change_mtu test
From: Stefano Brivio @ 2018-03-15 16:18 UTC
To: David S . Miller; +Cc: Sabrina Dubroca, Steffen Klassert, netdev
In-Reply-To: <cover.1521129192.git.sbrivio@redhat.com>
This test checks that MTU configured from userspace is used on
link creation and changes, and that when it's not passed from
userspace, it's calculated properly from the MTU of the lower
layer.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
tools/testing/selftests/net/pmtu.sh | 58 ++++++++++++++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index aad9f880c8ee..937cabe5b969 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -36,10 +36,17 @@
#
# - pmtu_vti6_link_add_mtu
# Same as above, for IPv6
+#
+# - pmtu_vti6_link_change_mtu
+# Set up two dummy interfaces with different MTUs, create a vti6 tunnel
+# and check that configured MTU is used on link creation and changes, and
+# that MTU is properly calculated instead when MTU is not configured from
+# userspace
tests="pmtu_vti4_exception pmtu_vti6_exception
pmtu_vti4_default_mtu pmtu_vti6_default_mtu
- pmtu_vti4_link_add_mtu pmtu_vti6_link_add_mtu"
+ pmtu_vti4_link_add_mtu pmtu_vti6_link_add_mtu
+ pmtu_vti6_link_change_mtu"
NS_A="ns-$(mktemp -u XXXXXX)"
NS_B="ns-$(mktemp -u XXXXXX)"
@@ -60,6 +67,10 @@ vti6_a_addr="fd00:2::a"
vti6_b_addr="fd00:2::b"
vti6_mask="64"
+dummy6_0_addr="fc00:1000::0"
+dummy6_1_addr="fc00:1001::0"
+dummy6_mask="64"
+
cleanup_done=1
setup_namespaces() {
@@ -390,6 +401,51 @@ test_pmtu_vti6_link_add_mtu() {
return ${pass}
}
+test_pmtu_vti6_link_change_mtu() {
+ setup namespaces && return 2
+
+ ${ns_a} ip link add dummy0 mtu 1500 type dummy
+ [ $? -ne 0 ] && echo " dummy not supported" && return 2
+ ${ns_a} ip link add dummy1 mtu 3000 type dummy
+ ${ns_a} ip link set dummy0 up
+ ${ns_a} ip link set dummy1 up
+
+ ${ns_a} ip addr add ${dummy6_0_addr}/${dummy6_mask} dev dummy0
+ ${ns_a} ip addr add ${dummy6_1_addr}/${dummy6_mask} dev dummy1
+
+ pass=1
+
+ # Create vti6 interface bound to device, passing MTU, check it
+ echo "${ns_a} ip link add vti6_a mtu 1300 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}" > /dev/kmsg
+ ${ns_a} ip link add vti6_a mtu 1300 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
+ mtu="$(link_get_mtu "${ns_a}" vti6_a)"
+ if [ ${mtu} -ne 1300 ]; then
+ echo " vti6 MTU ${mtu} doesn't match configured value 1300"
+ pass=0
+ fi
+
+ # Move to another device with different MTU, without passing MTU, check
+ # MTU is adjusted
+ echo "${ns_a} ip link set vti6_a type vti6 remote ${dummy6_1_addr} local ${dummy6_1_addr}" > /dev/kmsg
+ ${ns_a} ip link set vti6_a type vti6 remote ${dummy6_1_addr} local ${dummy6_1_addr}
+ mtu="$(link_get_mtu "${ns_a}" vti6_a)"
+ if [ ${mtu} -ne $((3000 - 40)) ]; then
+ echo " vti MTU ${mtu} is not dummy MTU 3000 minus IPv6 header length"
+ pass=0
+ fi
+
+ # Move it back, passing MTU, check MTU is not overridden
+ echo "${ns_a} ip link set vti6_a mtu 1280 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}" > /dev/kmsg
+ ${ns_a} ip link set vti6_a mtu 1280 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
+ mtu="$(link_get_mtu "${ns_a}" vti6_a)"
+ if [ ${mtu} -ne 1280 ]; then
+ echo " vti6 MTU ${mtu} doesn't match configured value 1280"
+ pass=0
+ fi
+
+ return ${pass}
+}
+
trap cleanup EXIT
exitcode=0
--
2.15.1
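
The MTU values that test_pmtu_vti6_link_change_mtu expects at each step follow one simple rule, sketched below. expected_vti6_mtu is a hypothetical helper, not part of pmtu.sh; the 40-byte deduction is the IPv6 header length used in the patch.

```shell
#!/bin/sh
# Sketch: MTU the test expects on vti6_a after each step.
# expected_vti6_mtu is a hypothetical helper, not part of pmtu.sh.
# $1: MTU of the lower device
# $2: MTU passed from userspace, or empty if none was passed
expected_vti6_mtu() {
	if [ -n "${2}" ]; then
		# A configured MTU wins over the calculated one
		echo "${2}"
	else
		# Lower device MTU minus IPv6 header length
		echo $((${1} - 40))
	fi
}

expected_vti6_mtu 1500 1300	# link add on dummy0, passing mtu 1300
expected_vti6_mtu 3000 ""	# moved to dummy1, no MTU passed
expected_vti6_mtu 1500 1280	# moved back to dummy0, passing mtu 1280
```

The three calls mirror the three steps of the test and yield 1300, 2960 and 1280, the values the test compares against.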
* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-15 16:21 UTC
To: Alexander Duyck
Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <CAKgT0UcBQRoSPPZ73bdu1oEBGqBA8_c3ZAjti20=+9UwEqpXbw@mail.gmail.com>
On 3/15/2018 10:32 AM, Alexander Duyck wrote:
> We tend to do something like:
> update tx_buffer_info
> update tx_desc
> wmb()
> point first tx_buffer_info next_to_watch value at last tx_desc
> update next_to_use
> notify device via writel
>
> We do it this way because we have to synchronize between the Tx
> cleanup path and the hardware so we basically lump the two barriers
> together instead of invoking both a smp_wmb and a wmb. Now that I
> look at the pseudocode though I wonder if we shouldn't move the
> next_to_use update before the wmb, but that might be material for
> another patch. Anyway, in the Tx cleanup path we should have an
> smp_rmb() after we read the next_to_watch values so that we avoid
> reading any of the other fields in the buffer_info if either the field
> is NULL or the descriptor pointed to has not been written back.
How do you feel about keeping wmb() very close to writel_relaxed() like this?
update tx_buffer_info
update tx_desc
point first tx_buffer_info next_to_watch value at last tx_desc
update next_to_use
wmb()
notify device via writel_relaxed()
I'm afraid that if the order of wmb() and writel() is not very
obvious, or is hidden across multiple functions, somebody could introduce a very nasty
bug in the future.
We also have to think about code maintenance.
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
* Re: [PATCH RFC bpf-next 1/6] bpf: Hooks for sys_bind
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-03-15 16:22 UTC
To: Alexei Starovoitov
Cc: Eric Dumazet, Alexei Starovoitov, David S. Miller,
Daniel Borkmann, Network Development, Kernel Team
In-Reply-To: <20180315033657.jj7ozjx66p27h3ar@ast-mbp>
On Wed, Mar 14, 2018 at 8:37 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Mar 14, 2018 at 05:17:54PM -0700, Eric Dumazet wrote:
>>
>>
>> On 03/14/2018 11:41 AM, Alexei Starovoitov wrote:
>> > On Wed, Mar 14, 2018 at 11:00 AM, Alexei Starovoitov
>> > <alexei.starovoitov@gmail.com> wrote:
>> >>
>> >>> It seems this is exactly the case where a netns would be the correct answer.
>> >>
>> >> Unfortunately that's not the case. That's what I tried to explain
>> >> in the cover letter:
>> >> "The setup involves per-container IPs, policy, etc, so traditional
>> >> network-only solutions that involve VRFs, netns, acls are not applicable."
>> >> To elaborate more on that:
>> >> netns is l2 isolation.
>> >> vrf is l3 isolation.
>> >> whereas to containerize an application we need to punch connectivity holes
>> >> in these layered techniques.
>> >> We also considered resurrecting Hannes's afnetns work
>> >> and even went as far as designing a new namespace for L4 isolation.
>> >> Unfortunately all hierarchical namespace abstractions don't work.
>> >> To run an application inside a cgroup container that was not written
>> >> with containers in mind we have to create the illusion of running
>> >> in a non-containerized environment.
>> >> In some cases we remember the port and container id in the post-bind hook
>> >> in a bpf map and when some other task in a different container is trying
>> >> to connect to a service we need to know where this service is running.
>> >> It can be remote and can be local. Both client and service may or may not
>> >> be written with containers in mind and this sockaddr rewrite is providing
>> >> connectivity and load balancing feature that you simply cannot do
>> >> with hierarchical networking primitives.
>> >
>> > have to explain this a bit further...
>> > We also considered hacking these 'connectivity holes' in
>> > netns and/or vrf, but that would be real layering violation,
>> > since clean l2, l3 abstraction would suddenly support
>> > something that breaks through the layers.
>> > Just like many consider ipvlan a bad hack that punches
>> > through the layers and connects l2 abstraction of netns
>> > at l3 layer, this is not something kernel should ever do.
>> > We really didn't want another ipvlan-like hack in the kernel.
>> > Instead bpf programs at bind/connect time _help_
>> > applications discover and connect to each other.
>> > All containers are running in init_netns and there are no vrfs.
>> > After bind/connect the normal fib/neighbor core networking
>> > logic works as it should always do. The whole system is
>> > clean from network point of view.
>>
>>
>> We apparently missed something when deploying ipvlan and one netns per
>> container/job
>
> Hannes expressed the reasons why RHEL doesn't support ipvlan long ago.
I had a long discussion with Hannes and there are two pending issues
(discounting minor bug fixes / improvements): (a) multicast group
membership and (b) early demux.
Multicast group membership is just a matter of putting some code there
to fix it, while early demux is a little harder to fix without violating
isolation boundaries. To me isolation is critical, and if we find a
solution that doesn't violate isolation, we'd fix it.
> I couldn't find the complete link. This one mentions some of the issues:
> https://www.mail-archive.com/netdev@vger.kernel.org/msg157614.html
> Since ipvlan works for you, great, but it's clearly a layering violation.
> ipvlan connects L2 namespaces via L3 by doing its own fib lookups.
> To me it's the definition of 'punching a connectivity hole' in an L2 abstraction.
> In a normal L2 setup of netns+veth the traffic from one netns should
> have gone into another netns via full L2. ipvlan cheats by giving
> L3 connectivity. It's not clean to me.
IPvlan supports three different modes and you have mixed all of them
while explaining your understanding of IPvlan. Probably one needs to
digest all these modes and evaluate them in the context of their use
case. Well, I'm not even going to attempt to explain the differences,
if you were serious you could have figured it out.
There are lots of use cases and people use it in interesting ways.
Each case can be better handled by using either VRF, or macvlan, or
IPvlan or whatever is out there. It would be childish to say one
use-case is better than others as these are *different* use cases. All
these solutions come with their own caveats and you choose what you
can live with. Well, you can always improve and I can see Red Hat folks
are doing it and I appreciate their efforts.
Like I said there are several different ways to make this work with
namespaces in a much cleaner way, and IPvlan does not need to be
involved. However adding another eBPF hook just because we can in a
hackish way is *not* the right way. Especially when a problem has
already been solved (with namespaces) these 2000 lines don't deserve to
be in the kernel. eBPF is a good tool and there is a thin line between
using it appropriately and misusing it. I don't want to argue and we
can agree to disagree!
> There are still neighbour
> tables in netnses that are duplicated.
> Because netns is L2 there is full requeuing for traffic across netnses.
> I guess google doesn't prioritize container to container traffic
> while outside into netns via ipvlan works ok similar to bond, but
> imo it's cheating too.
> imo afnetns would have been a much better alternative for your
> use case without ipvlan pitfalls, but as you said ipvlan is already
> in the tree and afnetns is not.
> With afnetns early demux would have worked not only for traffic from
> the network, but for traffic across afnetns-es.
>
Isolation is a critical piece of our puzzle and none of the
suggestions you have given solve it. cgroups clearly don't! However,
those could be good solutions in some other use-cases.
>> I find netns isolation very clean, powerful, and it is there already.
>
> netns+veth is a clean abstraction, but netns+ipvlan is imo not.
> imo VRF is another clean L3 abstraction. Yet some folks tried
> to do VRF-like things with netns.
> David Ahern wrote a nice blog post about issues with that.
> I suspect VRF also could have worked for google use case
> and would have been easier to use than netns+ipvlan.
> But since ipvlan works for you in the current shape, great,
> I'm not going to argue further.
> Let's agree to disagree on cleanliness of the solution.
>
>> It also works with UDP just fine. Are you considering adding a hook
>> later for sendmsg() (unconnected socket or not), or do you want to use
>> the existing one in ip_finish_output(), adding per-packet overhead ?
>
> Currently that's indeed the case. Existing cgroup-bpf hooks
> at ip_finish_output work for many use cases, but per-packet overhead
> is bad. With bind/connect hooks we avoid that overhead for
> good traffic (which is tcp and connected udp). We still need
> to solve it for unconnected udp. Rough idea is to do similar
> sockaddr rewrite/drop in unconnected part of udp_sendmsg.
>
* Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
From: Mohammed Gamal @ 2018-03-15 16:24 UTC
To: Stephen Hemminger
Cc: otubo, sthemmin, netdev, linux-kernel, devel, vkuznets, davem
In-Reply-To: <1521019321.8260.1.camel@redhat.com>
On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote:
> On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:
> > On Tue, 13 Mar 2018 20:06:50 +0100
> > Mohammed Gamal <mgamal@redhat.com> wrote:
> >
> > > During high network traffic, changes to network interface parameters
> > > such as number of channels or MTU can cause a kernel panic with a NULL
> > > pointer dereference. This is due to netvsc_device_remove() being
> > > called and deallocating the channel ring buffers, which can then be
> > > accessed by netvsc_send_pkt() before they're allocated on calling
> > > netvsc_device_add()
> > >
> > > The patch fixes this problem by checking the channel state and returning
> > > ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
> > > which may access the uninitialized ring buffer.
> > >
> > > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
> > > ---
> > > drivers/net/hyperv/netvsc.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> > > index 0265d70..44a8358 100644
> > > --- a/drivers/net/hyperv/netvsc.c
> > > +++ b/drivers/net/hyperv/netvsc.c
> > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
> > > u64 req_id;
> > > int ret;
> > > - u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
> > > + u32 ring_avail;
> > >
> > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> > > if (skb)
> > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> > >
> > > req_id = (ulong)skb;
> > >
> > > - if (out_channel->rescind)
> > > + if (out_channel->rescind || out_channel->state != CHANNEL_OPENED_STATE)
> > > return -ENODEV;
> > >
> > > if (packet->page_buf_cnt) {
> > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> > > VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
> > > }
> > >
> > > + ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
> > > if (ret == 0) {
> > > atomic_inc_return(&nvchan->queue_sends);
> > >
> >
> > Thanks for your patch. Yes there are races with the current update
> > logic. The root cause goes higher up in the flow; the send queues should
> > be stopped before netvsc_device_remove is called. Solving it where you
> > tried to is racy and not going to work reliably.
> >
> > Network patches should go to netdev@vger.kernel.org
> >
> > You can't move the ring_avail check until after the vmbus_sendpacket
> > because that will break the flow control logic.
> >
>
> Why? I don't see ring_avail being used before that point.
Ah, stupid me. vmbus_sendpacket() will write to the ring buffer, and
that means the ring_avail value will differ from the expected one.
>
> > Instead, you should just move the avail_read check until just after
> > the existing rescind check.
> >
> > Also, you shouldn't need to check for OPENED_STATE, just rescind is
> > enough.
>
> That rarely mitigated the race. channel->rescind flag is set on vmbus
> exit - called on module unload - and when a rescind offer is received
> from the host, which AFAICT doesn't happen on every call to
> netvsc_device_remove, so it's quite possible that the ringbuffer is
> accessed before it's allocated again on channel open and hence the
> check for OPENED_STATE - which is only set after all vmbus data is
> initialized.
>
Perhaps I haven't been clear enough. The NULL pointer dereference
happens in the call to hv_ringbuf_avail_percent() which is used to
calculate ring_avail.
So we need to stop the queues before calling it if the channel's ring
buffers haven't been allocated yet, but OTOH we should only stop the
queues based upon the value of ring_avail, so this leads to a
chicken-and-egg situation.
Is my observation here correct? Please correct me if I am wrong,
Stephen.
* Re: [net-next 1/5] tipc: obsolete TIPC_ZONE_SCOPE
From: Jon Maloy @ 2018-03-15 16:27 UTC
To: Jiri Pirko
Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
Mohan Krishna Ghanta Krishnamurthy, davem@davemloft.net
In-Reply-To: <20180315161122.GH2130@nanopsycho>
No, it won't. I just moved those functions and #defines to the bottom of the same file, and marked them as 'deprecated'.
BR
///jon
> -----Original Message-----
> From: Jiri Pirko [mailto:jiri@resnulli.us]
> Sent: Thursday, March 15, 2018 12:11
> To: Jon Maloy <jon.maloy@ericsson.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; Mohan Krishna Ghanta
> Krishnamurthy <mohan.krishna.ghanta.krishnamurthy@ericsson.com>; Tung
> Quang Nguyen <tung.q.nguyen@dektech.com.au>; Hoang Huu Le
> <hoang.h.le@dektech.com.au>; Canh Duc Luu
> <canh.d.luu@dektech.com.au>; Ying Xue <ying.xue@windriver.com>; tipc-
> discussion@lists.sourceforge.net
> Subject: Re: [net-next 1/5] tipc: obsolete TIPC_ZONE_SCOPE
>
> Thu, Mar 15, 2018 at 04:48:51PM CET, jon.maloy@ericsson.com wrote:
> >Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
> >aspects handled the same way, both on the publishing node and on the
> >receiving nodes.
> >
> >Despite previous ambitions to the contrary, this is never going to
> >change, so we take the consequence of this and obsolete TIPC_ZONE_SCOPE
> >and related macros/functions. Whenever a user is doing a bind() or a
> >sendmsg() attempt using ZONE_SCOPE we translate this internally to
> >CLUSTER_SCOPE, while we remain compatible with users and remote nodes
> >still using ZONE_SCOPE.
> >
> >Furthermore, the non-formalized scope value 0 has always been permitted
> >for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
> >We now permit it even as binding scope, but for compatibility reasons
> >we choose to not change the value of TIPC_CLUSTER_SCOPE.
> >
> >Acked-by: Ying Xue <ying.xue@windriver.com>
> >Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>
> [...]
>
>
> >diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
> >index 14bacc7..4ac9f1f 100644
> >--- a/include/uapi/linux/tipc.h
> >+++ b/include/uapi/linux/tipc.h
> >@@ -61,50 +61,6 @@ struct tipc_name_seq {
> > __u32 upper;
> > };
> >
> >-/* TIPC Address Size, Offset, Mask specification for Z.C.N
> >- */
> >-#define TIPC_NODE_BITS 12
> >-#define TIPC_CLUSTER_BITS 12
> >-#define TIPC_ZONE_BITS 8
> >-
> >-#define TIPC_NODE_OFFSET 0
> >-#define TIPC_CLUSTER_OFFSET TIPC_NODE_BITS
> >-#define TIPC_ZONE_OFFSET (TIPC_CLUSTER_OFFSET + TIPC_CLUSTER_BITS)
> >-
> >-#define TIPC_NODE_SIZE ((1UL << TIPC_NODE_BITS) - 1)
> >-#define TIPC_CLUSTER_SIZE ((1UL << TIPC_CLUSTER_BITS) - 1)
> >-#define TIPC_ZONE_SIZE ((1UL << TIPC_ZONE_BITS) - 1)
> >-
> >-#define TIPC_NODE_MASK (TIPC_NODE_SIZE << TIPC_NODE_OFFSET)
> >-#define TIPC_CLUSTER_MASK (TIPC_CLUSTER_SIZE << TIPC_CLUSTER_OFFSET)
> >-#define TIPC_ZONE_MASK (TIPC_ZONE_SIZE << TIPC_ZONE_OFFSET)
> >-
> >-#define TIPC_ZONE_CLUSTER_MASK (TIPC_ZONE_MASK | TIPC_CLUSTER_MASK)
> >-
> >-static inline __u32 tipc_addr(unsigned int zone,
> >- unsigned int cluster,
> >- unsigned int node)
> >-{
> >- return (zone << TIPC_ZONE_OFFSET) |
> >- (cluster << TIPC_CLUSTER_OFFSET) |
> >- node;
> >-}
> >-
> >-static inline unsigned int tipc_zone(__u32 addr) -{
> >- return addr >> TIPC_ZONE_OFFSET;
> >-}
> >-
> >-static inline unsigned int tipc_cluster(__u32 addr) -{
> >- return (addr & TIPC_CLUSTER_MASK) >> TIPC_CLUSTER_OFFSET;
> >-}
> >-
> >-static inline unsigned int tipc_node(__u32 addr)
> >-{
> >- return addr & TIPC_NODE_MASK;
> >-}
>
> If someone includes tipc.h and uses any of this, your patch is going to break
> his compilation. Would anyone have good reason to use any of this?
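
For readers who have not used the helpers in question: the quoted block packs a
TIPC address as zone (8 bits), cluster (12 bits) and node (12 bits). The sketch
below is a userspace illustration of that layout only. The zcn_* and ZCN_* names
are hypothetical, chosen so they cannot clash with <linux/tipc.h>, and the range
check is an addition that the quoted helpers do not have.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths copied from the quoted hunk; names are hypothetical. */
#define ZCN_NODE_BITS      12
#define ZCN_CLUSTER_BITS   12
#define ZCN_ZONE_BITS       8

#define ZCN_NODE_OFFSET     0
#define ZCN_CLUSTER_OFFSET  ZCN_NODE_BITS
#define ZCN_ZONE_OFFSET     (ZCN_CLUSTER_OFFSET + ZCN_CLUSTER_BITS)

#define ZCN_NODE_MASK \
	(((UINT32_C(1) << ZCN_NODE_BITS) - 1) << ZCN_NODE_OFFSET)
#define ZCN_CLUSTER_MASK \
	(((UINT32_C(1) << ZCN_CLUSTER_BITS) - 1) << ZCN_CLUSTER_OFFSET)

/* Pack <zone.cluster.node> into one 32-bit address. */
static inline uint32_t zcn_addr(unsigned int zone, unsigned int cluster,
				unsigned int node)
{
	/* Sketch-only sanity check: each field must fit its bit width. */
	assert(zone < (1u << ZCN_ZONE_BITS));
	assert(cluster < (1u << ZCN_CLUSTER_BITS));
	assert(node < (1u << ZCN_NODE_BITS));

	return ((uint32_t)zone << ZCN_ZONE_OFFSET) |
	       ((uint32_t)cluster << ZCN_CLUSTER_OFFSET) |
	       (uint32_t)node;
}

/* Extract the individual fields again. */
static inline unsigned int zcn_zone(uint32_t addr)
{
	return addr >> ZCN_ZONE_OFFSET;
}

static inline unsigned int zcn_cluster(uint32_t addr)
{
	return (addr & ZCN_CLUSTER_MASK) >> ZCN_CLUSTER_OFFSET;
}

static inline unsigned int zcn_node(uint32_t addr)
{
	return addr & ZCN_NODE_MASK;
}
```

Any out-of-tree program calling the original tipc_addr()/tipc_zone() helpers
relies on exactly this packing, which is why removing them from the UAPI header
would be a source-level break.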
^ permalink raw reply
* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-15 16:27 UTC (permalink / raw)
To: Alexander Duyck
Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <eee8269d-b711-828c-ab84-5933bf86d024@codeaurora.org>
On 3/15/2018 12:21 PM, Sinan Kaya wrote:
> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>> We tend to do something like:
>> update tx_buffer_info
>> update tx_desc
>> wmb()
>> point first tx_buffer_info next_to_watch value at last tx_desc
>> update next_to_use
>> notify device via writel
>>
>> We do it this way because we have to synchronize between the Tx
>> cleanup path and the hardware so we basically lump the two barriers
>> together. instead of invoking both a smp_wmb and a wmb. Now that I
>> look at the pseudocode though I wonder if we shouldn't move the
>> next_to_use update before the wmb, but that might be material for
>> another patch. Anyway, in the Tx cleanup path we should have an
>> smp_rmb() after we read the next_to_watch values so that we avoid
>> reading any of the other fields in the buffer_info if either the field
>> is NULL or the descriptor pointed to has not been written back.
>
> How do you feel about keeping wmb() very close to writel_relaxed() like this?
>
> update tx_buffer_info
> update tx_desc
> point first tx_buffer_info next_to_watch value at last tx_desc
> update next_to_use
> wmb()
> notify device via writel_relaxed()
>
> I'm afraid that if the order of wmb() and writel() is not very
> obvious or hidden in multiple functions, somebody can introduce a very nasty
> bug in the future.
>
> We also have to think about code maintenance.
>
Now that I read your email again, I think this is the reason if I understood you
correctly.
"instead of invoking both a smp_wmb and a wmb"
You'd need something like
update tx_buffer_info
update tx_desc
smp_wmb()
point first tx_buffer_info next_to_watch value at last tx_desc
update next_to_use
wmb()
notify device via writel_relaxed()
Let me work on your comments.
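
To make the ordering above concrete, here is a rough userspace model of it. This
is only a sketch: the struct layout and the tx_post() name are hypothetical, the
C11 fences merely stand in for smp_wmb()/wmb() to show where they would sit, and
a plain store stands in for writel_relaxed(). Userspace fences do not order MMIO,
so this is not driver code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical, heavily simplified descriptor and bookkeeping types. */
struct tx_desc {
	uint64_t addr;
	uint32_t len;
};

struct tx_buffer_info {
	const void *data;
	struct tx_desc *next_to_watch;	/* read by the Tx cleanup path */
};

struct tx_ring {
	struct tx_desc desc[RING_SIZE];
	struct tx_buffer_info info[RING_SIZE];
	unsigned int next_to_use;
	uint32_t doorbell;		/* stand-in for the device tail register */
};

static void tx_post(struct tx_ring *ring, const void *data, uint32_t len)
{
	unsigned int i = ring->next_to_use;

	/* update tx_buffer_info, update tx_desc */
	ring->info[i].data = data;
	ring->desc[i].addr = (uint64_t)(uintptr_t)data;
	ring->desc[i].len = len;

	/* stand-in for smp_wmb(): publish the above before next_to_watch */
	atomic_thread_fence(memory_order_release);

	/* point next_to_watch at the last descriptor, update next_to_use */
	ring->info[i].next_to_watch = &ring->desc[i];
	ring->next_to_use = (i + 1) % RING_SIZE;

	/* stand-in for wmb(): everything visible before the doorbell */
	atomic_thread_fence(memory_order_seq_cst);

	/* stand-in for writel_relaxed(): notify the device */
	ring->doorbell = ring->next_to_use;
}
```

The first fence pairs with the smp_rmb() in the cleanup path; the second is the
one that sits immediately before the relaxed doorbell write.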
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Liran Alon @ 2018-03-15 16:35 UTC (permalink / raw)
To: shmulik.ladkani
Cc: netdev, daniel, mrv, davem, linux-kernel, yuval.shaia, idan.brown
----- shmulik.ladkani@gmail.com wrote:
> On Thu, 15 Mar 2018 08:01:03 -0700 (PDT) Liran Alon
> <liran.alon@oracle.com> wrote:
> >
> > I still think that default behavior should be to zero skb->mark only
> > when skb cross netdevs in different netns.
>
> But the previous default was scrub the mark in *both* xnet and
> non-xnet situations.
>
> Therefore, there might be users which RELY on this (strange) default
> behavior in their same-netns-veth-pair setups.
> Meaning, changing the default behavior might break their apps relying
> on the former default behavior.
>
> This is why the "disable mark scrubbing in non-xnet case" should be
> opt-in.
We think the same.
The only difference is that I think this for now should be controllable
by a global /proc/sys/net/core file instead of giving a flexible per-netdev control.
Because that is a larger change that could be done later.
>
> Regards,
> Shmulik
^ permalink raw reply
* Re: [net-next 1/5] tipc: obsolete TIPC_ZONE_SCOPE
From: Jiri Pirko @ 2018-03-15 16:37 UTC (permalink / raw)
To: Jon Maloy
Cc: davem@davemloft.net, netdev@vger.kernel.org,
Mohan Krishna Ghanta Krishnamurthy, Tung Quang Nguyen,
Hoang Huu Le, Canh Duc Luu, Ying Xue,
tipc-discussion@lists.sourceforge.net
In-Reply-To: <DM5PR15MB15620DD9157E5EAECB63941E9AD00@DM5PR15MB1562.namprd15.prod.outlook.com>
Thu, Mar 15, 2018 at 05:27:17PM CET, jon.maloy@ericsson.com wrote:
>No, it won't. I just moved those functions and #defines to the bottom of the same file, and marked them as 'deprecated'.
Ah. That I missed. Thanks!
^ permalink raw reply
* Re: [Intel-wired-lan] [RFC PATCH 2/2] ixgbe: setup XPS via netif_set_xps()
From: Alexander Duyck @ 2018-03-15 16:43 UTC (permalink / raw)
To: Paolo Abeni; +Cc: Netdev, Eric Dumazet, intel-wired-lan, David S. Miller
In-Reply-To: <384ee099d617f3d3786a618b11cc10616923ec45.1521124830.git.pabeni@redhat.com>
On Thu, Mar 15, 2018 at 8:08 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> Before this commit, ixgbe with the default setting lacks XPS mapping
> for CPUs id greater than the number of tx queues.
>
> As a consequence the xmit path for such CPUs experience a relevant cost
> in __netdev_pick_tx, mainly due to skb_tx_hash(), as reported by the perf
> tool:
>
> 7.55%--netdev_pick_tx
> |
> --6.92%--__netdev_pick_tx
> |
> --6.35%--__skb_tx_hash
> |
> --5.94%--__skb_get_hash
> |
> --3.22%--__skb_flow_dissect
>
> in the following scenario:
>
> ethtool -L em1 combined 1
> taskset 2 netperf -H 192.168.1.1 -t UDP_STREAM -- -m 1
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.101.1 () port 0 AF_INET
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
>
> 212992 1 10.00 11497225 0 9.20
>
> After this commit the perf tool reports:
>
> 0.85%--__netdev_pick_tx
>
> and netperf reports:
>
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.101.1 () port 0 AF_INET
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
>
> 212992 1 10.00 12736058 0 10.19
>
> roughly +10% in xmit tput.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I think we shouldn't be configuring XPS if the number of Tx or Rx queues
is less than the number of CPUs, or if ATR is not enabled.
The XPS bits are only supposed to be used with the ATR
functionality enabled. If we don't have enough queues for a 1:1
mapping we should probably not be programming XPS since ATR isn't
going to function right anyway.
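
As a sketch of the condition I have in mind (the helper name and its parameters
are hypothetical, this is not existing ixgbe code):

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: program XPS only when ATR is enabled and there
 * are enough Tx and Rx queues for a 1:1 queue-to-CPU mapping.
 */
static bool xps_setup_wanted(unsigned int num_tx_queues,
			     unsigned int num_rx_queues,
			     unsigned int num_cpus,
			     bool atr_enabled)
{
	if (!atr_enabled)
		return false;

	if (num_tx_queues < num_cpus || num_rx_queues < num_cpus)
		return false;

	return true;
}
```

With "ethtool -L em1 combined 1" on a multi-CPU box this would return false, so
the scenario in the commit message would be left without an XPS map.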
- Alex
^ permalink raw reply
* [PATCH] [v2] Bluetooth: btrsi: rework dependencies
From: Arnd Bergmann @ 2018-03-15 16:50 UTC (permalink / raw)
To: Marcel Holtmann, Johan Hedberg, Kalle Valo
Cc: Arnd Bergmann, Sebastian Reichel, Amitkumar Karwar,
Siva Rebbagondla, linux-bluetooth, linux-kernel, linux-wireless,
netdev
The linkage between the bluetooth driver and the wireless
driver is not defined properly, leading to build problems
such as:
warning: (BT_HCIRSI) selects RSI_COEX which has unmet direct dependencies (NETDEVICES && WLAN && WLAN_VENDOR_RSI && BT_HCIRSI && RSI_91X)
drivers/net/wireless/rsi/rsi_91x_main.o: In function `rsi_read_pkt':
(.text+0x205): undefined reference to `rsi_bt_ops'
As the dependency is actually the reverse (RSI_91X uses
the BT_RSI driver, not the other way round), this changes
the dependency to match, and enables the bluetooth driver
from the RSI_COEX symbol.
Fixes: 38aa4da50483 ("Bluetooth: btrsi: add new rsi bluetooth driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v2: Pick a different from v1
---
drivers/bluetooth/Kconfig | 4 +---
drivers/net/wireless/rsi/Kconfig | 4 +++-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig
index d8bbd661dbdb..149a38ee1fce 100644
--- a/drivers/bluetooth/Kconfig
+++ b/drivers/bluetooth/Kconfig
@@ -393,9 +393,7 @@ config BT_QCOMSMD
kernel or say M to compile as a module.
config BT_HCIRSI
- tristate "Redpine HCI support"
- default n
- select RSI_COEX
+ tristate
help
Redpine BT driver.
This driver handles BT traffic from upper layers and pass
diff --git a/drivers/net/wireless/rsi/Kconfig b/drivers/net/wireless/rsi/Kconfig
index f004be33fcfa..c6006fab8638 100644
--- a/drivers/net/wireless/rsi/Kconfig
+++ b/drivers/net/wireless/rsi/Kconfig
@@ -13,6 +13,7 @@ if WLAN_VENDOR_RSI
config RSI_91X
tristate "Redpine Signals Inc 91x WLAN driver support"
+ select BT_HCIRSI if RSI_COEX
depends on MAC80211
---help---
This option enabes support for RSI 1x1 devices.
@@ -44,7 +45,8 @@ config RSI_USB
config RSI_COEX
bool "Redpine Signals WLAN BT Coexistence support"
- depends on BT_HCIRSI && RSI_91X
+ depends on BT && RSI_91X
+ depends on !(BT=m && RSI_91X=y)
default y
---help---
This option enables the WLAN BT coex support in rsi drivers.
--
2.9.0
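
For illustration, the two "depends on" lines added to RSI_COEX can be modelled as
a small predicate over tristate values. This is a simplified sketch, not Kconfig
itself: the enum and function names are hypothetical, and the enclosing
"if WLAN_VENDOR_RSI" condition is ignored.

```c
#include <stdbool.h>

/* Hypothetical model of a Kconfig tristate value. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Models:
 *   depends on BT && RSI_91X
 *   depends on !(BT=m && RSI_91X=y)
 * RSI_COEX is a bool, so "m" on the dependency side is enough to allow it;
 * the second line exists to forbid a built-in WLAN driver from calling into
 * a modular Bluetooth core.
 */
static bool rsi_coex_selectable(enum tristate bt, enum tristate rsi_91x)
{
	if (bt == TRI_N || rsi_91x == TRI_N)
		return false;

	if (bt == TRI_M && rsi_91x == TRI_Y)
		return false;

	return true;
}
```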
^ permalink raw reply related
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Shmulik Ladkani @ 2018-03-15 16:50 UTC (permalink / raw)
To: Liran Alon
Cc: netdev, daniel, mrv, davem, linux-kernel, yuval.shaia, idan.brown
In-Reply-To: <e653a52e-6a7d-49c9-aec6-6bc6437819c6@default>
On Thu, 15 Mar 2018 09:35:51 -0700 (PDT) Liran Alon <liran.alon@oracle.com> wrote:
> ----- shmulik.ladkani@gmail.com wrote:
>
> > On Thu, 15 Mar 2018 08:01:03 -0700 (PDT) Liran Alon
> > <liran.alon@oracle.com> wrote:
> > >
> > > I still think that default behavior should be to zero skb->mark only
> > > when skb cross netdevs in different netns.
> >
> > But the previous default was scrub the mark in *both* xnet and
> > non-xnet situations.
> >
> > Therefore, there might be users which RELY on this (strange) default
> > behavior in their same-netns-veth-pair setups.
> > Meaning, changing the default behavior might break their apps relying
> > on the former default behavior.
> >
> > This is why the "disable mark scrubbing in non-xnet case" should be
> > opt-in.
>
> We think the same.
> The only difference is that I think this for now should be controllable
> by a global /proc/sys/net/core file instead of giving a flexible per-netdev
> control.
> Because that is a larger change that could be done later.
A flags attribute to veth newlink is a very scoped change.
User controls this per veth creation.
This is way more neat than /proc/sys/net and provides the desired granular
control.
Also, scoping this to veth has the advantage of not affecting the many other
dev_forward_skb callers.
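
Roughly, the behavior being proposed looks like the sketch below. Everything in
it is hypothetical (the flag name, the struct, the function); it only models the
decision and is not the veth or dev_forward_skb() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-veth flag, set at newlink time. */
#define VETH_SKETCH_F_KEEP_MARK 0x1u

/* Hypothetical stand-in for the one skb field that matters here. */
struct pkt_sketch {
	uint32_t mark;
};

/*
 * The mark is always scrubbed when crossing netns. Within one netns it is
 * scrubbed by default (the former behavior) and kept only if the device
 * opted in.
 */
static void forward_scrub_sketch(struct pkt_sketch *pkt, bool xnet,
				 unsigned int dev_flags)
{
	if (xnet || !(dev_flags & VETH_SKETCH_F_KEEP_MARK))
		pkt->mark = 0;
}
```

Existing same-netns veth setups pass no flag and see no change, which is the
point of making this opt-in.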
Regards,
Shmulik
^ permalink raw reply
* [PATCH] net: hns: Fix ethtool private flags
From: Matthias Brugger @ 2018-03-15 16:54 UTC (permalink / raw)
To: yisen.zhuang, salil.mehta, davem
Cc: tianjinchuan1, lipeng321, lixiaoping3, mbrugger, yankejian,
linyunsheng, huangdaode, stephen, netdev, linux-kernel,
matthias.bgg
The driver implementation returns support for private flags, while
no private flags are present. When asked for the number of private
flags it returns the number of statistic flag names.
Fix this by returning EOPNOTSUPP for not implemented ethtool flags.
Signed-off-by: Matthias Brugger <mbrugger@suse.com>
---
drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 4 +++-
4 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 86944bc3b273..74bd260ca02a 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -666,7 +666,7 @@ static void hns_gmac_get_strings(u32 stringset, u8 *data)
static int hns_gmac_get_sset_count(int stringset)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
+ if (stringset == ETH_SS_STATS)
return ARRAY_SIZE(g_gmac_stats_string);
return 0;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index b62816c1574e..93e71e27401b 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -422,7 +422,7 @@ void hns_ppe_update_stats(struct hns_ppe_cb *ppe_cb)
int hns_ppe_get_sset_count(int stringset)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
+ if (stringset == ETH_SS_STATS)
return ETH_PPE_STATIC_NUM;
return 0;
}
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index 6f3570cfb501..e2e28532e4dc 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -876,7 +876,7 @@ void hns_rcb_get_stats(struct hnae_queue *queue, u64 *data)
*/
int hns_rcb_get_ring_sset_count(int stringset)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
+ if (stringset == ETH_SS_STATS)
return HNS_RING_STATIC_REG_NUM;
return 0;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 7ea7f8a4aa2a..2e14a3ae1d8b 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -993,8 +993,10 @@ int hns_get_sset_count(struct net_device *netdev, int stringset)
cnt--;
return cnt;
- } else {
+ } else if (stringset == ETH_SS_STATS) {
return (HNS_NET_STATS_CNT + ops->get_sset_count(h, stringset));
+ } else {
+ return -EOPNOTSUPP;
}
}
--
2.16.2
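
The intended contract after this patch can be sketched in userspace as follows.
The enum values, counts and function name are hypothetical stand-ins (the real
code uses ETH_SS_* and driver-specific counts); only the shape of the fallthrough
is taken from the patch.

```c
#include <errno.h>

/* Hypothetical stand-ins for ETH_SS_TEST, ETH_SS_STATS, ETH_SS_PRIV_FLAGS. */
enum sset_sketch { SS_TEST, SS_STATS, SS_PRIV_FLAGS };

#define SKETCH_TEST_CNT   4	/* made-up count for illustration */
#define SKETCH_STATS_CNT 12	/* made-up count for illustration */

/*
 * Report a count only for string sets that are actually implemented;
 * anything else, including private flags, yields -EOPNOTSUPP.
 */
static int sset_count_sketch(enum sset_sketch stringset)
{
	switch (stringset) {
	case SS_TEST:
		return SKETCH_TEST_CNT;
	case SS_STATS:
		return SKETCH_STATS_CNT;
	default:
		return -EOPNOTSUPP;
	}
}
```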
^ permalink raw reply related
* [PATCH v2] net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
From: SZ Lin (林上智) @ 2018-03-15 16:56 UTC (permalink / raw)
Cc: SZ Lin (林上智), Schuyler Patton,
Grygorii Strashko, David S. Miller, Ivan Khoronzhuk, Keerthy,
Sekhar Nori, linux-omap, netdev, linux-kernel
According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode, controlled by EXT_EN (bit 18), is
only available when the PHY is configured in RGMII mode at 10Mbps. Setting it
in any other mode causes networking issues such as carrier sense
errors and low throughput. TI also mentioned this issue in their forum[4].
This patch adds a check on the PHY interface type, so that in-band mode
is only set in RGMII mode at 10Mbps.
References:
[1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
[2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
[3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
[4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155
Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
Signed-off-by: Schuyler Patton <spatton@ti.com>
---
Changes from v1:
- Use phy_interface_is_rgmii helper function
- Remove blank line
drivers/net/ethernet/ti/cpsw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1b1b78fdc138..b2b30c9df037 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1014,7 +1014,8 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave,
/* set speed_in input in case RMII mode is used in 100Mbps */
if (phy->speed == 100)
mac_control |= BIT(15);
- else if (phy->speed == 10)
+ /* in band mode only works in 10Mbps RGMII mode */
+ else if ((phy->speed == 10) && phy_interface_is_rgmii(phy))
mac_control |= BIT(18); /* In Band mode */
if (priv->rx_pause)
--
2.16.2
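
The resulting bit selection can be summarized by the standalone sketch below. The
function name and the boolean parameter are hypothetical (the driver uses
phy_interface_is_rgmii(phy) and works on the full mac_control value); only the
two bits touched by this hunk are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/*
 * Returns the speed-related bits that would be OR-ed into mac_control:
 * bit 15 at 100Mbps, bit 18 (in-band mode) only at 10Mbps over RGMII.
 */
static uint32_t link_mode_bits_sketch(int speed_mbps, bool is_rgmii)
{
	uint32_t mac_control = 0;

	if (speed_mbps == 100)
		mac_control |= BIT(15);
	else if (speed_mbps == 10 && is_rgmii)
		mac_control |= BIT(18);	/* in-band mode */

	return mac_control;
}
```

Before the patch, the second branch was taken for any 10Mbps link, including
non-RGMII interfaces.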
^ permalink raw reply related
* rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)
From: Joe Perches @ 2018-03-15 16:56 UTC (permalink / raw)
To: Geert Uytterhoeven, David Howells
Cc: Linux-Arch, Linux PWM List, Linux Fbdev development list,
Linux Watchdog Mailing List, Arnd Bergmann, linux-doc, netdev,
USB list, linux-wireless, Linux Kernel Mailing List,
DRI Development, linux-spi, linux-block, linux-ide, linux-input,
Linux FS Devel, Linux MM, linux-rtc
In-Reply-To: <CAMuHMdXcxuzCOnFCNm4NXDv-wfYJDO5GQpB_ECu7j=2BjMhNpA@mail.gmail.com>
On Thu, 2018-03-15 at 10:48 +0100, Geert Uytterhoeven wrote:
> Hi David,
>
> On Thu, Mar 15, 2018 at 10:42 AM, David Howells <dhowells@redhat.com> wrote:
> > Do we have anything left that still implements NOMMU?
>
> Sure: arm, c6x, m68k, microblaze, and sh.
I have a patchset that creates a vsprintf extension for
print_vma_addr and removes all the uses similar to the
print_symbol() removal.
This now avoids any possible printk interleaving.
Unfortunately, without some #ifdef in vsprintf, which
I would like to avoid, it increases the nommu kernel
size by ~500 bytes.
Anyone think this is acceptable?
Here's the overall patch, but I have it as a series
---
Documentation/core-api/printk-formats.rst | 9 +++++
arch/arm64/kernel/traps.c | 13 +++----
arch/mips/mm/fault.c | 16 ++++-----
arch/parisc/mm/fault.c | 15 ++++----
arch/riscv/kernel/traps.c | 11 +++---
arch/s390/mm/fault.c | 7 ++--
arch/sparc/mm/fault_32.c | 8 ++---
arch/sparc/mm/fault_64.c | 8 ++---
arch/tile/kernel/signal.c | 9 ++---
arch/um/kernel/trap.c | 13 +++----
arch/x86/kernel/signal.c | 10 ++----
arch/x86/kernel/traps.c | 18 ++++------
arch/x86/mm/fault.c | 12 +++----
include/linux/mm.h | 1 -
lib/vsprintf.c | 58 ++++++++++++++++++++++++++-----
mm/memory.c | 33 ------------------
16 files changed, 112 insertions(+), 129 deletions(-)
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 934559b3c130..10a91da1bc83 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -157,6 +157,15 @@ DMA address types dma_addr_t
For printing a dma_addr_t type which can vary based on build options,
regardless of the width of the CPU data path.
+VMA name and address
+----------------------------
+
+::
+
+ %pav <name>[hexstart+hexsize] or ?[0+0] if unavailable
+
+For any address, print the vma's name and its starting address and size
+
Passed by reference.
Raw buffer as an escaped string
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 2b478565d774..48edf812ce8b 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -242,13 +242,14 @@ void arm64_force_sig_info(struct siginfo *info, const char *str,
if (!show_unhandled_signals_ratelimited())
goto send_sig;
- pr_info("%s[%d]: unhandled exception: ", tsk->comm, task_pid_nr(tsk));
if (esr)
- pr_cont("%s, ESR 0x%08x, ", esr_get_class_string(esr), esr);
-
- pr_cont("%s", str);
- print_vma_addr(KERN_CONT " in ", regs->pc);
- pr_cont("\n");
+ pr_info("%s[%d]: unhandled exception: %s, ESR 0x%08x, %s in %pav\n",
+ tsk->comm, task_pid_nr(tsk),
+ esr_get_class_string(esr), esr,
+ str, &regs->pc);
+ else
+ pr_info("%s[%d]: unhandled exception: %s in %pav\n",
+ tsk->comm, task_pid_nr(tsk), str, &regs->pc);
__show_regs(regs);
send_sig:
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 4f8f5bf46977..ce7bf077a0f5 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -213,14 +213,14 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
tsk->comm,
write ? "write access to" : "read access from",
field, address);
- pr_info("epc = %0*lx in", field,
- (unsigned long) regs->cp0_epc);
- print_vma_addr(KERN_CONT " ", regs->cp0_epc);
- pr_cont("\n");
- pr_info("ra = %0*lx in", field,
- (unsigned long) regs->regs[31]);
- print_vma_addr(KERN_CONT " ", regs->regs[31]);
- pr_cont("\n");
+ pr_info("epc = %0*lx in %pav\n",
+ field,
+ (unsigned long)regs->cp0_epc,
+ &regs->cp0_epc);
+ pr_info("ra = %0*lx in %pav\n",
+ field,
+ (unsigned long)regs->regs[31],
+ &regs->regs[31]);
}
current->thread.trap_nr = (regs->cp0_cause >> 2) & 0x1f;
info.si_signo = SIGSEGV;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e247edbca68e..877cea702714 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -240,17 +240,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long code,
if (!printk_ratelimit())
return;
- pr_warn("\n");
- pr_warn("do_page_fault() command='%s' type=%lu address=0x%08lx",
- tsk->comm, code, address);
- print_vma_addr(KERN_CONT " in ", regs->iaoq[0]);
-
- pr_cont("\ntrap #%lu: %s%c", code, trap_name(code),
- vma ? ',':'\n');
+ pr_warn("do_page_fault() command='%s' type=%lu address=0x%08lx in %pav\n",
+ tsk->comm, code, address, &regs->iaoq[0]);
if (vma)
- pr_cont(" vm_start = 0x%08lx, vm_end = 0x%08lx\n",
- vma->vm_start, vma->vm_end);
+ pr_warn("trap #%lu: %s, vm_start = 0x%08lx, vm_end = 0x%08lx\n",
+ code, trap_name(code), vma->vm_start, vma->vm_end);
+ else
+ pr_warn("trap #%lu: %s\n", code, trap_name(code));
show_regs(regs);
}
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 93132cb59184..16609dcb2546 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -78,12 +78,11 @@ static inline void do_trap_siginfo(int signo, int code,
void do_trap(struct pt_regs *regs, int signo, int code,
unsigned long addr, struct task_struct *tsk)
{
- if (show_unhandled_signals && unhandled_signal(tsk, signo)
- && printk_ratelimit()) {
- pr_info("%s[%d]: unhandled signal %d code 0x%x at 0x" REG_FMT,
- tsk->comm, task_pid_nr(tsk), signo, code, addr);
- print_vma_addr(KERN_CONT " in ", GET_IP(regs));
- pr_cont("\n");
+ if (show_unhandled_signals && unhandled_signal(tsk, signo) &&
+ printk_ratelimit()) {
+ pr_info("%s[%d]: unhandled signal %d code 0x%x at 0x" REG_FMT " in %pav\n",
+ tsk->comm, task_pid_nr(tsk), signo, code, addr,
+ &GET_IP(regs));
show_regs(regs);
}
diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 93faeca52284..3b1d6d618af2 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -250,10 +250,9 @@ void report_user_fault(struct pt_regs *regs, long signr, int is_mm_fault)
return;
if (!printk_ratelimit())
return;
- printk(KERN_ALERT "User process fault: interruption code %04x ilc:%d ",
- regs->int_code & 0xffff, regs->int_code >> 17);
- print_vma_addr(KERN_CONT "in ", regs->psw.addr);
- printk(KERN_CONT "\n");
+ printk(KERN_ALERT "User process fault: interruption code %04x ilc:%d in %pav\n",
+ regs->int_code & 0xffff, regs->int_code >> 17,
+ &regs->psw.addr);
if (is_mm_fault)
dump_fault_info(regs);
show_regs(regs);
diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c
index a8103a84b4ac..206ec5a1c915 100644
--- a/arch/sparc/mm/fault_32.c
+++ b/arch/sparc/mm/fault_32.c
@@ -113,15 +113,11 @@ show_signal_msg(struct pt_regs *regs, int sig, int code,
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x",
+ printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x in %pav\n",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->pc, (void *)regs->u_regs[UREG_I7],
- (void *)regs->u_regs[UREG_FP], code);
-
- print_vma_addr(KERN_CONT " in ", regs->pc);
-
- printk(KERN_CONT "\n");
+ (void *)regs->u_regs[UREG_FP], code, &regs->pc);
}
static void __do_fault_siginfo(int code, int sig, struct pt_regs *regs,
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 41363f46797b..a21199329ebe 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -154,15 +154,11 @@ show_signal_msg(struct pt_regs *regs, int sig, int code,
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x",
+ printk("%s%s[%d]: segfault at %lx ip %px (rpc %px) sp %px error %x in %pav\n",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address,
(void *)regs->tpc, (void *)regs->u_regs[UREG_I7],
- (void *)regs->u_regs[UREG_FP], code);
-
- print_vma_addr(KERN_CONT " in ", regs->tpc);
-
- printk(KERN_CONT "\n");
+ (void *)regs->u_regs[UREG_FP], code, &regs->tpc);
}
static void do_fault_siginfo(int code, int sig, struct pt_regs *regs,
diff --git a/arch/tile/kernel/signal.c b/arch/tile/kernel/signal.c
index f2bf557bb005..0556106dfe8a 100644
--- a/arch/tile/kernel/signal.c
+++ b/arch/tile/kernel/signal.c
@@ -383,13 +383,10 @@ void trace_unhandled_signal(const char *type, struct pt_regs *regs,
if (show_unhandled_signals <= 1 && !printk_ratelimit())
return;
- printk("%s%s[%d]: %s at %lx pc "REGFMT" signal %d",
+ printk("%s%s[%d]: %s at %lx pc " REGFMT " signal %d in %pav\n",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
- tsk->comm, task_pid_nr(tsk), type, address, regs->pc, sig);
-
- print_vma_addr(KERN_CONT " in ", regs->pc);
-
- printk(KERN_CONT "\n");
+ tsk->comm, task_pid_nr(tsk), type, address, regs->pc, sig,
+ &regs->pc);
if (show_unhandled_signals > 1) {
switch (sig) {
diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
index b2b02df9896e..9281248972c0 100644
--- a/arch/um/kernel/trap.c
+++ b/arch/um/kernel/trap.c
@@ -150,14 +150,11 @@ static void show_segv_info(struct uml_pt_regs *regs)
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %px sp %px error %x",
- task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
- tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
- (void *)UPT_IP(regs), (void *)UPT_SP(regs),
- fi->error_code);
-
- print_vma_addr(KERN_CONT " in ", UPT_IP(regs));
- printk(KERN_CONT "\n");
+ printk("%s%s[%d]: segfault at %lx ip %px sp %px error %x in %pav\n",
+ task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
+ tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
+ (void *)UPT_IP(regs), (void *)UPT_SP(regs),
+ fi->error_code, &UPT_IP(regs));
}
static void bad_segv(struct faultinfo fi, unsigned long ip)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 4cdc0b27ec82..9ab0c5c50b29 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -841,15 +841,11 @@ void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
{
struct task_struct *me = current;
- if (show_unhandled_signals && printk_ratelimit()) {
- printk("%s"
- "%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx",
+ if (show_unhandled_signals && printk_ratelimit())
+ printk("%s%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx in %pav\n",
task_pid_nr(current) > 1 ? KERN_INFO : KERN_EMERG,
me->comm, me->pid, where, frame,
- regs->ip, regs->sp, regs->orig_ax);
- print_vma_addr(KERN_CONT " in ", regs->ip);
- pr_cont("\n");
- }
+ regs->ip, regs->sp, regs->orig_ax, &regs->ip);
force_sig(SIGSEGV, me);
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3d9b2308e7fa..c6e3d02759e5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -270,13 +270,10 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
tsk->thread.trap_nr = trapnr;
if (show_unhandled_signals && unhandled_signal(tsk, signr) &&
- printk_ratelimit()) {
- pr_info("%s[%d] trap %s ip:%lx sp:%lx error:%lx",
+ printk_ratelimit())
+ pr_info("%s[%d] trap %s ip:%lx sp:%lx error:%lx in %pav\n",
tsk->comm, tsk->pid, str,
- regs->ip, regs->sp, error_code);
- print_vma_addr(KERN_CONT " in ", regs->ip);
- pr_cont("\n");
- }
+ regs->ip, regs->sp, error_code, &regs->ip);
force_sig_info(signr, info ?: SEND_SIG_PRIV, tsk);
}
@@ -565,13 +562,10 @@ do_general_protection(struct pt_regs *regs, long error_code)
tsk->thread.trap_nr = X86_TRAP_GP;
if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
- printk_ratelimit()) {
- pr_info("%s[%d] general protection ip:%lx sp:%lx error:%lx",
+ printk_ratelimit())
+ pr_info("%s[%d] general protection ip:%lx sp:%lx error:%lx in %pav\n",
tsk->comm, task_pid_nr(tsk),
- regs->ip, regs->sp, error_code);
- print_vma_addr(KERN_CONT " in ", regs->ip);
- pr_cont("\n");
- }
+ regs->ip, regs->sp, error_code, &regs->ip);
force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e6af2b464c3d..b629319e621a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -857,14 +857,10 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
if (!printk_ratelimit())
return;
- printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx",
- task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
- tsk->comm, task_pid_nr(tsk), address,
- (void *)regs->ip, (void *)regs->sp, error_code);
-
- print_vma_addr(KERN_CONT " in ", regs->ip);
-
- printk(KERN_CONT "\n");
+ printk("%s%s[%d]: segfault at %lx ip %px sp %px error %lx in %pav\n",
+ task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
+ tsk->comm, task_pid_nr(tsk), address,
+ (void *)regs->ip, (void *)regs->sp, error_code, &regs->ip);
}
static void
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9f1270360983..9584bd3e8c25 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2537,7 +2537,6 @@ extern int randomize_va_space;
#endif
const char * arch_vma_name(struct vm_area_struct *vma);
-void print_vma_addr(char *prefix, unsigned long rip);
void sparse_mem_maps_populate_node(struct page **map_map,
unsigned long pnum_begin,
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 942b5234a59b..9081476ea4ea 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -35,6 +35,8 @@
#include <net/addrconf.h>
#include <linux/siphash.h>
#include <linux/compiler.h>
+#include <linux/mm_types.h>
+
#ifdef CONFIG_BLOCK
#include <linux/blkdev.h>
#endif
@@ -407,6 +409,11 @@ struct printf_spec {
#define FIELD_WIDTH_MAX ((1 << 23) - 1)
#define PRECISION_MAX ((1 << 15) - 1)
+static const struct printf_spec strspec = {
+ .field_width = -1,
+ .precision = -1,
+};
+
static noinline_for_stack
char *number(char *buf, char *end, unsigned long long num,
struct printf_spec spec)
@@ -1427,6 +1434,45 @@ char *netdev_bits(char *buf, char *end, const void *addr, const char *fmt)
return special_hex_number(buf, end, num, size);
}
+static noinline_for_stack
+char *vma_addr(char *buf, char *end, const void *addr)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ char *page;
+ char tbuf[2 * sizeof(unsigned long) * 2 + 4];
+ const char *output = "?[0+0]";
+
+ /*
+ * we might be running from an atomic context so we cannot sleep
+ */
+ if (!down_read_trylock(&mm->mmap_sem))
+ goto output;
+
+ vma = find_vma(mm, *(unsigned long *)addr);
+ if (!vma || !vma->vm_file)
+ goto up_read;
+
+ page = (char *)__get_free_page(GFP_ATOMIC | __GFP_NOWARN);
+ if (page) {
+ char *fp;
+
+ fp = file_path(vma->vm_file, page, PAGE_SIZE);
+ if (IS_ERR(fp))
+ fp = "?";
+ buf = string(buf, end, kbasename(fp), strspec);
+ sprintf(tbuf, "[%lx+%lx]",
+ vma->vm_start, vma->vm_end - vma->vm_start);
+ output = tbuf;
+ free_page((unsigned long)page);
+ }
+
+up_read:
+ up_read(&mm->mmap_sem);
+output:
+ return string(buf, end, output, strspec);
+}
+
static noinline_for_stack
char *address_val(char *buf, char *end, const void *addr, const char *fmt)
{
@@ -1434,6 +1480,8 @@ char *address_val(char *buf, char *end, const void *addr, const char *fmt)
int size;
switch (fmt[1]) {
+ case 'v':
+ return vma_addr(buf, end, addr);
case 'd':
num = *(const dma_addr_t *)addr;
size = sizeof(dma_addr_t);
@@ -1474,11 +1522,7 @@ char *format_flags(char *buf, char *end, unsigned long flags,
const struct trace_print_flags *names)
{
unsigned long mask;
- const struct printf_spec strspec = {
- .field_width = -1,
- .precision = -1,
- };
- const struct printf_spec numspec = {
+ static const struct printf_spec numspec = {
.flags = SPECIAL|SMALL,
.field_width = -1,
.precision = -1,
@@ -1548,10 +1592,6 @@ char *device_node_gen_full_name(const struct device_node *np, char *buf, char *e
{
int depth;
const struct device_node *parent = np->parent;
- static const struct printf_spec strspec = {
- .field_width = -1,
- .precision = -1,
- };
/* special case for root node */
if (!parent)
diff --git a/mm/memory.c b/mm/memory.c
index bc760df8a7f4..f1f922421bde 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4502,39 +4502,6 @@ int access_process_vm(struct task_struct *tsk, unsigned long addr,
}
EXPORT_SYMBOL_GPL(access_process_vm);
-/*
- * Print the name of a VMA.
- */
-void print_vma_addr(char *prefix, unsigned long ip)
-{
- struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
-
- /*
- * we might be running from an atomic context so we cannot sleep
- */
- if (!down_read_trylock(&mm->mmap_sem))
- return;
-
- vma = find_vma(mm, ip);
- if (vma && vma->vm_file) {
- struct file *f = vma->vm_file;
- char *buf = (char *)__get_free_page(GFP_NOWAIT);
- if (buf) {
- char *p;
-
- p = file_path(f, buf, PAGE_SIZE);
- if (IS_ERR(p))
- p = "?";
- printk("%s%s[%lx+%lx]", prefix, kbasename(p),
- vma->vm_start,
- vma->vm_end - vma->vm_start);
- free_page((unsigned long)buf);
- }
- }
- up_read(&mm->mmap_sem);
-}
-
#if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP)
void __might_fault(const char *file, int line)
{
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related
* Re: [PATCH 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-15 16:58 UTC (permalink / raw)
To: Sinan Kaya
Cc: Timur Tabi, Netdev, sulrich, linux-arm-msm, linux-arm-kernel,
Jeff Kirsher, intel-wired-lan, LKML
In-Reply-To: <0175e460-3424-9838-1064-9f63dab3304f@codeaurora.org>
On Thu, Mar 15, 2018 at 9:27 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/15/2018 12:21 PM, Sinan Kaya wrote:
>> On 3/15/2018 10:32 AM, Alexander Duyck wrote:
>>> We tend to do something like:
>>> update tx_buffer_info
>>> update tx_desc
>>> wmb()
>>> point first tx_buffer_info next_to_watch value at last tx_desc
>>> update next_to_use
>>> notify device via writel
>>>
>>> We do it this way because we have to synchronize between the Tx
>>> cleanup path and the hardware so we basically lump the two barriers
>>> together, instead of invoking both a smp_wmb and a wmb. Now that I
>>> look at the pseudocode though I wonder if we shouldn't move the
>>> next_to_use update before the wmb, but that might be material for
>>> another patch. Anyway, in the Tx cleanup path we should have an
>>> smp_rmb() after we read the next_to_watch values so that we avoid
>>> reading any of the other fields in the buffer_info if either the field
>>> is NULL or the descriptor pointed to has not been written back.
>>
>> How do you feel about keeping wmb() very close to writel_relaxed() like this?
>>
>> update tx_buffer_info
>> update tx_desc
>> point first tx_buffer_info next_to_watch value at last tx_desc
>> update next_to_use
>> wmb()
>> notify device via writel_relaxed()
>>
>> I'm afraid that if the order of wmb() and writel() is not very
>> obvious or hidden in multiple functions, somebody can introduce a very nasty
>> bug in the future.
>>
>> We also have to think about code maintenance.
>>
>
> Now that I read your email again, I think this is the reason if I understood you
> correctly.
>
> "instead of invoking both a smp_wmb and a wmb"
>
> You'd need something like
>
> update tx_buffer_info
> update tx_desc
> smp_wmb()
> point first tx_buffer_info next_to_watch value at last tx_desc
> update next_to_use
> wmb()
> notify device via writel_relaxed()
>
> Let me work on your comments.
Yes, we would be doing something like that, but we are using just a
single wmb() to cover both cases since hardware will never look at the
tx_buffer_info and software will never read that descriptor ring as
long as the next_to_watch is NULL. By doing it this way we should have
both cases covered and not need to worry.
The only other bit still remaining is the "maybe_stop_tx" logic which
lives between the wmb and writel_relaxed. That logic has a smp_mb
living in it that is triggered if we have to stop the queue. Once
again though that is only viewed by software so it existing between
the wmb and the writel_relaxed should not be an issue.
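To make the ordering concrete, here is a minimal userspace sketch of the publish/cleanup pairing. It is only a model under stated assumptions: the struct layouts, tx_publish(), tx_clean_one() and the doorbell field are hypothetical names, not driver code, and C11 fences stand in for wmb() and smp_rmb(). A C11 release fence does not order MMIO the way wmb() does, so this only illustrates the software-visible half (next_to_watch stays NULL until the descriptor is fully written).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical, simplified stand-ins for the driver structures. */
struct tx_desc {
	uint64_t addr;
	uint32_t len;
};

struct tx_buffer_info {
	const void *data;
	uint32_t len;
	_Atomic(struct tx_desc *) next_to_watch; /* NULL until published */
};

struct tx_ring {
	struct tx_desc desc[RING_SIZE];
	struct tx_buffer_info info[RING_SIZE];
	unsigned int next_to_use;
	unsigned int doorbell; /* models the tail register written via writel */
};

/* Producer: fill buffer_info and descriptor, fence once, then publish. */
static void tx_publish(struct tx_ring *r, const void *data, uint32_t len)
{
	unsigned int i = r->next_to_use;
	struct tx_buffer_info *bi = &r->info[i];

	bi->data = data;                           /* update tx_buffer_info */
	bi->len = len;
	r->desc[i].addr = (uint64_t)(uintptr_t)data; /* update tx_desc */
	r->desc[i].len = len;

	/* Single fence, standing in for the one wmb() discussed above. */
	atomic_thread_fence(memory_order_release);

	/* Point next_to_watch at the last descriptor of this packet. */
	atomic_store_explicit(&bi->next_to_watch, &r->desc[i],
			      memory_order_relaxed);
	r->next_to_use = (i + 1) % RING_SIZE;
	r->doorbell = r->next_to_use;              /* notify "device" */
}

/* Cleanup: returns the completed length, or 0 if slot i is unpublished. */
static uint32_t tx_clean_one(struct tx_ring *r, unsigned int i)
{
	struct tx_buffer_info *bi = &r->info[i];
	struct tx_desc *eop;
	uint32_t len;

	eop = atomic_load_explicit(&bi->next_to_watch, memory_order_relaxed);
	if (!eop)
		return 0; /* nothing published; do not touch other fields */

	/* Stands in for smp_rmb() after reading next_to_watch. */
	atomic_thread_fence(memory_order_acquire);

	len = bi->len;
	atomic_store_explicit(&bi->next_to_watch, NULL, memory_order_relaxed);
	return len;
}
```

The point of the sketch is that the cleanup side never reads the other buffer_info fields while next_to_watch is NULL, which is what lets one fence on the producer side cover the software case.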
Starting to understand why I was a bit hesitant to have us start
taking on these changes now? :-)
- Alex
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH 03/15] ice: Start hardware initialization
From: Shannon Nelson @ 2018-03-15 17:00 UTC (permalink / raw)
To: Venkataramanan, Anirudh, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
In-Reply-To: <1521065151.696.91.camel@intel.com>
On 3/14/2018 3:05 PM, Venkataramanan, Anirudh wrote:
> On Mon, 2018-03-12 at 19:05 -0700, Shannon Nelson wrote:
>> On 3/9/2018 9:21 AM, Anirudh Venkataramanan wrote:
>>> +
>>> +/**
>>> + * ice_read_sr_aq - Read Shadow RAM.
>>> + * @hw: pointer to the HW structure
>>> + * @offset: offset in words from module start
>>> + * @words: number of words to read
>>> + * @data: buffer for words reads from Shadow RAM
>>> + * @last_command: tells the AdminQ that this is the last command
>>> + *
>>> + * Reads 16-bit word buffers from the Shadow RAM using the admin command.
>>> + */
>>> +static enum ice_status
>>> +ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, u16 *data,
>>> + bool last_command)
>>> +{
>>> + enum ice_status status;
>>> +
>>> + status = ice_check_sr_access_params(hw, offset, words);
>>> + if (!status)
>>> + status = ice_aq_read_nvm(hw, 0, 2 * offset, 2 * words, data,
>>
>> Why the doubling of offset and words? If this is some general
>> adjustment made for the AQ interface, it should be made in
>> ice_aq_read_nvm(). If not, then some explanation is needed here.
>
> ice_read_sr_aq expects a word offset and size in words. The
> ice_aq_read_nvm interface expects offset and size in bytes. The
> doubling is a conversion from word offset/size to byte offset/size.
In that case, this might be a good place for a small comment for readers
like me who don't have the spec available.
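Something along these lines would do, as a rough standalone sketch. The helper name sr_words_to_bytes(), the struct nvm_byte_range type and the SR_WORD_BYTES constant are hypothetical, made up here for illustration; the only fact taken from the thread is that Shadow RAM offsets and sizes are in 16-bit words while the NVM read interface wants bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow RAM is addressed in 16-bit words; the NVM read takes bytes. */
#define SR_WORD_BYTES 2u

/* Hypothetical container for the converted byte offset and length. */
struct nvm_byte_range {
	uint32_t offset; /* byte offset from module start */
	uint16_t length; /* number of bytes to read */
};

/*
 * Convert a word offset and word count into a byte offset and byte
 * length. The asserts model the parameter check the caller is assumed
 * to have done already, so neither multiplication can overflow.
 */
static struct nvm_byte_range sr_words_to_bytes(uint32_t word_offset,
					       uint16_t words)
{
	struct nvm_byte_range range;

	assert(word_offset <= UINT32_MAX / SR_WORD_BYTES);
	assert(words <= UINT16_MAX / SR_WORD_BYTES);

	range.offset = SR_WORD_BYTES * word_offset;
	range.length = (uint16_t)(SR_WORD_BYTES * words);
	return range;
}
```

A named constant plus a one-line comment at the call site would carry the same information as the helper; either way the "2 *" stops being a magic number.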
sln
^ permalink raw reply
* Re: [RFC,POC] iptables/nftables to epbf/xdp via common intermediate layer
From: Florian Westphal @ 2018-03-15 17:00 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Edward Cree, Florian Westphal, Daniel Borkmann, netdev, ast,
pablo, David S. Miller
In-Reply-To: <20180315161321.kqcnkzf52gu55yku@ast-mbp.dhcp.thefacebook.com>
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> The way this IMR defined today looks pretty much like nft and
> it feels a bit too low level than iptable conversion would need.
It wasn't so much about a specific IMR but to avoid code duplication
between nft and iptables translators.
> I think it would be simpler to have user space only extensions
> and opcodes added to bpf for the purpose of the translation.
> Like there is no bpf instruction called 'load from IP header',
> but we can make one. Just extend extended bpf with an instruction
> like this and on the first pass do full conversion of nft
> directly into this 'extended extended bpf'.
I don't want to duplicate any ebpf conversion (and optimisations)
in the nft part.
If nft can be translated to this 'extended extended bpf' and
this then generates bpf code from nft input all is good.
^ permalink raw reply
* Re: [Intel-wired-lan] [RFC PATCH 2/2] ixgbe: setup XPS via netif_set_xps()
From: Paolo Abeni @ 2018-03-15 17:05 UTC (permalink / raw)
To: Alexander Duyck; +Cc: Netdev, Eric Dumazet, intel-wired-lan, David S. Miller
In-Reply-To: <CAKgT0UcoVECKfXBAThcLqb8vRtJetqGjop9pGxNL8Q5rXRtiyg@mail.gmail.com>
Hi,
On Thu, 2018-03-15 at 09:43 -0700, Alexander Duyck wrote:
> On Thu, Mar 15, 2018 at 8:08 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > Before this commit, ixgbe with the default setting lacks XPS mapping
> > for CPUs id greater than the number of tx queues.
> >
> > As a consequence the xmit path for such CPUs experience a relevant cost
> > in __netdev_pick_tx, mainly due to skb_tx_hash(), as reported by the perf
> > tool:
> >
> > 7.55%--netdev_pick_tx
> > |
> > --6.92%--__netdev_pick_tx
> > |
> > --6.35%--__skb_tx_hash
> > |
> > --5.94%--__skb_get_hash
> > |
> > --3.22%--__skb_flow_dissect
> >
> > in the following scenario:
> >
> > ethtool -L em1 combined 1
> > taskset 2 netperf -H 192.168.1.1 -t UDP_STREAM -- -m 1
> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.101.1 () port 0 AF_INET
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 212992 1 10.00 11497225 0 9.20
> >
> > After this commit the perf tool reports:
> >
> > 0.85%--__netdev_pick_tx
> >
> > and netperf reports:
> >
> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.101.1 () port 0 AF_INET
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 212992 1 10.00 12736058 0 10.19
> >
> > roughly +10% in xmit tput.
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>
> I think we shouldn't be configuring XPS if number of Tx or Rx queues
> is less than the number of CPUs, or ATR is not enabled.
Thank you for the feedback!
Please note that currently the ixgbe driver is enabling XPS regardless
of the above considerations.
> Really the XPS bits are only really supposed to be used with the ATR
> functionality enabled. If we don't have enough queues for a 1:1
> mapping we should probably not be programming XPS since ATR isn't
> going to function right anyway.
uhm... I don't know the details of ATR, but apparently it is for TCP
only, while the use-case I'm referring to is plain (no tunnel)
unconnected UDP traffic. Am I missing something?
thanks,
Paolo
^ permalink raw reply
* Re: rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)
From: Matthew Wilcox @ 2018-03-15 17:08 UTC (permalink / raw)
To: Joe Perches
Cc: Geert Uytterhoeven, David Howells, Arnd Bergmann, Linux-Arch,
Linux Kernel Mailing List, linux-doc, linux-block, linux-ide,
linux-input, netdev, linux-wireless, Linux PWM List, linux-rtc,
linux-spi,
USB list, DRI Development, Linux Fbdev development list,
Linux Watchdog Mailing List, Linux FS Devel
In-Reply-To: <1521133006.22221.35.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote:
> I have a patchset that creates a vsprintf extension for
> print_vma_addr and removes all the uses similar to the
> print_symbol() removal.
>
> This now avoids any possible printk interleaving.
>
> Unfortunately, without some #ifdef in vsprintf, which
> I would like to avoid, it increases the nommu kernel
> size by ~500 bytes.
>
> Anyone think this is acceptable?
>
> Here's the overall patch, but I have it as a series
> ---
> Documentation/core-api/printk-formats.rst | 9 +++++
> arch/arm64/kernel/traps.c | 13 +++----
> arch/mips/mm/fault.c | 16 ++++-----
> arch/parisc/mm/fault.c | 15 ++++----
> arch/riscv/kernel/traps.c | 11 +++---
> arch/s390/mm/fault.c | 7 ++--
> arch/sparc/mm/fault_32.c | 8 ++---
> arch/sparc/mm/fault_64.c | 8 ++---
> arch/tile/kernel/signal.c | 9 ++---
> arch/um/kernel/trap.c | 13 +++----
> arch/x86/kernel/signal.c | 10 ++----
> arch/x86/kernel/traps.c | 18 ++++------
> arch/x86/mm/fault.c | 12 +++----
> include/linux/mm.h | 1 -
> lib/vsprintf.c | 58 ++++++++++++++++++++++++++-----
> mm/memory.c | 33 ------------------
> 16 files changed, 112 insertions(+), 129 deletions(-)
This doesn't feel like a huge win since it's only called ~once per
architecture. I'd be more excited if it made the printing of the whole
thing standardised; eg we have a print_fault() function in mm/memory.c
which takes a suitable set of arguments.
^ permalink raw reply
* Re: rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)
From: Joe Perches @ 2018-03-15 17:13 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Linux-Arch, linux-block, Linux Fbdev development list,
Linux Watchdog Mailing List, Arnd Bergmann, linux-doc, netdev,
USB list, linux-wireless, Linux Kernel Mailing List,
Linux PWM List, linux-spi, David Howells, linux-ide,
Geert Uytterhoeven, DRI Development, linux-input, Linux FS Devel,
Linux MM, linux-rtc
In-Reply-To: <20180315170830.GA17574@bombadil.infradead.org>
On Thu, 2018-03-15 at 10:08 -0700, Matthew Wilcox wrote:
> On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote:
> > I have a patchset that creates a vsprintf extension for
> > print_vma_addr and removes all the uses similar to the
> > print_symbol() removal.
> >
> > This now avoids any possible printk interleaving.
> >
> > Unfortunately, without some #ifdef in vsprintf, which
> > I would like to avoid, it increases the nommu kernel
> > size by ~500 bytes.
> >
> > Anyone think this is acceptable?
[]
> This doesn't feel like a huge win since it's only called ~once per
> architecture. I'd be more excited if it made the printing of the whole
> thing standardised; eg we have a print_fault() function in mm/memory.c
> which takes a suitable set of arguments.
Sure but perhaps that's not feasible as the surrounding output
is per-arch specific.
What could be a standardized fault message here?
^ permalink raw reply
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Liran Alon @ 2018-03-15 17:14 UTC (permalink / raw)
To: shmulik.ladkani
Cc: netdev, mrv, daniel, davem, linux-kernel, yuval.shaia, idan.brown
----- shmulik.ladkani@gmail.com wrote:
> On Thu, 15 Mar 2018 09:35:51 -0700 (PDT) Liran Alon
> <liran.alon@oracle.com> wrote:
> > ----- shmulik.ladkani@gmail.com wrote:
> >
> > > On Thu, 15 Mar 2018 08:01:03 -0700 (PDT) Liran Alon
> > > <liran.alon@oracle.com> wrote:
> > > >
> > > > I still think that default behavior should be to zero skb->mark only
> > > > when skb cross netdevs in different netns.
> > >
> > > But the previous default was scrub the mark in *both* xnet and non-xnet
> > > situations.
> > >
> > > Therefore, there might be users which RELY on this (strange) default
> > > behavior in their same-netns-veth-pair setups.
> > > Meaning, changing the default behavior might break their apps relying on
> > > the former default behavior.
> > >
> > > This is why the "disable mark scrubbing in non-xnet case" should be
> > > opt-in.
> >
> > We think the same.
> > The only difference is that I think this for now should be controllable
> > by a global /proc/sys/net/core file instead of giving a flexible per-netdev
> > control.
> > Because that is a larger change that could be done later.
>
> A flags attribute to veth newlink is a very scoped change.
> User controls this per veth creation.
> This is way more neat than /proc/sys/net and provides the desired granular
> control.
>
> Also, scoping this to veth has the advantage of not affecting the many other
> dev_forward_skb callers.
Agreed. But isn't this an issue also for the
many other (and future) callers of dev_forward_skb()?
This seems problematic to me.
This will kinda leave a kernel interface with broken default behavior
for backwards compatibility.
A flag to netdev or /proc/sys/net/core to "fix" default behavior
will avoid this.
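To spell out the semantics being debated, here is a small userspace model, not kernel code. All names in it (struct pkt, struct fwd_dev, keep_mark_same_netns, forward_scrub()) are hypothetical. It assumes an opt-in flag on the destination device: with the flag clear, the legacy behavior (always zero the mark) is preserved; with it set, the mark is only zeroed when the packet crosses netns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down packet: just the mark and its netns. */
struct pkt {
	uint32_t mark;
	int netns_id;
};

/* Hypothetical destination device with an opt-in flag. */
struct fwd_dev {
	int netns_id;
	bool keep_mark_same_netns; /* false = legacy "always scrub" */
};

/*
 * Model of the forwarding-time scrub decision:
 *  - crossing netns: always clear the mark
 *  - same netns: clear it unless the device opted in to keeping it
 */
static void forward_scrub(struct pkt *p, const struct fwd_dev *dst)
{
	bool xnet = p->netns_id != dst->netns_id;

	if (xnet || !dst->keep_mark_same_netns)
		p->mark = 0;

	p->netns_id = dst->netns_id;
}
```

Whether the flag lives per device or in a global sysctl only changes where keep_mark_same_netns comes from; the decision itself stays the same.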
>
> Regards,
> Shmulik
^ permalink raw reply
* Re: [RFC PATCH 0/2] net:setup XPS mapping for each online CPU
From: Paolo Abeni @ 2018-03-15 17:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, David Miller, Jeff Kirsher, intel-wired-lan,
Alexander Duyck
In-Reply-To: <CANn89iLJOVuN-1j+9_qr_LaU7AE6COHdnp4Rc7fe9p6_s=hCCA@mail.gmail.com>
Hi,
On Thu, 2018-03-15 at 15:59 +0000, Eric Dumazet wrote:
> On Thu, Mar 15, 2018 at 8:51 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > I'm sorry, I do not follow. AFAICS with unconnected sockets without XPS
> > we always hit the netdev_pick_tx()/skb_tx_hash()/skb_flow_dissect()
> > overhead in xmit path.
>
> Then fix this if you want, instead of fixing one NIC only, or by enforcing
> XPS by all NIC.
>
> For unconnected sockets, picking the TX queue based on current cpu is good,
> we do not have to enforce ordering as much as possible.
>
> (pfifo_fast no longer can enforce it anyway)
Thank you for the prompt reply.
I'm double checking to avoid misinterpretation on my side: are you
suggesting to plug a CPU-based selection logic for unconnected sockets
in netdev_pick_tx() or to cook patches like 2/2 for all the relevant
NICs?
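For the first option, what I have in mind is roughly the following. This is a userspace sketch only: struct tx_flow and pick_tx_queue() are hypothetical names, and the plain modulo mapping is an assumption for illustration, not what netdev_pick_tx() does today. Connected flows keep a hash-based queue so their packets stay ordered, while unconnected ones skip the flow dissection and use the current CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet view: is the socket connected, and its hash. */
struct tx_flow {
	bool connected;
	uint32_t hash; /* only meaningful for connected flows */
};

/*
 * Pick a Tx queue index in [0, num_tx_queues).
 * Connected flows: stable hash-based choice, preserves per-flow ordering.
 * Unconnected flows: choice based on the current CPU, no hashing needed.
 */
static unsigned int pick_tx_queue(const struct tx_flow *flow,
				  unsigned int cpu,
				  unsigned int num_tx_queues)
{
	assert(num_tx_queues > 0);

	if (flow->connected)
		return flow->hash % num_tx_queues;

	return cpu % num_tx_queues;
}
```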
Thanks!
Paolo
^ permalink raw reply