* Curious bpf regression in 5.18 already fixed in stable 5.18.3
@ 2022-06-15 16:45 Maciej Żenczykowski
2022-06-15 16:57 ` Maciej Żenczykowski
0 siblings, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2022-06-15 16:45 UTC (permalink / raw)
To: Linux NetDev, BPF Mailing List, Maciej Żenczykowski,
Stanislav Fomichev, Alexei Starovoitov, Martin KaFai Lau,
Sasha Levin, Carlos Llamas
Are you folks aware that:
'bpf: Move rcu lock management out of BPF_PROG_RUN routines'
fixes a weird regression where sendmsg with an egress tc bpf program
denying it was returning EFAULT instead of EPERM
I've confirmed vanilla 5.18.0 is broken, and all it takes is
cherrypicking that specific stable 5.18.x patch [
710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour.
This was not a flaky failure... but a 100% reproducible behavioural
breakage/failure in the test case at
https://cs.android.com/android/platform/superproject/+/master:kernel/tests/net/test/bpf_test.py;l=517
(where 5.18 would return EFAULT instead of EPERM)
---
A non standalone but perhaps useful (for reading) simplification of
the test case follows.
I was planning on reporting it, hence why I was trying to trim it down
and have this ready anyway, only to discover it's already been fixed.
But the commit message seems to be unrelated... some sort of compiler
optimization shenanigans? Missing test case opportunity?
Note: that I run this on x86_64 UML - that might matter??
#!/usr/bin/python
# extracted snippet from AOSP, Apache2 licensed
import csocket
import cstruct
import ctypes
import errno
import os
import platform
import socket
import unittest
__NR_bpf = { # pylint: disable=invalid-name
"aarch64-32bit": 386,
"aarch64-64bit": 280,
"armv7l-32bit": 386,
"armv8l-32bit": 386,
"armv8l-64bit": 280,
"i686-32bit": 357,
"i686-64bit": 321,
"x86_64-32bit": 357,
"x86_64-64bit": 321,
}[os.uname()[4] + "-" + platform.architecture()[0]]
LOG_LEVEL = 1
LOG_SIZE = 65536
BPF_PROG_LOAD = 5
BPF_PROG_ATTACH = 8
BPF_PROG_DETACH = 9
BPF_PROG_TYPE_CGROUP_SKB = 8
BPF_CGROUP_INET_EGRESS = 1
BPF_REG_0 = 0
BPF_JMP = 0x05
BPF_K = 0x00
BPF_ALU64 = 0x07
BPF_MOV = 0xb0
BPF_EXIT = 0x90
BpfAttrProgLoad = cstruct.Struct(
"bpf_attr_prog_load", "=IIQQIIQI", "prog_type insn_cnt insns"
" license log_level log_size log_buf kern_version")
BpfAttrProgAttach = cstruct.Struct(
"bpf_attr_prog_attach", "=III", "target_fd attach_bpf_fd attach_type")
BpfInsn = cstruct.Struct("bpf_insn", "=BBhi", "code dst_src_reg off imm")
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
def BpfSyscall(op, attr):
ret = libc.syscall(__NR_bpf, op, csocket.VoidPointer(attr), len(attr))
csocket.MaybeRaiseSocketError(ret)
return ret
def BpfProgLoad(prog_type, instructions, prog_license=b"GPL"):
bpf_prog = "".join(instructions)
insn_buff = ctypes.create_string_buffer(bpf_prog)
gpl_license = ctypes.create_string_buffer(prog_license)
log_buf = ctypes.create_string_buffer(b"", LOG_SIZE)
return BpfSyscall(BPF_PROG_LOAD,
BpfAttrProgLoad((prog_type, len(insn_buff) / len(BpfInsn),
ctypes.addressof(insn_buff),
ctypes.addressof(gpl_license), LOG_LEVEL,
LOG_SIZE, ctypes.addressof(log_buf), 0)))
# Attach a eBPF filter to a cgroup
def BpfProgAttach(prog_fd, target_fd, prog_type):
attr = BpfAttrProgAttach((target_fd, prog_fd, prog_type))
return BpfSyscall(BPF_PROG_ATTACH, attr)
# Detach a eBPF filter from a cgroup
def BpfProgDetach(target_fd, prog_type):
attr = BpfAttrProgAttach((target_fd, 0, prog_type))
return BpfSyscall(BPF_PROG_DETACH, attr)
class BpfCgroupEgressTest(unittest.TestCase):
def setUp(self):
self._cg_fd = os.open("/sys/fs/cgroup", os.O_DIRECTORY | os.O_RDONLY)
BpfProgAttach(BpfProgLoad(BPF_PROG_TYPE_CGROUP_SKB, [
BpfInsn((BPF_ALU64 | BPF_MOV | BPF_K, BPF_REG_0, 0,
0)).Pack(), # Mov64Imm(REG0, 0)
BpfInsn((BPF_JMP | BPF_EXIT, 0, 0, 0)).Pack() # Exit
]), self._cg_fd, BPF_CGROUP_INET_EGRESS)
def tearDown(self):
BpfProgDetach(self._cg_fd, BPF_CGROUP_INET_EGRESS)
os.close(self._cg_fd)
def testCgroupEgressBlocked(self):
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0)
s.bind(("127.0.0.1", 0))
addr = s.getsockname()
# previously: s.sendto("foo", addr) would fail with EPERM, but
on 5.18+ it EFAULTs
self.assertRaisesRegexp(EnvironmentError,
os.strerror(errno.EFAULT), s.sendto, "foo", addr)
if __name__ == "__main__":
unittest.main()
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 16:45 Curious bpf regression in 5.18 already fixed in stable 5.18.3 Maciej Żenczykowski @ 2022-06-15 16:57 ` Maciej Żenczykowski 2022-06-15 17:37 ` Alexei Starovoitov 0 siblings, 1 reply; 13+ messages in thread From: Maciej Żenczykowski @ 2022-06-15 16:57 UTC (permalink / raw) To: Linux NetDev, BPF Mailing List, Maciej Żenczykowski, Stanislav Fomichev, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas On Wed, Jun 15, 2022 at 9:45 AM Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > Are you folks aware that: > > 'bpf: Move rcu lock management out of BPF_PROG_RUN routines' > > fixes a weird regression where sendmsg with an egress tc bpf program > denying it was returning EFAULT instead of EPERM > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > cherrypicking that specific stable 5.18.x patch [ > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. > > This was not a flaky failure... but a 100% reproducible behavioural > breakage/failure in the test case at > https://cs.android.com/android/platform/superproject/+/master:kernel/tests/net/test/bpf_test.py;l=517 > (where 5.18 would return EFAULT instead of EPERM) I bisected on 5.18.x to find the fixing CL, so I don't know which CL actually caused the breakage. sdf says: 5.15 is where they rewrote defines to funcs, so there is still something else involved it seems b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added EFAULT in 5.16 (we've added a mechanism to return custom errno, I wonder if some of that is related) and that this EFAULT breakage is not something he was expecting to fix... so it's some sort of unintended consequence. I recall that: - vanilla 5.15 and 5.16 are definitely good - I think the only regression in 5.17 is an unrelated icmp socket one - so from a bpf perspective it was also good. - 5.18 had 3 regressions: icmp sockets, the pf_key regression (fixed via revert in 5.18.4) plus this bpf one The bad pf_key change being reverted in 5.18.4 is why I even switched from dev/test against 5.18 to against 5.18.4 and noticed that this was already fixed before I could even report it... ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 16:57 ` Maciej Żenczykowski @ 2022-06-15 17:37 ` Alexei Starovoitov 2022-06-15 17:46 ` Maciej Żenczykowski 0 siblings, 1 reply; 13+ messages in thread From: Alexei Starovoitov @ 2022-06-15 17:37 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Linux NetDev, BPF Mailing List, Stanislav Fomichev, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com> wrote: > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > > cherrypicking that specific stable 5.18.x patch [ > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. ... > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added EFAULT in 5.16 There are no such sha-s in the upstream kernel. Sorry we cannot help with debugging of android kernels. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 17:37 ` Alexei Starovoitov @ 2022-06-15 17:46 ` Maciej Żenczykowski 2022-06-15 17:55 ` sdf 0 siblings, 1 reply; 13+ messages in thread From: Maciej Żenczykowski @ 2022-06-15 17:46 UTC (permalink / raw) To: Alexei Starovoitov Cc: Linux NetDev, BPF Mailing List, Stanislav Fomichev, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com> wrote: > > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > > > cherrypicking that specific stable 5.18.x patch [ > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. > ... > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added EFAULT in 5.16 > > There are no such sha-s in the upstream kernel. > Sorry we cannot help with debugging of android kernels. Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an internal branch of 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value' commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93 Anyway, I think it's unrelated - or at least not the immediate root cause. Also there's *no* Android kernels involved here. This is the android net tests failing on vanilla 5.18 and passing on 5.18.3. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 17:46 ` Maciej Żenczykowski @ 2022-06-15 17:55 ` sdf 2022-06-15 20:26 ` sdf 0 siblings, 1 reply; 13+ messages in thread From: sdf @ 2022-06-15 17:55 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas On 06/15, Maciej Żenczykowski wrote: > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com> > wrote: > > > > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > > > > cherrypicking that specific stable 5.18.x patch [ > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. > > ... > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added > EFAULT in 5.16 > > > > There are no such sha-s in the upstream kernel. > > Sorry we cannot help with debugging of android kernels. > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an > internal branch of > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return > value' > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93. > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93 > Anyway, I think it's unrelated - or at least not the immediate root cause. > Also there's *no* Android kernels involved here. > This is the android net tests failing on vanilla 5.18 and passing on > 5.18.3. Yeah, sorry, didn't mean to send those outside :-) Attached un-android-ified testcase. Passes on bpf-next, trying to see what happens on vanilla 5.18. Will update once I get more data.. -- #!/usr/bin/python2.7 # extracted snippet from AOSP, Apache2 licensed import ctypes import ctypes.util import re import errno import os import platform import struct import socket import unittest __NR_bpf = { # pylint: disable=invalid-name "aarch64-32bit": 386, "aarch64-64bit": 280, "armv7l-32bit": 386, "armv8l-32bit": 386, "armv8l-64bit": 280, "i686-32bit": 357, "i686-64bit": 321, "x86_64-32bit": 357, "x86_64-64bit": 321, }[os.uname()[4] + "-" + platform.architecture()[0]] LOG_LEVEL = 1 LOG_SIZE = 65536 BPF_PROG_LOAD = 5 BPF_PROG_ATTACH = 8 BPF_PROG_DETACH = 9 BPF_PROG_TYPE_CGROUP_SKB = 8 BPF_CGROUP_INET_EGRESS = 1 BPF_REG_0 = 0 BPF_JMP = 0x05 BPF_K = 0x00 BPF_ALU64 = 0x07 BPF_MOV = 0xb0 BPF_EXIT = 0x90 def CalcSize(fmt): if "A" in fmt: fmt = fmt.replace("A", "s") # Remove the last digital since it will cause error in python3. fmt = (re.split('\d+$', fmt)[0]) return struct.calcsize(fmt) def CalcNumElements(fmt): prevlen = len(fmt) fmt = fmt.replace("S", "") numstructs = prevlen - len(fmt) size = CalcSize(fmt) elements = struct.unpack(fmt, b"\x00" * size) return len(elements) + numstructs def Struct(name, fmt, fieldnames, substructs={}): """Function that returns struct classes.""" class Meta(type): def __len__(cls): return cls._length def __init__(cls, unused_name, unused_bases, namespace): # Make the class object have the name that's passed in. type.__init__(cls, namespace["_name"], unused_bases, namespace) class CStruct(object): """Class representing a C-like structure.""" __metaclass__ = Meta # Name of the struct. _name = name # List of field names. _fieldnames = fieldnames # Dict mapping field indices to nested struct classes. _nested = {} # List of string fields that are ASCII strings. _asciiz = set() _fieldnames = _fieldnames.split(" ") # Parse fmt into _format, converting any S format characters to "XXs", # where XX is the length of the struct type's packed representation. _format = "" laststructindex = 0 for i in range(len(fmt)): if fmt[i] == "S": # Nested struct. Record the index in our struct it should go into. index = CalcNumElements(fmt[:i]) _nested[index] = substructs[laststructindex] laststructindex += 1 _format += "%ds" % len(_nested[index]) elif fmt[i] == "A": # Null-terminated ASCII string. index = CalcNumElements(fmt[:i]) _asciiz.add(index) _format += "s" else: # Standard struct format character. _format += fmt[i] _length = CalcSize(_format) offset_list = [0] last_offset = 0 for i in range(len(_format)): offset = CalcSize(_format[:i]) if offset > last_offset: last_offset = offset offset_list.append(offset) # A dictionary that maps field names to their offsets in the struct. _offsets = dict(list(zip(_fieldnames, offset_list))) # Check that the number of field names matches the number of fields. numfields = len(struct.unpack(_format, b"\x00" * _length)) if len(_fieldnames) != numfields: raise ValueError("Invalid cstruct: \"%s\" has %d elements, \"%s\" has %d." % (fmt, numfields, fieldnames, len(_fieldnames))) def _SetValues(self, values): # Replace self._values with the given list. We can't do direct assignment # because of the __setattr__ overload on this class. super(CStruct, self).__setattr__("_values", list(values)) def _Parse(self, data): data = data[:self._length] values = list(struct.unpack(self._format, data)) for index, value in enumerate(values): if isinstance(value, str) and index in self._nested: values[index] = self._nested[index](value) self._SetValues(values) def __init__(self, tuple_or_bytes=None, **kwargs): """Construct an instance of this Struct. 1. With no args, the whole struct is zero-initialized. 2. With keyword args, the matching fields are populated; rest are zeroed. 3. With one tuple as the arg, the fields are assigned based on position. 4. With one string arg, the Struct is parsed from bytes. """ if tuple_or_bytes and kwargs: raise TypeError( "%s: cannot specify both a tuple and keyword args" % self._name) if tuple_or_bytes is None: # Default construct from null bytes. self._Parse("\x00" * len(self)) # If any keywords were supplied, set those fields. for k, v in kwargs.items(): setattr(self, k, v) elif isinstance(tuple_or_bytes, str): # Initializing from a string. if len(tuple_or_bytes) < self._length: raise TypeError("%s requires string of length %d, got %d" % (self._name, self._length, len(tuple_or_bytes))) self._Parse(tuple_or_bytes) else: # Initializing from a tuple. if len(tuple_or_bytes) != len(self._fieldnames): raise TypeError("%s has exactly %d fieldnames (%d given)" % (self._name, len(self._fieldnames), len(tuple_or_bytes))) self._SetValues(tuple_or_bytes) def _FieldIndex(self, attr): try: return self._fieldnames.index(attr) except ValueError: raise AttributeError("'%s' has no attribute '%s'" % (self._name, attr)) def __getattr__(self, name): return self._values[self._FieldIndex(name)] def __setattr__(self, name, value): # TODO: check value type against self._format and throw here, or else # callers get an unhelpful exception when they call Pack(). self._values[self._FieldIndex(name)] = value def offset(self, name): if "." in name: raise NotImplementedError("offset() on nested field") return self._offsets[name] @classmethod def __len__(cls): return cls._length def __ne__(self, other): return not self.__eq__(other) def __eq__(self, other): return (isinstance(other, self.__class__) and self._name == other._name and self._fieldnames == other._fieldnames and self._values == other._values) @staticmethod def _MaybePackStruct(value): if hasattr(value, "__metaclass__"):# and value.__metaclass__ == Meta: return value.Pack() else: return value def Pack(self): values = [self._MaybePackStruct(v) for v in self._values] return struct.pack(self._format, *values) def __str__(self): def FieldDesc(index, name, value): if isinstance(value, str): if index in self._asciiz: value = value.rstrip("\x00") elif any(c not in string.printable for c in value): value = value.encode("hex") return "%s=%s" % (name, value) descriptions = [ FieldDesc(i, n, v) for i, (n, v) in enumerate(zip(self._fieldnames, self._values))] return "%s(%s)" % (self._name, ", ".join(descriptions)) def __repr__(self): return str(self) def CPointer(self): """Returns a C pointer to the serialized structure.""" buf = ctypes.create_string_buffer(self.Pack()) # Store the C buffer in the object so it doesn't get garbage collected. super(CStruct, self).__setattr__("_buffer", buf) return ctypes.addressof(self._buffer) return CStruct BpfAttrProgLoad = Struct( "bpf_attr_prog_load", "=IIQQIIQI", "prog_type insn_cnt insns" " license log_level log_size log_buf kern_version") BpfAttrProgAttach = Struct( "bpf_attr_prog_attach", "=III", "target_fd attach_bpf_fd attach_type") BpfInsn = Struct("bpf_insn", "=BBhi", "code dst_src_reg off imm") libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True) def VoidPointer(s): return ctypes.cast(s.CPointer(), ctypes.c_void_p) def MaybeRaiseSocketError(ret): if ret < 0: errno = ctypes.get_errno() raise socket.error(errno, os.strerror(errno)) def BpfSyscall(op, attr): ret = libc.syscall(__NR_bpf, op, VoidPointer(attr), len(attr)) MaybeRaiseSocketError(ret) return ret def BpfProgLoad(prog_type, instructions, prog_license=b"GPL"): bpf_prog = b"".join(instructions) insn_buff = ctypes.create_string_buffer(bpf_prog) gpl_license = ctypes.create_string_buffer(prog_license) log_buf = ctypes.create_string_buffer(b"", LOG_SIZE) return BpfSyscall(BPF_PROG_LOAD, BpfAttrProgLoad((prog_type, len(instructions), ctypes.addressof(insn_buff), ctypes.addressof(gpl_license), LOG_LEVEL, LOG_SIZE, ctypes.addressof(log_buf), 0))) # Attach a eBPF filter to a cgroup def BpfProgAttach(prog_fd, target_fd, prog_type): attr = BpfAttrProgAttach((target_fd, prog_fd, prog_type)) return BpfSyscall(BPF_PROG_ATTACH, attr) # Detach a eBPF filter from a cgroup def BpfProgDetach(target_fd, prog_type): attr = BpfAttrProgAttach((target_fd, 0, prog_type)) return BpfSyscall(BPF_PROG_DETACH, attr) class BpfCgroupEgressTest(unittest.TestCase): def setUp(self): self._cg_fd = os.open("/sys/fs/cgroup", os.O_DIRECTORY | os.O_RDONLY) BpfProgAttach(BpfProgLoad(BPF_PROG_TYPE_CGROUP_SKB, [ BpfInsn((BPF_ALU64 | BPF_MOV | BPF_K, BPF_REG_0, 0, 0)).Pack(), # Mov64Imm(REG0, 0) BpfInsn((BPF_JMP | BPF_EXIT, 0, 0, 0)).Pack() # Exit ]), self._cg_fd, BPF_CGROUP_INET_EGRESS) def tearDown(self): BpfProgDetach(self._cg_fd, BPF_CGROUP_INET_EGRESS) os.close(self._cg_fd) def testCgroupEgressBlocked(self): s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, 0) s.bind(("127.0.0.1", 0)) addr = s.getsockname() self.assertRaisesRegexp(EnvironmentError, os.strerror(errno.EPERM), s.sendto, b"foo", addr) if __name__ == "__main__": unittest.main() ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 17:55 ` sdf @ 2022-06-15 20:26 ` sdf 2022-06-15 20:32 ` sdf 0 siblings, 1 reply; 13+ messages in thread From: sdf @ 2022-06-15 20:26 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, zhuyifei On 06/15, sdf@google.com wrote: > On 06/15, Maciej Żenczykowski wrote: > > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski <maze@google.com> > > wrote: > > > > > > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > > > > > cherrypicking that specific stable 5.18.x patch [ > > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. > > > ... > > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added > > EFAULT in 5.16 > > > > > > There are no such sha-s in the upstream kernel. > > > Sorry we cannot help with debugging of android kernels. > > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an > > internal branch of > > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return > > value' > > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93. > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93 > > Anyway, I think it's unrelated - or at least not the immediate root > cause. > > Also there's *no* Android kernels involved here. > > This is the android net tests failing on vanilla 5.18 and passing on > > 5.18.3. > Yeah, sorry, didn't mean to send those outside :-) > Attached un-android-ified testcase. Passes on bpf-next, trying to see > what happens on vanilla 5.18. Will update once I get more data.. I've bisected the original issue to: b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value") And I believe it's these two lines from the original patch: #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ ({ \ @@ -1398,10 +1398,12 @@ out: u32 _ret; \ _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ _cn = _flags & BPF_RET_SET_CN; \ + if (_ret && !IS_ERR_VALUE((long)_ret)) \ + _ret = -EFAULT; _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff) returns false in this case because it doesn't sign-expand the argument and internally does ffff_ffff >= ffff_ffff_ffff_f001 comparison. I'll try to see what I've changed in my unrelated patch to fix it. But I think we should audit all these IS_ERR_VALUE((long)_ret) regardless; they don't seem to work the way we want them to... ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 20:26 ` sdf @ 2022-06-15 20:32 ` sdf 2022-06-15 20:39 ` sdf 0 siblings, 1 reply; 13+ messages in thread From: sdf @ 2022-06-15 20:32 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, zhuyifei On 06/15, Stanislav Fomichev wrote: > On 06/15, sdf@google.com wrote: > > On 06/15, Maciej Żenczykowski wrote: > > > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski > <maze@google.com> > > > wrote: > > > > > > > > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > > > > > > cherrypicking that specific stable 5.18.x patch [ > > > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. > > > > ... > > > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added > > > EFAULT in 5.16 > > > > > > > > There are no such sha-s in the upstream kernel. > > > > Sorry we cannot help with debugging of android kernels. > > > > > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an > > > internal branch of > > > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall > return > > > value' > > > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93. > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93 > > > > > Anyway, I think it's unrelated - or at least not the immediate root > cause. > > > > > Also there's *no* Android kernels involved here. > > > This is the android net tests failing on vanilla 5.18 and passing on > > > 5.18.3. > > > > Yeah, sorry, didn't mean to send those outside :-) > > > > Attached un-android-ified testcase. Passes on bpf-next, trying to see > > what happens on vanilla 5.18. Will update once I get more data.. > I've bisected the original issue to: > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set > syscall return value") > And I believe it's these two lines from the original patch: > #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ > ({ \ > @@ -1398,10 +1398,12 @@ out: > u32 _ret; \ > _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ > _cn = _flags & BPF_RET_SET_CN; \ > + if (_ret && !IS_ERR_VALUE((long)_ret)) \ > + _ret = -EFAULT; > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff) > returns > false in this case because it doesn't sign-expand the argument and > internally > does ffff_ffff >= ffff_ffff_ffff_f001 comparison. > I'll try to see what I've changed in my unrelated patch to fix it. But I > think > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they don't > seem to work the way we want them to... Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'. So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret). Sigh.. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 20:32 ` sdf @ 2022-06-15 20:39 ` sdf 2022-06-16 1:36 ` Maciej Żenczykowski 0 siblings, 1 reply; 13+ messages in thread From: sdf @ 2022-06-15 20:39 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, zhuyifei On 06/15, Stanislav Fomichev wrote: > On 06/15, Stanislav Fomichev wrote: > > On 06/15, sdf@google.com wrote: > > > On 06/15, Maciej Żenczykowski wrote: > > > > On Wed, Jun 15, 2022 at 10:38 AM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Wed, Jun 15, 2022 at 9:57 AM Maciej Żenczykowski > <maze@google.com> > > > > wrote: > > > > > > > > > > > > > > I've confirmed vanilla 5.18.0 is broken, and all it takes is > > > > > > > cherrypicking that specific stable 5.18.x patch [ > > > > > > > 710a8989b4b4067903f5b61314eda491667b6ab3 ] to fix behaviour. > > > > > ... > > > > > > b8bd3ee1971d1edbc53cf322c149ca0227472e56 this is where we added > > > > EFAULT in 5.16 > > > > > > > > > > There are no such sha-s in the upstream kernel. > > > > > Sorry we cannot help with debugging of android kernels. > > > > > > > Yes, sdf@ quoted the wrong sha1, it's a clean cherrypick to an > > > > internal branch of > > > > 'bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall > return > > > > value' > > > > commit b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93. > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.16.y&id=b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93 > > > > > > > Anyway, I think it's unrelated - or at least not the immediate root > cause. > > > > > > > Also there's *no* Android kernels involved here. > > > > This is the android net tests failing on vanilla 5.18 and passing on > > > > 5.18.3. > > > > > > Yeah, sorry, didn't mean to send those outside :-) > > > > > > Attached un-android-ified testcase. Passes on bpf-next, trying to see > > > what happens on vanilla 5.18. Will update once I get more data.. > > > > I've bisected the original issue to: > > > > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set > > syscall return value") > > > > And I believe it's these two lines from the original patch: > > > > #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ > > ({ \ > > @@ -1398,10 +1398,12 @@ out: > > u32 _ret; \ > > _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ > > _cn = _flags & BPF_RET_SET_CN; \ > > + if (_ret && !IS_ERR_VALUE((long)_ret)) \ > > + _ret = -EFAULT; > > > > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff) > returns > > false in this case because it doesn't sign-expand the argument and > internally > > does ffff_ffff >= ffff_ffff_ffff_f001 comparison. > > > > I'll try to see what I've changed in my unrelated patch to fix it. But > I think > > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they > don't > > seem to work the way we want them to... > Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'. > So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret). > Sigh.. And to follow up on that, the other two places we have are fine: IS_ERR_VALUE((long)run_ctx.retval)) run_ctx.retval is an int. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-15 20:39 ` sdf @ 2022-06-16 1:36 ` Maciej Żenczykowski 2022-06-16 15:57 ` Stanislav Fomichev 0 siblings, 1 reply; 13+ messages in thread From: Maciej Żenczykowski @ 2022-06-16 1:36 UTC (permalink / raw) To: Stanislav Fomichev Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, YiFei Zhu > > > I've bisected the original issue to: > > > > > > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set > > > syscall return value") > > > > > > And I believe it's these two lines from the original patch: > > > > > > #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ > > > ({ \ > > > @@ -1398,10 +1398,12 @@ out: > > > u32 _ret; \ > > > _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ > > > _cn = _flags & BPF_RET_SET_CN; \ > > > + if (_ret && !IS_ERR_VALUE((long)_ret)) \ > > > + _ret = -EFAULT; > > > > > > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff) > > returns > > > false in this case because it doesn't sign-expand the argument and > > internally > > > does ffff_ffff >= ffff_ffff_ffff_f001 comparison. > > > > > > I'll try to see what I've changed in my unrelated patch to fix it. But > > I think > > > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they > > don't > > > seem to work the way we want them to... > > > Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'. > > > So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret). > > > Sigh.. > > And to follow up on that, the other two places we have are fine: > > IS_ERR_VALUE((long)run_ctx.retval)) > > run_ctx.retval is an int. I'm guessing this means the regression only affects 64-bit archs, where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones, where long = u32 = 4 bytes Unfortunately my dev machine's 32-bit build capability has somehow regressed again and I can't check this. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-16 1:36 ` Maciej Żenczykowski @ 2022-06-16 15:57 ` Stanislav Fomichev 2022-06-16 16:41 ` Maciej Żenczykowski 0 siblings, 1 reply; 13+ messages in thread From: Stanislav Fomichev @ 2022-06-16 15:57 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, YiFei Zhu On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote: > > > > > I've bisected the original issue to: > > > > > > > > b44123b4a3dc ("bpf: Add cgroup helpers bpf_{get,set}_retval to get/set > > > > syscall return value") > > > > > > > > And I believe it's these two lines from the original patch: > > > > > > > > #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ > > > > ({ \ > > > > @@ -1398,10 +1398,12 @@ out: > > > > u32 _ret; \ > > > > _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ > > > > _cn = _flags & BPF_RET_SET_CN; \ > > > > + if (_ret && !IS_ERR_VALUE((long)_ret)) \ > > > > + _ret = -EFAULT; > > > > > > > > _ret is u32 and ret gets -1 (ffffffff). IS_ERR_VALUE((long)ffffffff) > > > returns > > > > false in this case because it doesn't sign-expand the argument and > > > internally > > > > does ffff_ffff >= ffff_ffff_ffff_f001 comparison. > > > > > > > > I'll try to see what I've changed in my unrelated patch to fix it. But > > > I think > > > > we should audit all these IS_ERR_VALUE((long)_ret) regardless; they > > > don't > > > > seem to work the way we want them to... > > > > > Ok, and my patch fixes it because I'm replacing 'u32 _ret' with 'int ret'. > > > > > So, basically, with u32 _ret we have to do IS_ERR_VALUE((long)(int)_ret). > > > > > Sigh.. > > > > And to follow up on that, the other two places we have are fine: > > > > IS_ERR_VALUE((long)run_ctx.retval)) > > > > run_ctx.retval is an int. > > I'm guessing this means the regression only affects 64-bit archs, > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones, > where long = u32 = 4 bytes > > Unfortunately my dev machine's 32-bit build capability has somehow > regressed again and I can't check this. Seems so, yes. But I'm actually not sure whether we should at all treat it as a regression. There is a question of whether that EPERM is UAPI or not. That's why we most likely haven't caught it in the selftests; most of the time we only check that syscall has returned -1 and don't pay attention to the particular errno. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-16 15:57 ` Stanislav Fomichev @ 2022-06-16 16:41 ` Maciej Żenczykowski 2022-06-16 17:39 ` Stanislav Fomichev 2022-06-16 21:21 ` YiFei Zhu 0 siblings, 2 replies; 13+ messages in thread From: Maciej Żenczykowski @ 2022-06-16 16:41 UTC (permalink / raw) To: Stanislav Fomichev Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, YiFei Zhu On Thu, Jun 16, 2022 at 8:57 AM Stanislav Fomichev <sdf@google.com> wrote: > On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote: > > I'm guessing this means the regression only affects 64-bit archs, > > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones, > > where long = u32 = 4 bytes > > > > Unfortunately my dev machine's 32-bit build capability has somehow > > regressed again and I can't check this. > > Seems so, yes. But I'm actually not sure whether we should at all > treat it as a regression. There is a question of whether that EPERM is > UAPI or not. That's why we most likely haven't caught it in the > selftests; most of the time we only check that syscall has returned -1 > and don't pay attention to the particular errno. EFAULT seems like a terrible error to return no matter what, it has a very clear 'memory read/write access violation' semantic (ie. if you'd done from userspace you'd get a SIGSEGV) I'm actually surprised to learn you return EFAULT on positive number... It should rather be some unique error code or EINVAL or something. I know someone will argue that (most/all) system calls can return EFAULT... But that's not actually true. From a userspace developer the expectation is they will not return EFAULT if you pass in memory you know is good. #include <sys/utsname.h> int main() { struct utsname uts; uname(&uts); } The above cannot EFAULT in spite of it being documented as the only error uname can report, because obviously the uts structure on the stack is valid memory. Maybe ENOSYS would at least make it obvious something is very weird. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-16 16:41 ` Maciej Żenczykowski @ 2022-06-16 17:39 ` Stanislav Fomichev 2022-06-16 21:21 ` YiFei Zhu 1 sibling, 0 replies; 13+ messages in thread From: Stanislav Fomichev @ 2022-06-16 17:39 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas, YiFei Zhu On Thu, Jun 16, 2022 at 9:41 AM Maciej Żenczykowski <maze@google.com> wrote: > > On Thu, Jun 16, 2022 at 8:57 AM Stanislav Fomichev <sdf@google.com> wrote: > > On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote: > > > I'm guessing this means the regression only affects 64-bit archs, > > > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones, > > > where long = u32 = 4 bytes > > > > > > Unfortunately my dev machine's 32-bit build capability has somehow > > > regressed again and I can't check this. > > > > Seems so, yes. But I'm actually not sure whether we should at all > > treat it as a regression. There is a question of whether that EPERM is > > UAPI or not. That's why we most likely haven't caught it in the > > selftests; most of the time we only check that syscall has returned -1 > > and don't pay attention to the particular errno. > > EFAULT seems like a terrible error to return no matter what, it has a very clear > 'memory read/write access violation' semantic (ie. if you'd done from > userspace you'd get a SIGSEGV) > > I'm actually surprised to learn you return EFAULT on positive number... > It should rather be some unique error code or EINVAL or something. > > I know someone will argue that (most/all) system calls can return EFAULT... > But that's not actually true. From a userspace developer the expectation is > they will not return EFAULT if you pass in memory you know is good. > > #include <sys/utsname.h> > int main() { > struct utsname uts; > uname(&uts); > } > > The above cannot EFAULT in spite of it being documented as the only > error uname can report, > because obviously the uts structure on the stack is valid memory. > > Maybe ENOSYS would at least make it obvious something is very weird. I'd like to see less of the applications poking into errno and making some decisions based on that. IMO, the only time where it makes sense is EINTR/EAGAIN vs the rest; the rest should be logged. Having errnos hard-coded in the tests is fine to make sure that the condition you're testing against has been triggered; but still treating them as UAPI might be too much, idk. We had to add bpf_set_retval() because some of our and third party libraries would upgrade to v6/v4 sockets only when they receive some specific errno, which doesn't make a lot of sense to me. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Curious bpf regression in 5.18 already fixed in stable 5.18.3 2022-06-16 16:41 ` Maciej Żenczykowski 2022-06-16 17:39 ` Stanislav Fomichev @ 2022-06-16 21:21 ` YiFei Zhu 1 sibling, 0 replies; 13+ messages in thread From: YiFei Zhu @ 2022-06-16 21:21 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Stanislav Fomichev, Alexei Starovoitov, Linux NetDev, BPF Mailing List, Alexei Starovoitov, Martin KaFai Lau, Sasha Levin, Carlos Llamas On Thu, Jun 16, 2022 at 9:41 AM Maciej Żenczykowski <maze@google.com> wrote: > > On Thu, Jun 16, 2022 at 8:57 AM Stanislav Fomichev <sdf@google.com> wrote: > > On Wed, Jun 15, 2022 at 6:36 PM Maciej Żenczykowski <maze@google.com> wrote: > > > I'm guessing this means the regression only affects 64-bit archs, > > > where long = void* is 8 bytes > u32 of 4 bytes, but not 32-bit ones, > > > where long = u32 = 4 bytes > > > > > > Unfortunately my dev machine's 32-bit build capability has somehow > > > regressed again and I can't check this. > > > > Seems so, yes. But I'm actually not sure whether we should at all > > treat it as a regression. There is a question of whether that EPERM is > > UAPI or not. That's why we most likely haven't caught it in the > > selftests; most of the time we only check that syscall has returned -1 > > and don't pay attention to the particular errno. > > EFAULT seems like a terrible error to return no matter what, it has a very clear > 'memory read/write access violation' semantic (ie. if you'd done from > userspace you'd get a SIGSEGV) I chose EFAULT because the original code of getsockopt hook returns -EFAULT if the retval is set to a number that isn't zero or the original value. i.e., in c4dcfdd406aa^, there was: /* BPF programs only allowed to set retval to 0, not some * arbitrary value. */ if (ctx.retval != 0 && ctx.retval != retval) { ret = -EFAULT; goto out; } I understood that as the convention that if a BPF program does something illegal at runtime, return -EFAULT. > I'm actually surprised to learn you return EFAULT on positive number... > It should rather be some unique error code or EINVAL or something. > > I know someone will argue that (most/all) system calls can return EFAULT... > But that's not actually true. From a userspace developer the expectation is > they will not return EFAULT if you pass in memory you know is good. > > #include <sys/utsname.h> > int main() { > struct utsname uts; > uname(&uts); > } > > The above cannot EFAULT in spite of it being documented as the only > error uname can report, > because obviously the uts structure on the stack is valid memory. > > Maybe ENOSYS would at least make it obvious something is very weird. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-06-16 21:22 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-06-15 16:45 Curious bpf regression in 5.18 already fixed in stable 5.18.3 Maciej Żenczykowski 2022-06-15 16:57 ` Maciej Żenczykowski 2022-06-15 17:37 ` Alexei Starovoitov 2022-06-15 17:46 ` Maciej Żenczykowski 2022-06-15 17:55 ` sdf 2022-06-15 20:26 ` sdf 2022-06-15 20:32 ` sdf 2022-06-15 20:39 ` sdf 2022-06-16 1:36 ` Maciej Żenczykowski 2022-06-16 15:57 ` Stanislav Fomichev 2022-06-16 16:41 ` Maciej Żenczykowski 2022-06-16 17:39 ` Stanislav Fomichev 2022-06-16 21:21 ` YiFei Zhu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).