* Re: [f2fs-dev] [PATCH 3/6] f2fs: do not expose unwritten blocks to user by DIO
From: Chao Yu @ 2022-01-05 13:19 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel
In-Reply-To: <20220104212419.1879225-3-jaegeuk@kernel.org>
On 2022/1/5 5:24, Jaegeuk Kim wrote:
> DIO preallocates physical blocks before writing data, but if an error occurrs
> or power-cut happens, we can see block contents from the disk. This patch tries
> to fix it by 1) turning to buffered writes for DIO into holes, 2) truncating
> unwritten blocks from error or power-cut.
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Thanks,
^ permalink raw reply
* Re: [PATCH net] ppp: ensure minimum packet size in ppp_write()
From: Guillaume Nault @ 2022-01-05 13:19 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet,
Paul Mackerras, linux-ppp, syzbot
In-Reply-To: <20220105114842.2380951-1-eric.dumazet@gmail.com>
On Wed, Jan 05, 2022 at 03:48:42AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> It seems pretty clear ppp layer assumed user space
> would always be kind to provide enough data
> in their write() to a ppp device.
>
> This patch makes sure user provides at least
> 2 bytes.
>
> It adds PPP_PROTO_LEN macro that could replace
> in net-next many occurrences of hard-coded 2 value.
The PPP header can be compressed to only 1 byte, but since 2 bytes is
assumed in several parts of the code, rejecting such packets in
ppp_xmit() is probably the best we can do.
Acked-by: Guillaume Nault <gnault@redhat.com>
^ permalink raw reply
* Re: [f2fs-dev] [PATCH 3/6] f2fs: do not expose unwritten blocks to user by DIO
From: Chao Yu @ 2022-01-05 13:19 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel
In-Reply-To: <20220104212419.1879225-3-jaegeuk@kernel.org>
On 2022/1/5 5:24, Jaegeuk Kim wrote:
> DIO preallocates physical blocks before writing data, but if an error occurrs
> or power-cut happens, we can see block contents from the disk. This patch tries
> to fix it by 1) turning to buffered writes for DIO into holes, 2) truncating
> unwritten blocks from error or power-cut.
>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Thanks,
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply
* [PATCH nf] netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone
From: Florian Westphal @ 2022-01-05 13:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: sbrivio, Florian Westphal, etkaar
This is needed in case a new transaction is made that doesn't insert any
new elements into an already existing set.
Else, after second 'nft -f ruleset.txt', lookups in such a set will fail
because ->lookup() encounters raw_cpu_ptr(m->scratch) == NULL.
For the initial rule load, insertion of elements takes care of the
allocation, but for rule reloads this isn't guaranteed: we might not
have additions to the set.
Fixes: 3c4287f62044a90e ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reported-by: etkaar <lists.netfilter.org@prvy.eu>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Change since test patch:
handle 'out_scratch_realloc' error handling to free scratch maps
on errors.
net/netfilter/nft_set_pipapo.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index dce866d93fee..2c8051d8cca6 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1290,6 +1290,11 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
if (!new->scratch_aligned)
goto out_scratch;
#endif
+ for_each_possible_cpu(i)
+ *per_cpu_ptr(new->scratch, i) = NULL;
+
+ if (pipapo_realloc_scratch(new, old->bsize_max))
+ goto out_scratch_realloc;
rcu_head_init(&new->rcu);
@@ -1334,6 +1339,9 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
kvfree(dst->lt);
dst--;
}
+out_scratch_realloc:
+ for_each_possible_cpu(i)
+ kfree(*per_cpu_ptr(new->scratch, i));
#ifdef NFT_PIPAPO_ALIGN
free_percpu(new->scratch_aligned);
#endif
--
2.34.1
^ permalink raw reply related
* [PATCH RFT] net: asix: add proper error handling of usb read errors
From: Pavel Skripkin @ 2022-01-05 13:19 UTC (permalink / raw)
To: davem, kuba, linux, andrew, oneukum, robert.foss, freddy
Cc: linux-usb, netdev, linux-kernel, Pavel Skripkin,
syzbot+6ca9f7867b77c2d316ac
Syzbot once again hit uninit value in asix driver. The problem still the
same -- asix_read_cmd() reads less bytes, than was requested by caller.
Since all read requests are performed via asix_read_cmd() let's catch
usb related error there and add __must_check notation to be sure all
callers actually check return value.
So, this patch adds sanity check inside asix_read_cmd(), that simply
checks if bytes read are not less, than was requested and adds missing
error handling of asix_read_cmd() all across the driver code.
Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter")
Reported-and-tested-by: syzbot+6ca9f7867b77c2d316ac@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---
drivers/net/usb/asix.h | 4 ++--
drivers/net/usb/asix_common.c | 19 +++++++++++++------
drivers/net/usb/asix_devices.c | 21 ++++++++++++++++++---
3 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h
index 2a1e31defe71..4334aafab59a 100644
--- a/drivers/net/usb/asix.h
+++ b/drivers/net/usb/asix.h
@@ -192,8 +192,8 @@ extern const struct driver_info ax88172a_info;
/* ASIX specific flags */
#define FLAG_EEPROM_MAC (1UL << 0) /* init device MAC from eeprom */
-int asix_read_cmd(struct usbnet *dev, u8 cmd, u16 value, u16 index,
- u16 size, void *data, int in_pm);
+int __must_check asix_read_cmd(struct usbnet *dev, u8 cmd, u16 value, u16 index,
+ u16 size, void *data, int in_pm);
int asix_write_cmd(struct usbnet *dev, u8 cmd, u16 value, u16 index,
u16 size, void *data, int in_pm);
diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
index 71682970be58..524805285019 100644
--- a/drivers/net/usb/asix_common.c
+++ b/drivers/net/usb/asix_common.c
@@ -11,8 +11,8 @@
#define AX_HOST_EN_RETRIES 30
-int asix_read_cmd(struct usbnet *dev, u8 cmd, u16 value, u16 index,
- u16 size, void *data, int in_pm)
+int __must_check asix_read_cmd(struct usbnet *dev, u8 cmd, u16 value, u16 index,
+ u16 size, void *data, int in_pm)
{
int ret;
int (*fn)(struct usbnet *, u8, u8, u16, u16, void *, u16);
@@ -27,9 +27,12 @@ int asix_read_cmd(struct usbnet *dev, u8 cmd, u16 value, u16 index,
ret = fn(dev, cmd, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE,
value, index, data, size);
- if (unlikely(ret < 0))
+ if (unlikely(ret < size)) {
+ ret = ret < 0 ? ret : -ENODATA;
+
netdev_warn(dev->net, "Failed to read reg index 0x%04x: %d\n",
index, ret);
+ }
return ret;
}
@@ -79,7 +82,7 @@ static int asix_check_host_enable(struct usbnet *dev, int in_pm)
0, 0, 1, &smsr, in_pm);
if (ret == -ENODEV)
break;
- else if (ret < sizeof(smsr))
+ else if (ret < 0)
continue;
else if (smsr & AX_HOST_EN)
break;
@@ -579,8 +582,12 @@ int asix_mdio_read_nopm(struct net_device *netdev, int phy_id, int loc)
return ret;
}
- asix_read_cmd(dev, AX_CMD_READ_MII_REG, phy_id,
- (__u16)loc, 2, &res, 1);
+ ret = asix_read_cmd(dev, AX_CMD_READ_MII_REG, phy_id,
+ (__u16)loc, 2, &res, 1);
+ if (ret < 0) {
+ mutex_unlock(&dev->phy_mutex);
+ return ret;
+ }
asix_set_hw_mii(dev, 1);
mutex_unlock(&dev->phy_mutex);
diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 4514d35ef4c4..6b2fbdf4e0fd 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -755,7 +755,12 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
priv->phy_addr = ret;
priv->embd_phy = ((priv->phy_addr & 0x1f) == 0x10);
- asix_read_cmd(dev, AX_CMD_STATMNGSTS_REG, 0, 0, 1, &chipcode, 0);
+ ret = asix_read_cmd(dev, AX_CMD_STATMNGSTS_REG, 0, 0, 1, &chipcode, 0);
+ if (ret < 0) {
+ netdev_dbg(dev->net, "Failed to read STATMNGSTS_REG: %d\n", ret);
+ return ret;
+ }
+
chipcode &= AX_CHIPCODE_MASK;
ret = (chipcode == AX_AX88772_CHIPCODE) ? ax88772_hw_reset(dev, 0) :
@@ -920,11 +925,21 @@ static int ax88178_reset(struct usbnet *dev)
int gpio0 = 0;
u32 phyid;
- asix_read_cmd(dev, AX_CMD_READ_GPIOS, 0, 0, 1, &status, 0);
+ ret = asix_read_cmd(dev, AX_CMD_READ_GPIOS, 0, 0, 1, &status, 0);
+ if (ret < 0) {
+ netdev_dbg(dev->net, "Failed to read GPIOS: %d\n", ret);
+ return ret;
+ }
+
netdev_dbg(dev->net, "GPIO Status: 0x%04x\n", status);
asix_write_cmd(dev, AX_CMD_WRITE_ENABLE, 0, 0, 0, NULL, 0);
- asix_read_cmd(dev, AX_CMD_READ_EEPROM, 0x0017, 0, 2, &eeprom, 0);
+ ret = asix_read_cmd(dev, AX_CMD_READ_EEPROM, 0x0017, 0, 2, &eeprom, 0);
+ if (ret < 0) {
+ netdev_dbg(dev->net, "Failed to read EEPROM: %d\n", ret);
+ return ret;
+ }
+
asix_write_cmd(dev, AX_CMD_WRITE_DISABLE, 0, 0, 0, NULL, 0);
netdev_dbg(dev->net, "EEPROM index 0x17 is 0x%04x\n", eeprom);
--
2.34.1
^ permalink raw reply related
* Re: [f2fs-dev] [PATCH 5/6] f2fs: implement iomap operations
From: Chao Yu @ 2022-01-05 13:20 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: Eric Biggers
In-Reply-To: <20220104212419.1879225-5-jaegeuk@kernel.org>
On 2022/1/5 5:24, Jaegeuk Kim wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Implement 'struct iomap_ops' for f2fs, in preparation for making f2fs
> use iomap for direct I/O.
>
> Note that this may be used for other things besides direct I/O in the
> future; however, for now I've only tested it for direct I/O.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Thanks,
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply
* Re: [f2fs-dev] [PATCH 5/6] f2fs: implement iomap operations
From: Chao Yu @ 2022-01-05 13:20 UTC (permalink / raw)
To: Jaegeuk Kim, linux-kernel, linux-f2fs-devel; +Cc: Eric Biggers
In-Reply-To: <20220104212419.1879225-5-jaegeuk@kernel.org>
On 2022/1/5 5:24, Jaegeuk Kim wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Implement 'struct iomap_ops' for f2fs, in preparation for making f2fs
> use iomap for direct I/O.
>
> Note that this may be used for other things besides direct I/O in the
> future; however, for now I've only tested it for direct I/O.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Thanks,
^ permalink raw reply
* Re: [Intel-gfx] [PATCH 2/2] drm/i915/uncore: rename i915_reg_read_ioctl intel_uncore_reg_read_ioctl
From: Jani Nikula @ 2022-01-05 13:18 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
In-Reply-To: <bee03d1e-06dd-6243-e711-ab8d7c7081bb@linux.intel.com>
On Wed, 05 Jan 2022, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> On 05/01/2022 10:32, Jani Nikula wrote:
>> On Wed, 05 Jan 2022, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>> On 05/01/2022 10:05, Jani Nikula wrote:
>>>> Follow the usual naming convention.
>>>
>>> But intel_uncore_ prefix usually means functions takes intel_uncore as
>>> the first argument.
>>>
>>> Maybe solution here is that i915_reg_read_ioctl does not belong in
>>> intel_uncore.c, it being the UAPI layer thing? I guess arguments could
>>> be made for either way.
>>
>> My position is that the function and file prefixes go hand in
>> hand. You'll always know where to place a function, and you'll always
>> know where the function is to be found.
>>
>> If you can *also* make the context argument follow the pattern, it's
>> obviously better, and indicates the division to files is working out
>> nicely. However, in a lot of cases you'll need to pass struct
>> drm_i915_private or similar as the first parameter to e.g. init
>> functions. It can't be the rigid rule.
>>
>> I'm fine with moving the entire function somewhere else, as long as the
>> declaration is not in i915_drv.h. There's no longer a i915_drv.c, and
>> i915_drv.h should not have function declarations at all.
>
> Yes I agree it cannot be a rigid rule. I just that it feels
> intel_uncore.[hc] is too low level to me to hold an ioctl
> implementation, and header actually feels wrong to have the declaration.
> Not least it is about _one_ of the uncores, while the ioctl is not
> operating on that level, albeit undefined at the moment how exactly it
> would work for multi-tile.
>
> Would it be too early, or unwarranted at this point, to maybe consider
> adding i915_ioctls.[hc]?
Then the conversation would be about putting together a ton of unrelated
functions where the only thing in common is that they're an ioctl
implementation. Arguably many of them would have less in common than the
reg read ioctl has with uncore!
And when is it okay to put an ioctl in the i915_ioctls.c file and when
is it warranted to put it somewhere else? It's just a different set of
problems.
> I like the i915_ prefix of ioctls for consistency.. i915_getparam_ioctl,
> i915_query_ioctl, i915_perf_..., i915_gem_....
The display ioctls have intel_ prefix anyway. It's the _ioctl suffix
that we use.
Again, my main driver here is cleaning up i915_drv.h. I can shove the
reg read ioctl somewhere other than intel_uncore.[ch] too. But as it
stands, the only alternative that seems better than intel_uncore.[ch] at
the moment is adding a dedicated file for a 60-line function.
BR,
Jani.
>
> Regards,
>
> Tvrtko
>
>>
>> BR,
>> Jani.
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/i915_driver.c | 2 +-
>>>> drivers/gpu/drm/i915/intel_uncore.c | 4 ++--
>>>> drivers/gpu/drm/i915/intel_uncore.h | 4 ++--
>>>> 3 files changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>>>> index 95174938b160..f9a494e159dc 100644
>>>> --- a/drivers/gpu/drm/i915/i915_driver.c
>>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>>>> @@ -1805,7 +1805,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>>>> DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_RENDER_ALLOW),
>>>> DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
>>>> DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
>>>> - DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
>>>> + DRM_IOCTL_DEF_DRV(I915_REG_READ, intel_uncore_reg_read_ioctl, DRM_RENDER_ALLOW),
>>>> DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
>>>> DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
>>>> DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
>>>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>>>> index fc25ebf1a593..33f95bb2d3d5 100644
>>>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>>>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>>>> @@ -2269,8 +2269,8 @@ static const struct reg_whitelist {
>>>> .size = 8
>>>> } };
>>>>
>>>> -int i915_reg_read_ioctl(struct drm_device *dev,
>>>> - void *data, struct drm_file *file)
>>>> +int intel_uncore_reg_read_ioctl(struct drm_device *dev,
>>>> + void *data, struct drm_file *file)
>>>> {
>>>> struct drm_i915_private *i915 = to_i915(dev);
>>>> struct intel_uncore *uncore = &i915->uncore;
>>>> diff --git a/drivers/gpu/drm/i915/intel_uncore.h b/drivers/gpu/drm/i915/intel_uncore.h
>>>> index 3a87bbd906f8..697ac4586159 100644
>>>> --- a/drivers/gpu/drm/i915/intel_uncore.h
>>>> +++ b/drivers/gpu/drm/i915/intel_uncore.h
>>>> @@ -457,7 +457,7 @@ static inline int intel_uncore_write_and_verify(struct intel_uncore *uncore,
>>>> #define raw_reg_write(base, reg, value) \
>>>> writel(value, base + i915_mmio_reg_offset(reg))
>>>>
>>>> -int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>>>> - struct drm_file *file);
>>>> +int intel_uncore_reg_read_ioctl(struct drm_device *dev, void *data,
>>>> + struct drm_file *file);
>>>>
>>>> #endif /* !__INTEL_UNCORE_H__ */
>>>>
>>
--
Jani Nikula, Intel Open Source Graphics Center
^ permalink raw reply
* [PATCH v3 net-next 0/2] net: bpf: handle return value of post_bind{4,6} and add selftests for it
From: menglong8.dong @ 2022-01-05 13:18 UTC (permalink / raw)
To: kuba, daniel
Cc: davem, yoshfuji, dsahern, edumazet, shuah, ast, andrii, kafai,
songliubraving, yhs, john.fastabend, kpsingh, netdev,
linux-kernel, linux-kselftest, bpf, Menglong Dong
From: Menglong Dong <imagedong@tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. While the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP for example and see what will happen. For UDP
socket, it will be added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port()
called success. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong
to (because inet_saddr is changed to 0), and UDP packet received
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine, as it can receive packet
properly, which is wired, as the 'bind()' is already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP
proto, it is udp_lib_unhash(); For icmp proto, it is
ping_unhash().
Therefore, after sys_bind() fail caused by
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which
means that it can try to be binded to another port.
The second patch is the selftests for this modification.
Changes since v2:
- NULL check for sk->sk_prot->put_port
Changes since v1:
- introduce 'put_port' field for 'struct proto'
- add selftests for it
Menglong Dong (2):
net: bpf: handle return value of
BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
bpf: selftests: add bind retry for post_bind{4, 6}
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 +
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 +
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
tools/testing/selftests/bpf/test_sock.c | 166 +++++++++++++++++++++---
10 files changed, 157 insertions(+), 20 deletions(-)
--
2.27.0
^ permalink raw reply
* [PATCH v3 net-next 1/2] net: bpf: handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
From: menglong8.dong @ 2022-01-05 13:18 UTC (permalink / raw)
To: kuba, daniel
Cc: davem, yoshfuji, dsahern, edumazet, shuah, ast, andrii, kafai,
songliubraving, yhs, john.fastabend, kpsingh, netdev,
linux-kernel, linux-kselftest, bpf, Menglong Dong
In-Reply-To: <20220105131849.2559506-1-imagedong@tencent.com>
From: Menglong Dong <imagedong@tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. While the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP for example and see what will happen. For UDP
socket, it will be added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port()
called success. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong
to (because inet_saddr is changed to 0), and UDP packet received
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine, as it can receive packet
properly, which is wired, as the 'bind()' is already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP
proto, it is udp_lib_unhash(); For icmp proto, it is
ping_unhash().
Therefore, after sys_bind() fail caused by
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which
means that it can try to be binded to another port.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v3:
- NULL check for sk->sk_prot->put_port
v2:
- introduce 'put_port' field for 'struct proto'
---
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 ++
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 ++
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
9 files changed, 11 insertions(+)
diff --git a/include/net/sock.h b/include/net/sock.h
index 7b4b4237e6e0..ff9b508d9c5f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1209,6 +1209,7 @@ struct proto {
void (*unhash)(struct sock *sk);
void (*rehash)(struct sock *sk);
int (*get_port)(struct sock *sk, unsigned short snum);
+ void (*put_port)(struct sock *sk);
#ifdef CONFIG_BPF_SYSCALL
int (*psock_update_sk_prot)(struct sock *sk,
struct sk_psock *psock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f53184767ee7..9c465bac1eb0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -531,6 +531,8 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
+ if (sk->sk_prot->put_port)
+ sk->sk_prot->put_port(sk);
goto out_release_sock;
}
}
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index e540b0dcf085..0e56df3a45e2 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -994,6 +994,7 @@ struct proto ping_prot = {
.hash = ping_hash,
.unhash = ping_unhash,
.get_port = ping_get_port,
+ .put_port = ping_unhash,
.obj_size = sizeof(struct inet_sock),
};
EXPORT_SYMBOL(ping_prot);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ac10e4cdd8d0..9861786b8336 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3076,6 +3076,7 @@ struct proto tcp_prot = {
.hash = inet_hash,
.unhash = inet_unhash,
.get_port = inet_csk_get_port,
+ .put_port = inet_put_port,
#ifdef CONFIG_BPF_SYSCALL
.psock_update_sk_prot = tcp_bpf_update_proto,
#endif
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7b18a6f42f18..c2a4411d2b04 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2927,6 +2927,7 @@ struct proto udp_prot = {
.unhash = udp_lib_unhash,
.rehash = udp_v4_rehash,
.get_port = udp_v4_get_port,
+ .put_port = udp_lib_unhash,
#ifdef CONFIG_BPF_SYSCALL
.psock_update_sk_prot = udp_bpf_update_proto,
#endif
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d1636425654e..8fe7900f1949 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -413,6 +413,8 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
if (err) {
sk->sk_ipv6only = saved_ipv6only;
inet_reset_saddr(sk);
+ if (sk->sk_prot->put_port)
+ sk->sk_prot->put_port(sk);
goto out;
}
}
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 6ac88fe24a8e..9256f6ba87ef 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -177,6 +177,7 @@ struct proto pingv6_prot = {
.hash = ping_hash,
.unhash = ping_unhash,
.get_port = ping_get_port,
+ .put_port = ping_unhash,
.obj_size = sizeof(struct raw6_sock),
};
EXPORT_SYMBOL_GPL(pingv6_prot);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1ac243d18c2b..075ee8a2df3b 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -2181,6 +2181,7 @@ struct proto tcpv6_prot = {
.hash = inet6_hash,
.unhash = inet_unhash,
.get_port = inet_csk_get_port,
+ .put_port = inet_put_port,
#ifdef CONFIG_BPF_SYSCALL
.psock_update_sk_prot = tcp_bpf_update_proto,
#endif
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 1accc06abc54..90718a924ca8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1732,6 +1732,7 @@ struct proto udpv6_prot = {
.unhash = udp_lib_unhash,
.rehash = udp_v6_rehash,
.get_port = udp_v6_get_port,
+ .put_port = udp_lib_unhash,
#ifdef CONFIG_BPF_SYSCALL
.psock_update_sk_prot = udp_bpf_update_proto,
#endif
--
2.30.2
^ permalink raw reply related
* [PATCH v3 net-next 2/2] bpf: selftests: add bind retry for post_bind{4, 6}
From: menglong8.dong @ 2022-01-05 13:18 UTC (permalink / raw)
To: kuba, daniel
Cc: davem, yoshfuji, dsahern, edumazet, shuah, ast, andrii, kafai,
songliubraving, yhs, john.fastabend, kpsingh, netdev,
linux-kernel, linux-kselftest, bpf, Menglong Dong
In-Reply-To: <20220105131849.2559506-1-imagedong@tencent.com>
From: Menglong Dong <imagedong@tencent.com>
With previous patch, kernel is able to 'put_port' after sys_bind()
fails. Add the test for that case: rebind another port after
sys_bind() fails. If the bind success, it means previous bind
operation is already undoed.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
tools/testing/selftests/bpf/test_sock.c | 166 +++++++++++++++++++++---
1 file changed, 146 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_sock.c b/tools/testing/selftests/bpf/test_sock.c
index e8edd3dd3ec2..68525d68d4e5 100644
--- a/tools/testing/selftests/bpf/test_sock.c
+++ b/tools/testing/selftests/bpf/test_sock.c
@@ -35,12 +35,15 @@ struct sock_test {
/* Endpoint to bind() to */
const char *ip;
unsigned short port;
+ unsigned short port_retry;
/* Expected test result */
enum {
LOAD_REJECT,
ATTACH_REJECT,
BIND_REJECT,
SUCCESS,
+ RETRY_SUCCESS,
+ RETRY_REJECT
} result;
};
@@ -60,6 +63,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
LOAD_REJECT,
},
{
@@ -77,6 +81,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
LOAD_REJECT,
},
{
@@ -94,6 +99,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
LOAD_REJECT,
},
{
@@ -111,6 +117,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
LOAD_REJECT,
},
{
@@ -125,6 +132,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"127.0.0.1",
8097,
+ 0,
SUCCESS,
},
{
@@ -139,6 +147,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"127.0.0.1",
8097,
+ 0,
SUCCESS,
},
{
@@ -153,6 +162,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
ATTACH_REJECT,
},
{
@@ -167,6 +177,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
ATTACH_REJECT,
},
{
@@ -181,6 +192,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
ATTACH_REJECT,
},
{
@@ -195,6 +207,7 @@ static struct sock_test tests[] = {
0,
NULL,
0,
+ 0,
ATTACH_REJECT,
},
{
@@ -209,6 +222,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"0.0.0.0",
0,
+ 0,
BIND_REJECT,
},
{
@@ -223,6 +237,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"::",
0,
+ 0,
BIND_REJECT,
},
{
@@ -253,6 +268,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"::1",
8193,
+ 0,
BIND_REJECT,
},
{
@@ -283,8 +299,102 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"127.0.0.1",
4098,
+ 0,
SUCCESS,
},
+ {
+ "bind4 deny specific IP & port of TCP, and retry",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+ /* if (ip == expected && port == expected) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct bpf_sock, src_ip4)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_7,
+ __bpf_constant_ntohl(0x7F000001), 4),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct bpf_sock, src_port)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x1002, 2),
+
+ /* return DENY; */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_JMP_A(1),
+
+ /* else return ALLOW; */
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ },
+ BPF_CGROUP_INET4_POST_BIND,
+ BPF_CGROUP_INET4_POST_BIND,
+ AF_INET,
+ SOCK_STREAM,
+ "127.0.0.1",
+ 4098,
+ 5000,
+ RETRY_SUCCESS,
+ },
+ {
+ "bind4 deny specific IP & port of UDP, and retry",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+ /* if (ip == expected && port == expected) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct bpf_sock, src_ip4)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_7,
+ __bpf_constant_ntohl(0x7F000001), 4),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct bpf_sock, src_port)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x1002, 2),
+
+ /* return DENY; */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_JMP_A(1),
+
+ /* else return ALLOW; */
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ },
+ BPF_CGROUP_INET4_POST_BIND,
+ BPF_CGROUP_INET4_POST_BIND,
+ AF_INET,
+ SOCK_DGRAM,
+ "127.0.0.1",
+ 4098,
+ 5000,
+ RETRY_SUCCESS,
+ },
+ {
+ "bind6 deny specific IP & port, and retry",
+ .insns = {
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+ /* if (ip == expected && port == expected) */
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct bpf_sock, src_ip6[3])),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_7,
+ __bpf_constant_ntohl(0x00000001), 4),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct bpf_sock, src_port)),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_7, 0x2001, 2),
+
+ /* return DENY; */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_JMP_A(1),
+
+ /* else return ALLOW; */
+ BPF_MOV64_IMM(BPF_REG_0, 1),
+ BPF_EXIT_INSN(),
+ },
+ BPF_CGROUP_INET6_POST_BIND,
+ BPF_CGROUP_INET6_POST_BIND,
+ AF_INET6,
+ SOCK_STREAM,
+ "::1",
+ 8193,
+ 9000,
+ RETRY_SUCCESS,
+ },
{
"bind4 allow all",
.insns = {
@@ -297,6 +407,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"0.0.0.0",
0,
+ 0,
SUCCESS,
},
{
@@ -311,6 +422,7 @@ static struct sock_test tests[] = {
SOCK_STREAM,
"::",
0,
+ 0,
SUCCESS,
},
};
@@ -351,14 +463,15 @@ static int attach_sock_prog(int cgfd, int progfd,
return bpf_prog_attach(progfd, cgfd, attach_type, BPF_F_ALLOW_OVERRIDE);
}
-static int bind_sock(int domain, int type, const char *ip, unsigned short port)
+static int bind_sock(int domain, int type, const char *ip,
+ unsigned short port, unsigned short port_retry)
{
struct sockaddr_storage addr;
struct sockaddr_in6 *addr6;
struct sockaddr_in *addr4;
int sockfd = -1;
socklen_t len;
- int err = 0;
+ int res = SUCCESS;
sockfd = socket(domain, type, 0);
if (sockfd < 0)
@@ -384,21 +497,44 @@ static int bind_sock(int domain, int type, const char *ip, unsigned short port)
goto err;
}
- if (bind(sockfd, (const struct sockaddr *)&addr, len) == -1)
- goto err;
+ if (bind(sockfd, (const struct sockaddr *)&addr, len) == -1) {
+ /* sys_bind() may fail for different reasons, errno has to be
+ * checked to confirm that BPF program rejected it.
+ */
+ if (errno != EPERM)
+ goto err;
+ if (port_retry)
+ goto retry;
+ res = BIND_REJECT;
+ goto out;
+ }
+ goto out;
+retry:
+ if (domain == AF_INET)
+ addr4->sin_port = htons(port_retry);
+ else
+ addr6->sin6_port = htons(port_retry);
+ if (bind(sockfd, (const struct sockaddr *)&addr, len) == -1) {
+ if (errno != EPERM)
+ goto err;
+ res = RETRY_REJECT;
+ } else {
+ res = RETRY_SUCCESS;
+ }
goto out;
err:
- err = -1;
+ res = -1;
out:
close(sockfd);
- return err;
+ return res;
}
static int run_test_case(int cgfd, const struct sock_test *test)
{
int progfd = -1;
int err = 0;
+ int res;
printf("Test case: %s .. ", test->descr);
progfd = load_sock_prog(test->insns, test->expected_attach_type);
@@ -416,21 +552,11 @@ static int run_test_case(int cgfd, const struct sock_test *test)
goto err;
}
- if (bind_sock(test->domain, test->type, test->ip, test->port) == -1) {
- /* sys_bind() may fail for different reasons, errno has to be
- * checked to confirm that BPF program rejected it.
- */
- if (test->result == BIND_REJECT && errno == EPERM)
- goto out;
- else
- goto err;
- }
-
+ res = bind_sock(test->domain, test->type, test->ip, test->port,
+ test->port_retry);
+ if (res > 0 && test->result == res)
+ goto out;
- if (test->result != SUCCESS)
- goto err;
-
- goto out;
err:
err = -1;
out:
--
2.30.2
^ permalink raw reply related
* [PATCH v2 net-next 0/7] Cleanup to main DSA structures
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
This series contains changes that do the following:
- struct dsa_port reduced from 576 to 544 bytes, and first cache line a
bit better organized
- struct dsa_switch from 160 to 136 bytes, and first cache line a bit
better organized
- struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
bit better organized
No changes compared to v1, just split into a separate patch set.
Vladimir Oltean (7):
net: dsa: move dsa_port :: stp_state near dsa_port :: mac
net: dsa: merge all bools of struct dsa_port into a single u8
net: dsa: move dsa_port :: type near dsa_port :: index
net: dsa: merge all bools of struct dsa_switch into a single u32
net: dsa: make dsa_switch :: num_ports an unsigned int
net: dsa: move dsa_switch_tree :: ports and lags to first cache line
net: dsa: combine two holes in struct dsa_switch_tree
include/net/dsa.h | 146 +++++++++++++++++++++++++---------------------
net/dsa/dsa2.c | 2 +-
2 files changed, 81 insertions(+), 67 deletions(-)
--
2.25.1
^ permalink raw reply
* [PATCH v2 net-next 1/7] net: dsa: move dsa_port :: stp_state near dsa_port :: mac
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
The MAC address of a port is 6 octets in size, and this creates a 2
octet hole after it. There are some other u8 members of struct dsa_port
that we can put in that hole. One such member is the stp_state.
Before:
pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
union {
struct net_device * master; /* 0 8 */
struct net_device * slave; /* 0 8 */
}; /* 0 8 */
const struct dsa_device_ops * tag_ops; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
struct sk_buff * (*rcv)(struct sk_buff *, struct net_device *); /* 24 8 */
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU = 1,
DSA_PORT_TYPE_DSA = 2,
DSA_PORT_TYPE_USER = 3,
} type; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_switch * ds; /* 40 8 */
unsigned int index; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
const char * name; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_port * cpu_dp; /* 64 8 */
u8 mac[6]; /* 72 6 */
/* XXX 2 bytes hole, try to pack */
struct device_node * dn; /* 80 8 */
unsigned int ageing_time; /* 88 4 */
bool vlan_filtering; /* 92 1 */
bool learning; /* 93 1 */
u8 stp_state; /* 94 1 */
/* XXX 1 byte hole, try to pack */
struct dsa_bridge * bridge; /* 96 8 */
struct devlink_port devlink_port; /* 104 288 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
bool devlink_port_setup; /* 392 1 */
/* XXX 7 bytes hole, try to pack */
struct phylink * pl; /* 400 8 */
struct phylink_config pl_config; /* 408 40 */
/* --- cacheline 7 boundary (448 bytes) --- */
struct net_device * lag_dev; /* 448 8 */
bool lag_tx_enabled; /* 456 1 */
/* XXX 7 bytes hole, try to pack */
struct net_device * hsr_dev; /* 464 8 */
struct list_head list; /* 472 16 */
const struct ethtool_ops * orig_ethtool_ops; /* 488 8 */
const struct dsa_netdevice_ops * netdev_ops; /* 496 8 */
struct mutex addr_lists_lock; /* 504 32 */
/* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
struct list_head fdbs; /* 536 16 */
struct list_head mdbs; /* 552 16 */
bool setup; /* 568 1 */
/* size: 576, cachelines: 9, members: 30 */
/* sum members: 544, holes: 6, sum holes: 25 */
/* padding: 7 */
};
After:
pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
union {
struct net_device * master; /* 0 8 */
struct net_device * slave; /* 0 8 */
}; /* 0 8 */
const struct dsa_device_ops * tag_ops; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
struct sk_buff * (*rcv)(struct sk_buff *, struct net_device *); /* 24 8 */
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU = 1,
DSA_PORT_TYPE_DSA = 2,
DSA_PORT_TYPE_USER = 3,
} type; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_switch * ds; /* 40 8 */
unsigned int index; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
const char * name; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_port * cpu_dp; /* 64 8 */
u8 mac[6]; /* 72 6 */
u8 stp_state; /* 78 1 */
/* XXX 1 byte hole, try to pack */
struct device_node * dn; /* 80 8 */
unsigned int ageing_time; /* 88 4 */
bool vlan_filtering; /* 92 1 */
bool learning; /* 93 1 */
/* XXX 2 bytes hole, try to pack */
struct dsa_bridge * bridge; /* 96 8 */
struct devlink_port devlink_port; /* 104 288 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
bool devlink_port_setup; /* 392 1 */
/* XXX 7 bytes hole, try to pack */
struct phylink * pl; /* 400 8 */
struct phylink_config pl_config; /* 408 40 */
/* --- cacheline 7 boundary (448 bytes) --- */
struct net_device * lag_dev; /* 448 8 */
bool lag_tx_enabled; /* 456 1 */
/* XXX 7 bytes hole, try to pack */
struct net_device * hsr_dev; /* 464 8 */
struct list_head list; /* 472 16 */
const struct ethtool_ops * orig_ethtool_ops; /* 488 8 */
const struct dsa_netdevice_ops * netdev_ops; /* 496 8 */
struct mutex addr_lists_lock; /* 504 32 */
/* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
struct list_head fdbs; /* 536 16 */
struct list_head mdbs; /* 552 16 */
bool setup; /* 568 1 */
/* size: 576, cachelines: 9, members: 30 */
/* sum members: 544, holes: 6, sum holes: 25 */
/* padding: 7 */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index f16959444ae1..8878f9ce251b 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -258,12 +258,14 @@ struct dsa_port {
const char *name;
struct dsa_port *cpu_dp;
u8 mac[ETH_ALEN];
+
+ u8 stp_state;
+
struct device_node *dn;
unsigned int ageing_time;
bool vlan_filtering;
/* Managed by DSA on user ports and by drivers on CPU and DSA ports */
bool learning;
- u8 stp_state;
struct dsa_bridge *bridge;
struct devlink_port devlink_port;
bool devlink_port_setup;
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net-next 2/7] net: dsa: merge all bools of struct dsa_port into a single u8
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
struct dsa_port has 5 bool members which create quite a number of 7 byte
holes in the structure layout. By merging them all into bitfields of an
u8, and placing that u8 in the 1-byte hole after dp->mac and dp->stp_state,
we can reduce the structure size from 576 bytes to 552 bytes on arm64.
Before:
pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
union {
struct net_device * master; /* 0 8 */
struct net_device * slave; /* 0 8 */
}; /* 0 8 */
const struct dsa_device_ops * tag_ops; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
struct sk_buff * (*rcv)(struct sk_buff *, struct net_device *); /* 24 8 */
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU = 1,
DSA_PORT_TYPE_DSA = 2,
DSA_PORT_TYPE_USER = 3,
} type; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_switch * ds; /* 40 8 */
unsigned int index; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
const char * name; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_port * cpu_dp; /* 64 8 */
u8 mac[6]; /* 72 6 */
u8 stp_state; /* 78 1 */
/* XXX 1 byte hole, try to pack */
struct device_node * dn; /* 80 8 */
unsigned int ageing_time; /* 88 4 */
bool vlan_filtering; /* 92 1 */
bool learning; /* 93 1 */
/* XXX 2 bytes hole, try to pack */
struct dsa_bridge * bridge; /* 96 8 */
struct devlink_port devlink_port; /* 104 288 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
bool devlink_port_setup; /* 392 1 */
/* XXX 7 bytes hole, try to pack */
struct phylink * pl; /* 400 8 */
struct phylink_config pl_config; /* 408 40 */
/* --- cacheline 7 boundary (448 bytes) --- */
struct net_device * lag_dev; /* 448 8 */
bool lag_tx_enabled; /* 456 1 */
/* XXX 7 bytes hole, try to pack */
struct net_device * hsr_dev; /* 464 8 */
struct list_head list; /* 472 16 */
const struct ethtool_ops * orig_ethtool_ops; /* 488 8 */
const struct dsa_netdevice_ops * netdev_ops; /* 496 8 */
struct mutex addr_lists_lock; /* 504 32 */
/* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
struct list_head fdbs; /* 536 16 */
struct list_head mdbs; /* 552 16 */
bool setup; /* 568 1 */
/* size: 576, cachelines: 9, members: 30 */
/* sum members: 544, holes: 6, sum holes: 25 */
/* padding: 7 */
};
After:
pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
union {
struct net_device * master; /* 0 8 */
struct net_device * slave; /* 0 8 */
}; /* 0 8 */
const struct dsa_device_ops * tag_ops; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
struct sk_buff * (*rcv)(struct sk_buff *, struct net_device *); /* 24 8 */
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU = 1,
DSA_PORT_TYPE_DSA = 2,
DSA_PORT_TYPE_USER = 3,
} type; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_switch * ds; /* 40 8 */
unsigned int index; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
const char * name; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_port * cpu_dp; /* 64 8 */
u8 mac[6]; /* 72 6 */
u8 stp_state; /* 78 1 */
u8 vlan_filtering:1; /* 79: 0 1 */
u8 learning:1; /* 79: 1 1 */
u8 lag_tx_enabled:1; /* 79: 2 1 */
u8 devlink_port_setup:1; /* 79: 3 1 */
u8 setup:1; /* 79: 4 1 */
/* XXX 3 bits hole, try to pack */
struct device_node * dn; /* 80 8 */
unsigned int ageing_time; /* 88 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_bridge * bridge; /* 96 8 */
struct devlink_port devlink_port; /* 104 288 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
struct phylink * pl; /* 392 8 */
struct phylink_config pl_config; /* 400 40 */
struct net_device * lag_dev; /* 440 8 */
/* --- cacheline 7 boundary (448 bytes) --- */
struct net_device * hsr_dev; /* 448 8 */
struct list_head list; /* 456 16 */
const struct ethtool_ops * orig_ethtool_ops; /* 472 8 */
const struct dsa_netdevice_ops * netdev_ops; /* 480 8 */
struct mutex addr_lists_lock; /* 488 32 */
/* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
struct list_head fdbs; /* 520 16 */
struct list_head mdbs; /* 536 16 */
/* size: 552, cachelines: 9, members: 30 */
/* sum members: 539, holes: 3, sum holes: 12 */
/* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
/* last cacheline: 40 bytes */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 8878f9ce251b..a8f0037b58e2 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -261,18 +261,23 @@ struct dsa_port {
u8 stp_state;
+ u8 vlan_filtering:1,
+ /* Managed by DSA on user ports and by
+ * drivers on CPU and DSA ports
+ */
+ learning:1,
+ lag_tx_enabled:1,
+ devlink_port_setup:1,
+ setup:1;
+
struct device_node *dn;
unsigned int ageing_time;
- bool vlan_filtering;
- /* Managed by DSA on user ports and by drivers on CPU and DSA ports */
- bool learning;
+
struct dsa_bridge *bridge;
struct devlink_port devlink_port;
- bool devlink_port_setup;
struct phylink *pl;
struct phylink_config pl_config;
struct net_device *lag_dev;
- bool lag_tx_enabled;
struct net_device *hsr_dev;
struct list_head list;
@@ -293,8 +298,6 @@ struct dsa_port {
struct mutex addr_lists_lock;
struct list_head fdbs;
struct list_head mdbs;
-
- bool setup;
};
/* TODO: ideally DSA ports would have a single dp->link_dp member,
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net-next 3/7] net: dsa: move dsa_port :: type near dsa_port :: index
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
Both dsa_port :: type and dsa_port :: index introduce a 4 octet hole
after them, so we can group them together and the holes would be
eliminated, turning 16 octets of storage into just 8. This makes the
cpu_dp pointer fit in the first cache line, which is good, because
dsa_slave_to_master(), called by dsa_enqueue_skb(), uses it.
Before:
pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
union {
struct net_device * master; /* 0 8 */
struct net_device * slave; /* 0 8 */
}; /* 0 8 */
const struct dsa_device_ops * tag_ops; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
struct sk_buff * (*rcv)(struct sk_buff *, struct net_device *); /* 24 8 */
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU = 1,
DSA_PORT_TYPE_DSA = 2,
DSA_PORT_TYPE_USER = 3,
} type; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_switch * ds; /* 40 8 */
unsigned int index; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
const char * name; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_port * cpu_dp; /* 64 8 */
u8 mac[6]; /* 72 6 */
u8 stp_state; /* 78 1 */
u8 vlan_filtering:1; /* 79: 0 1 */
u8 learning:1; /* 79: 1 1 */
u8 lag_tx_enabled:1; /* 79: 2 1 */
u8 devlink_port_setup:1; /* 79: 3 1 */
u8 setup:1; /* 79: 4 1 */
/* XXX 3 bits hole, try to pack */
struct device_node * dn; /* 80 8 */
unsigned int ageing_time; /* 88 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_bridge * bridge; /* 96 8 */
struct devlink_port devlink_port; /* 104 288 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
struct phylink * pl; /* 392 8 */
struct phylink_config pl_config; /* 400 40 */
struct net_device * lag_dev; /* 440 8 */
/* --- cacheline 7 boundary (448 bytes) --- */
struct net_device * hsr_dev; /* 448 8 */
struct list_head list; /* 456 16 */
const struct ethtool_ops * orig_ethtool_ops; /* 472 8 */
const struct dsa_netdevice_ops * netdev_ops; /* 480 8 */
struct mutex addr_lists_lock; /* 488 32 */
/* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
struct list_head fdbs; /* 520 16 */
struct list_head mdbs; /* 536 16 */
/* size: 552, cachelines: 9, members: 30 */
/* sum members: 539, holes: 3, sum holes: 12 */
/* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
/* last cacheline: 40 bytes */
};
After:
pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
union {
struct net_device * master; /* 0 8 */
struct net_device * slave; /* 0 8 */
}; /* 0 8 */
const struct dsa_device_ops * tag_ops; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
struct sk_buff * (*rcv)(struct sk_buff *, struct net_device *); /* 24 8 */
struct dsa_switch * ds; /* 32 8 */
unsigned int index; /* 40 4 */
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU = 1,
DSA_PORT_TYPE_DSA = 2,
DSA_PORT_TYPE_USER = 3,
} type; /* 44 4 */
const char * name; /* 48 8 */
struct dsa_port * cpu_dp; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u8 mac[6]; /* 64 6 */
u8 stp_state; /* 70 1 */
u8 vlan_filtering:1; /* 71: 0 1 */
u8 learning:1; /* 71: 1 1 */
u8 lag_tx_enabled:1; /* 71: 2 1 */
u8 devlink_port_setup:1; /* 71: 3 1 */
u8 setup:1; /* 71: 4 1 */
/* XXX 3 bits hole, try to pack */
struct device_node * dn; /* 72 8 */
unsigned int ageing_time; /* 80 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_bridge * bridge; /* 88 8 */
struct devlink_port devlink_port; /* 96 288 */
/* --- cacheline 6 boundary (384 bytes) --- */
struct phylink * pl; /* 384 8 */
struct phylink_config pl_config; /* 392 40 */
struct net_device * lag_dev; /* 432 8 */
struct net_device * hsr_dev; /* 440 8 */
/* --- cacheline 7 boundary (448 bytes) --- */
struct list_head list; /* 448 16 */
const struct ethtool_ops * orig_ethtool_ops; /* 464 8 */
const struct dsa_netdevice_ops * netdev_ops; /* 472 8 */
struct mutex addr_lists_lock; /* 480 32 */
/* --- cacheline 8 boundary (512 bytes) --- */
struct list_head fdbs; /* 512 16 */
struct list_head mdbs; /* 528 16 */
/* size: 544, cachelines: 9, members: 30 */
/* sum members: 539, holes: 1, sum holes: 4 */
/* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
/* last cacheline: 32 bytes */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index a8f0037b58e2..5e42fa7ea377 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -246,6 +246,10 @@ struct dsa_port {
struct dsa_switch_tree *dst;
struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev);
+ struct dsa_switch *ds;
+
+ unsigned int index;
+
enum {
DSA_PORT_TYPE_UNUSED = 0,
DSA_PORT_TYPE_CPU,
@@ -253,8 +257,6 @@ struct dsa_port {
DSA_PORT_TYPE_USER,
} type;
- struct dsa_switch *ds;
- unsigned int index;
const char *name;
struct dsa_port *cpu_dp;
u8 mac[ETH_ALEN];
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net-next 4/7] net: dsa: merge all bools of struct dsa_switch into a single u32
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
struct dsa_switch has 9 boolean properties, many of which are in fact
set by drivers for custom behavior (vlan_filtering_is_global,
needs_standalone_vlan_filtering, etc etc). The binary layout of the
structure could be improved. For example, the "bool setup" at the
beginning introduces a gratuitous 7 byte hole in the first cache line.
The change merges all boolean properties into bitfields of an u32, and
places that u32 in the first cache line of the structure, since many
bools are accessed from the data path (untag_bridge_pvid, vlan_filtering,
vlan_filtering_is_global).
We place this u32 after the existing ds->index, which is also 4 bytes in
size. As a positive side effect, ds->tagger_data now fits into the first
cache line too, because 4 bytes are saved.
Before:
pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
bool setup; /* 0 1 */
/* XXX 7 bytes hole, try to pack */
struct device * dev; /* 8 8 */
struct dsa_switch_tree * dst; /* 16 8 */
unsigned int index; /* 24 4 */
/* XXX 4 bytes hole, try to pack */
struct notifier_block nb; /* 32 24 */
/* XXX last struct has 4 bytes of padding */
void * priv; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
void * tagger_data; /* 64 8 */
struct dsa_chip_data * cd; /* 72 8 */
const struct dsa_switch_ops * ops; /* 80 8 */
u32 phys_mii_mask; /* 88 4 */
/* XXX 4 bytes hole, try to pack */
struct mii_bus * slave_mii_bus; /* 96 8 */
unsigned int ageing_time_min; /* 104 4 */
unsigned int ageing_time_max; /* 108 4 */
struct dsa_8021q_context * tag_8021q_ctx; /* 112 8 */
struct devlink * devlink; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
unsigned int num_tx_queues; /* 128 4 */
bool vlan_filtering_is_global; /* 132 1 */
bool needs_standalone_vlan_filtering; /* 133 1 */
bool configure_vlan_while_not_filtering; /* 134 1 */
bool untag_bridge_pvid; /* 135 1 */
bool assisted_learning_on_cpu_port; /* 136 1 */
bool vlan_filtering; /* 137 1 */
bool pcs_poll; /* 138 1 */
bool mtu_enforcement_ingress; /* 139 1 */
unsigned int num_lag_ids; /* 140 4 */
unsigned int max_num_bridges; /* 144 4 */
/* XXX 4 bytes hole, try to pack */
size_t num_ports; /* 152 8 */
/* size: 160, cachelines: 3, members: 27 */
/* sum members: 141, holes: 4, sum holes: 19 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 32 bytes */
};
After:
pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
struct device * dev; /* 0 8 */
struct dsa_switch_tree * dst; /* 8 8 */
unsigned int index; /* 16 4 */
u32 setup:1; /* 20: 0 4 */
u32 vlan_filtering_is_global:1; /* 20: 1 4 */
u32 needs_standalone_vlan_filtering:1; /* 20: 2 4 */
u32 configure_vlan_while_not_filtering:1; /* 20: 3 4 */
u32 untag_bridge_pvid:1; /* 20: 4 4 */
u32 assisted_learning_on_cpu_port:1; /* 20: 5 4 */
u32 vlan_filtering:1; /* 20: 6 4 */
u32 pcs_poll:1; /* 20: 7 4 */
u32 mtu_enforcement_ingress:1; /* 20: 8 4 */
/* XXX 23 bits hole, try to pack */
struct notifier_block nb; /* 24 24 */
/* XXX last struct has 4 bytes of padding */
void * priv; /* 48 8 */
void * tagger_data; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_chip_data * cd; /* 64 8 */
const struct dsa_switch_ops * ops; /* 72 8 */
u32 phys_mii_mask; /* 80 4 */
/* XXX 4 bytes hole, try to pack */
struct mii_bus * slave_mii_bus; /* 88 8 */
unsigned int ageing_time_min; /* 96 4 */
unsigned int ageing_time_max; /* 100 4 */
struct dsa_8021q_context * tag_8021q_ctx; /* 104 8 */
struct devlink * devlink; /* 112 8 */
unsigned int num_tx_queues; /* 120 4 */
unsigned int num_lag_ids; /* 124 4 */
/* --- cacheline 2 boundary (128 bytes) --- */
unsigned int max_num_bridges; /* 128 4 */
/* XXX 4 bytes hole, try to pack */
size_t num_ports; /* 136 8 */
/* size: 144, cachelines: 3, members: 27 */
/* sum members: 132, holes: 2, sum holes: 8 */
/* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 16 bytes */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 97 +++++++++++++++++++++++++----------------------
1 file changed, 51 insertions(+), 46 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 5e42fa7ea377..a8a586039033 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -321,8 +321,6 @@ struct dsa_mac_addr {
};
struct dsa_switch {
- bool setup;
-
struct device *dev;
/*
@@ -331,6 +329,57 @@ struct dsa_switch {
struct dsa_switch_tree *dst;
unsigned int index;
+ u32 setup:1,
+ /* Disallow bridge core from requesting
+ * different VLAN awareness settings on ports
+ * if not hardware-supported
+ */
+ vlan_filtering_is_global:1,
+ /* Keep VLAN filtering enabled on ports not
+ * offloading any upper
+ */
+ needs_standalone_vlan_filtering:1,
+ /* Pass .port_vlan_add and .port_vlan_del to
+ * drivers even for bridges that have
+ * vlan_filtering=0. All drivers should ideally
+ * set this (and then the option would get
+ * removed), but it is unknown whether this
+ * would break things or not.
+ */
+ configure_vlan_while_not_filtering:1,
+ /* If the switch driver always programs the CPU
+ * port as egress tagged despite the VLAN
+ * configuration indicating otherwise, then
+ * setting @untag_bridge_pvid will force the
+ * DSA receive path to pop the bridge's
+ * default_pvid VLAN tagged frames to offer a
+ * consistent behavior between a
+ * vlan_filtering=0 and vlan_filtering=1 bridge
+ * device.
+ */
+ untag_bridge_pvid:1,
+ /* Let DSA manage the FDB entries towards the
+ * CPU, based on the software bridge database.
+ */
+ assisted_learning_on_cpu_port:1,
+ /* In case vlan_filtering_is_global is set, the
+ * VLAN awareness state should be retrieved
+ * from here and not from the per-port
+ * settings.
+ */
+ vlan_filtering:1,
+ /* MAC PCS does not provide link state change
+ * interrupt, and requires polling. Flag passed
+ * on to PHYLINK.
+ */
+ pcs_poll:1,
+ /* For switches that only have the MRU
+ * configurable. To ensure the configured MTU
+ * is not exceeded, normalization of MRU on all
+ * bridged interfaces is needed.
+ */
+ mtu_enforcement_ingress:1;
+
/* Listener for switch fabric events */
struct notifier_block nb;
@@ -371,50 +420,6 @@ struct dsa_switch {
/* Number of switch port queues */
unsigned int num_tx_queues;
- /* Disallow bridge core from requesting different VLAN awareness
- * settings on ports if not hardware-supported
- */
- bool vlan_filtering_is_global;
-
- /* Keep VLAN filtering enabled on ports not offloading any upper. */
- bool needs_standalone_vlan_filtering;
-
- /* Pass .port_vlan_add and .port_vlan_del to drivers even for bridges
- * that have vlan_filtering=0. All drivers should ideally set this (and
- * then the option would get removed), but it is unknown whether this
- * would break things or not.
- */
- bool configure_vlan_while_not_filtering;
-
- /* If the switch driver always programs the CPU port as egress tagged
- * despite the VLAN configuration indicating otherwise, then setting
- * @untag_bridge_pvid will force the DSA receive path to pop the bridge's
- * default_pvid VLAN tagged frames to offer a consistent behavior
- * between a vlan_filtering=0 and vlan_filtering=1 bridge device.
- */
- bool untag_bridge_pvid;
-
- /* Let DSA manage the FDB entries towards the CPU, based on the
- * software bridge database.
- */
- bool assisted_learning_on_cpu_port;
-
- /* In case vlan_filtering_is_global is set, the VLAN awareness state
- * should be retrieved from here and not from the per-port settings.
- */
- bool vlan_filtering;
-
- /* MAC PCS does not provide link state change interrupt, and requires
- * polling. Flag passed on to PHYLINK.
- */
- bool pcs_poll;
-
- /* For switches that only have the MRU configurable. To ensure the
- * configured MTU is not exceeded, normalization of MRU on all bridged
- * interfaces is needed.
- */
- bool mtu_enforcement_ingress;
-
/* Drivers that benefit from having an ID associated with each
* offloaded LAG should set this to the maximum number of
* supported IDs. DSA will then maintain a mapping of _at
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net-next 5/7] net: dsa: make dsa_switch :: num_ports an unsigned int
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
Currently, num_ports is declared as size_t, which is defined as
__kernel_ulong_t, therefore it occupies 8 bytes of memory.
Even switches with port numbers in the range of tens are exotic, so
there is no need for this amount of storage.
Additionally, because the max_num_bridges member right above it is also
4 bytes, it means the compiler needs to add padding between the last 2
fields. By reducing the size, we don't need that padding and can reduce
the struct size.
Before:
pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
struct device * dev; /* 0 8 */
struct dsa_switch_tree * dst; /* 8 8 */
unsigned int index; /* 16 4 */
u32 setup:1; /* 20: 0 4 */
u32 vlan_filtering_is_global:1; /* 20: 1 4 */
u32 needs_standalone_vlan_filtering:1; /* 20: 2 4 */
u32 configure_vlan_while_not_filtering:1; /* 20: 3 4 */
u32 untag_bridge_pvid:1; /* 20: 4 4 */
u32 assisted_learning_on_cpu_port:1; /* 20: 5 4 */
u32 vlan_filtering:1; /* 20: 6 4 */
u32 pcs_poll:1; /* 20: 7 4 */
u32 mtu_enforcement_ingress:1; /* 20: 8 4 */
/* XXX 23 bits hole, try to pack */
struct notifier_block nb; /* 24 24 */
/* XXX last struct has 4 bytes of padding */
void * priv; /* 48 8 */
void * tagger_data; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_chip_data * cd; /* 64 8 */
const struct dsa_switch_ops * ops; /* 72 8 */
u32 phys_mii_mask; /* 80 4 */
/* XXX 4 bytes hole, try to pack */
struct mii_bus * slave_mii_bus; /* 88 8 */
unsigned int ageing_time_min; /* 96 4 */
unsigned int ageing_time_max; /* 100 4 */
struct dsa_8021q_context * tag_8021q_ctx; /* 104 8 */
struct devlink * devlink; /* 112 8 */
unsigned int num_tx_queues; /* 120 4 */
unsigned int num_lag_ids; /* 124 4 */
/* --- cacheline 2 boundary (128 bytes) --- */
unsigned int max_num_bridges; /* 128 4 */
/* XXX 4 bytes hole, try to pack */
size_t num_ports; /* 136 8 */
/* size: 144, cachelines: 3, members: 27 */
/* sum members: 132, holes: 2, sum holes: 8 */
/* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 16 bytes */
};
After:
pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
struct device * dev; /* 0 8 */
struct dsa_switch_tree * dst; /* 8 8 */
unsigned int index; /* 16 4 */
u32 setup:1; /* 20: 0 4 */
u32 vlan_filtering_is_global:1; /* 20: 1 4 */
u32 needs_standalone_vlan_filtering:1; /* 20: 2 4 */
u32 configure_vlan_while_not_filtering:1; /* 20: 3 4 */
u32 untag_bridge_pvid:1; /* 20: 4 4 */
u32 assisted_learning_on_cpu_port:1; /* 20: 5 4 */
u32 vlan_filtering:1; /* 20: 6 4 */
u32 pcs_poll:1; /* 20: 7 4 */
u32 mtu_enforcement_ingress:1; /* 20: 8 4 */
/* XXX 23 bits hole, try to pack */
struct notifier_block nb; /* 24 24 */
/* XXX last struct has 4 bytes of padding */
void * priv; /* 48 8 */
void * tagger_data; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct dsa_chip_data * cd; /* 64 8 */
const struct dsa_switch_ops * ops; /* 72 8 */
u32 phys_mii_mask; /* 80 4 */
/* XXX 4 bytes hole, try to pack */
struct mii_bus * slave_mii_bus; /* 88 8 */
unsigned int ageing_time_min; /* 96 4 */
unsigned int ageing_time_max; /* 100 4 */
struct dsa_8021q_context * tag_8021q_ctx; /* 104 8 */
struct devlink * devlink; /* 112 8 */
unsigned int num_tx_queues; /* 120 4 */
unsigned int num_lag_ids; /* 124 4 */
/* --- cacheline 2 boundary (128 bytes) --- */
unsigned int max_num_bridges; /* 128 4 */
unsigned int num_ports; /* 132 4 */
/* size: 136, cachelines: 3, members: 27 */
/* sum members: 128, holes: 1, sum holes: 4 */
/* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 8 bytes */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 2 +-
net/dsa/dsa2.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index a8a586039033..fef9d8bb5190 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -435,7 +435,7 @@ struct dsa_switch {
*/
unsigned int max_num_bridges;
- size_t num_ports;
+ unsigned int num_ports;
};
static inline struct dsa_port *dsa_to_port(struct dsa_switch *ds, int p)
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index c1da813786a4..3d21521453fe 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -1475,7 +1475,7 @@ static int dsa_switch_parse_ports_of(struct dsa_switch *ds,
}
if (reg >= ds->num_ports) {
- dev_err(ds->dev, "port %pOF index %u exceeds num_ports (%zu)\n",
+ dev_err(ds->dev, "port %pOF index %u exceeds num_ports (%u)\n",
port, reg, ds->num_ports);
of_node_put(port);
err = -EINVAL;
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net-next 6/7] net: dsa: move dsa_switch_tree :: ports and lags to first cache line
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
dst->ports is accessed most notably by dsa_master_find_slave(), which is
invoked in the RX path.
dst->lags is accessed by dsa_lag_dev(), which is invoked in the RX path
of tag_dsa.c.
dst->tag_ops, dst->default_proto and dst->pd don't need to be in the
first cache line, so they are moved out by this change.
Before:
pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
struct list_head list; /* 0 16 */
struct raw_notifier_head nh; /* 16 8 */
unsigned int index; /* 24 4 */
struct kref refcount; /* 28 4 */
bool setup; /* 32 1 */
/* XXX 7 bytes hole, try to pack */
const struct dsa_device_ops * tag_ops; /* 40 8 */
enum dsa_tag_protocol default_proto; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_platform_data * pd; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct list_head ports; /* 64 16 */
struct list_head rtable; /* 80 16 */
struct net_device * * lags; /* 96 8 */
unsigned int lags_len; /* 104 4 */
unsigned int last_switch; /* 108 4 */
/* size: 112, cachelines: 2, members: 13 */
/* sum members: 101, holes: 2, sum holes: 11 */
/* last cacheline: 48 bytes */
};
After:
pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
struct list_head list; /* 0 16 */
struct list_head ports; /* 16 16 */
struct raw_notifier_head nh; /* 32 8 */
unsigned int index; /* 40 4 */
struct kref refcount; /* 44 4 */
struct net_device * * lags; /* 48 8 */
bool setup; /* 56 1 */
/* XXX 7 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
const struct dsa_device_ops * tag_ops; /* 64 8 */
enum dsa_tag_protocol default_proto; /* 72 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_platform_data * pd; /* 80 8 */
struct list_head rtable; /* 88 16 */
unsigned int lags_len; /* 104 4 */
unsigned int last_switch; /* 108 4 */
/* size: 112, cachelines: 2, members: 13 */
/* sum members: 101, holes: 2, sum holes: 11 */
/* last cacheline: 48 bytes */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index fef9d8bb5190..cbbac75138d9 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -119,6 +119,9 @@ struct dsa_netdevice_ops {
struct dsa_switch_tree {
struct list_head list;
+ /* List of switch ports */
+ struct list_head ports;
+
/* Notifier chain for switch-wide events */
struct raw_notifier_head nh;
@@ -128,6 +131,11 @@ struct dsa_switch_tree {
/* Number of switches attached to this tree */
struct kref refcount;
+ /* Maps offloaded LAG netdevs to a zero-based linear ID for
+ * drivers that need it.
+ */
+ struct net_device **lags;
+
/* Has this tree been applied to the hardware? */
bool setup;
@@ -145,16 +153,10 @@ struct dsa_switch_tree {
*/
struct dsa_platform_data *pd;
- /* List of switch ports */
- struct list_head ports;
-
/* List of DSA links composing the routing table */
struct list_head rtable;
- /* Maps offloaded LAG netdevs to a zero-based linear ID for
- * drivers that need it.
- */
- struct net_device **lags;
+ /* Length of "lags" array */
unsigned int lags_len;
/* Track the largest switch index within a tree */
--
2.25.1
^ permalink raw reply related
* [PATCH v2 net-next 7/7] net: dsa: combine two holes in struct dsa_switch_tree
From: Vladimir Oltean @ 2022-01-05 13:21 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Vivien Didelot,
Florian Fainelli
In-Reply-To: <20220105132141.2648876-1-vladimir.oltean@nxp.com>
There is a 7 byte hole after dst->setup and a 4 byte hole after
dst->default_proto. Combining them, we have a single hole of just 3
bytes on 64 bit machines.
Before:
pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
struct list_head list; /* 0 16 */
struct list_head ports; /* 16 16 */
struct raw_notifier_head nh; /* 32 8 */
unsigned int index; /* 40 4 */
struct kref refcount; /* 44 4 */
struct net_device * * lags; /* 48 8 */
bool setup; /* 56 1 */
/* XXX 7 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
const struct dsa_device_ops * tag_ops; /* 64 8 */
enum dsa_tag_protocol default_proto; /* 72 4 */
/* XXX 4 bytes hole, try to pack */
struct dsa_platform_data * pd; /* 80 8 */
struct list_head rtable; /* 88 16 */
unsigned int lags_len; /* 104 4 */
unsigned int last_switch; /* 108 4 */
/* size: 112, cachelines: 2, members: 13 */
/* sum members: 101, holes: 2, sum holes: 11 */
/* last cacheline: 48 bytes */
};
After:
pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
struct list_head list; /* 0 16 */
struct list_head ports; /* 16 16 */
struct raw_notifier_head nh; /* 32 8 */
unsigned int index; /* 40 4 */
struct kref refcount; /* 44 4 */
struct net_device * * lags; /* 48 8 */
const struct dsa_device_ops * tag_ops; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
enum dsa_tag_protocol default_proto; /* 64 4 */
bool setup; /* 68 1 */
/* XXX 3 bytes hole, try to pack */
struct dsa_platform_data * pd; /* 72 8 */
struct list_head rtable; /* 80 16 */
unsigned int lags_len; /* 96 4 */
unsigned int last_switch; /* 100 4 */
/* size: 104, cachelines: 2, members: 13 */
/* sum members: 101, holes: 1, sum holes: 3 */
/* last cacheline: 40 bytes */
};
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
include/net/dsa.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index cbbac75138d9..5d0fec6db3ae 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -136,9 +136,6 @@ struct dsa_switch_tree {
*/
struct net_device **lags;
- /* Has this tree been applied to the hardware? */
- bool setup;
-
/* Tagging protocol operations */
const struct dsa_device_ops *tag_ops;
@@ -147,6 +144,9 @@ struct dsa_switch_tree {
*/
enum dsa_tag_protocol default_proto;
+ /* Has this tree been applied to the hardware? */
+ bool setup;
+
/*
* Configuration data for the platform device that owns
* this dsa switch tree instance.
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v2 04/35] brcmfmac: firmware: Support having multiple alt paths
From: Hector Martin @ 2022-01-05 13:22 UTC (permalink / raw)
To: Dmitry Osipenko, Kalle Valo, David S. Miller, Jakub Kicinski,
Rob Herring, Rafael J. Wysocki, Len Brown, Arend van Spriel,
Franky Lin, Hante Meuleman, Chi-hsien Lin, Wright Feng
Cc: Sven Peter, Alyssa Rosenzweig, Mark Kettenis,
Rafał Miłecki, Pieter-Paul Giesberts, Linus Walleij,
Hans de Goede, John W. Linville, brian m. carlson,
Andy Shevchenko, linux-wireless, netdev, devicetree, linux-kernel,
linux-acpi, brcm80211-dev-list.pdl, SHA-cyfmac-dev-list
In-Reply-To: <8394dbcd-f500-b1ae-fcd8-15485d8c0888@gmail.com>
On 05/01/2022 07.09, Dmitry Osipenko wrote:
> 04.01.2022 11:43, Hector Martin пишет:
>>>> +static int brcm_alt_fw_paths(const char *path, const char *board_type,
>>>> + const char *alt_paths[BRCMF_FW_MAX_ALT_PATHS])> {
>>>> char alt_path[BRCMF_FW_NAME_LEN];
>>>> const char *suffix;
>>>>
>>>> + memset(alt_paths, 0, array_size(sizeof(*alt_paths),
>>>> + BRCMF_FW_MAX_ALT_PATHS));
>>> You don't need to use array_size() since size of a fixed array is
>>> already known.
>>>
>>> memset(alt_paths, 0, sizeof(alt_paths));
>> It's a function argument, so that doesn't work and actually throws a
>> warning. Array function argument notation is informative only; they
>> behave strictly equivalent to pointers. Try it:
>>
>> $ cat test.c
>> #include <stdio.h>
>>
>> void foo(char x[42])
>> {
>> printf("%ld\n", sizeof(x));
>> }
>>
>> int main() {
>> char x[42];
>>
>> foo(x);
>> }
>> $ gcc test.c
>> test.c: In function ‘foo’:
>> test.c:5:31: warning: ‘sizeof’ on array function parameter ‘x’ will
>> return size of ‘char *’ [-Wsizeof-array-argument]
>> 5 | printf("%ld\n", sizeof(x));
>> | ^
>> test.c:3:15: note: declared here
>> 3 | void foo(char x[42])
>> | ~~~~~^~~~~
>> $ ./a.out
>> 8
>
> Then please use "const char **alt_paths" for the function argument to
> make code cleaner and add another argument to pass the number of array
> elements.
So you want me to do the ARRAY_SIZE at the caller side then?
>
> static int brcm_alt_fw_paths(const char *path, const char *board_type,
> const char **alt_paths, unsigned int num_paths)
> {
> size_t alt_paths_size = array_size(sizeof(*alt_paths), num_paths);
>
> memset(alt_paths, 0, alt_paths_size);
> }
>
> ...
>
> Maybe even better create a dedicated struct for the alt_paths:
>
> struct brcmf_fw_alt_paths {
> const char *alt_paths[BRCMF_FW_MAX_ALT_PATHS];
> unsigned int index;
> };
>
> and then use the ".index" in the brcm_free_alt_fw_paths(). I suppose
> this will make code a bit nicer and easier to follow.
>
I'm confused; the array size is constant. What would index contain and
why would would brcm_free_alt_fw_paths use it? Just as an iterator
variable instead of using a local variable? Or do you mean count?
Though, to be honest, at this point I'm considering rethinking the whole
patch for this mechanism because I'm not terribly happy with the current
approach and clearly you aren't either :-) Maybe it makes more sense to
stop trying to compute all the alt_paths ahead of time, and just have
the function compute a single one to be used just-in-time at firmware
request time, and just iterate over board_types.
--
Hector Martin (marcan@marcan.st)
Public Key: https://mrcn.st/pub
^ permalink raw reply
* Re: [PATCH v11 2/4] fs: split off do_getxattr from getxattr
From: Christian Brauner @ 2022-01-05 13:22 UTC (permalink / raw)
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, kernel-team, torvalds
In-Reply-To: <20220104190936.3085647-3-shr@fb.com>
On Tue, Jan 04, 2022 at 11:09:34AM -0800, Stefan Roesch wrote:
> This splits off do_getxattr function from the getxattr
> function. This will allow io_uring to call it from its
> io worker.
>
> Signed-off-by: Stefan Roesch <shr@fb.com>
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
> fs/internal.h | 7 +++++++
> fs/xattr.c | 32 ++++++++++++++++++++------------
> 2 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 00c98b0cd634..942b2005a2be 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -220,6 +220,13 @@ struct xattr_ctx {
> unsigned int flags;
> };
>
> +
> +ssize_t do_getxattr(struct user_namespace *mnt_userns,
> + struct dentry *d,
> + const char *kname,
> + void __user *value,
> + size_t size);
> +
> int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
> int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> struct xattr_ctx *ctx);
> diff --git a/fs/xattr.c b/fs/xattr.c
> index dec7ac3e0e89..7f2b805ed56c 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -675,19 +675,12 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
> /*
> * Extended attribute GET operations
> */
> -static ssize_t
> -getxattr(struct user_namespace *mnt_userns, struct dentry *d,
> - const char __user *name, void __user *value, size_t size)
> +ssize_t
> +do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
> + const char *kname, void __user *value, size_t size)
> {
> - ssize_t error;
> void *kvalue = NULL;
> - char kname[XATTR_NAME_MAX + 1];
> -
> - error = strncpy_from_user(kname, name, sizeof(kname));
> - if (error == 0 || error == sizeof(kname))
> - error = -ERANGE;
> - if (error < 0)
> - return error;
> + ssize_t error;
>
> if (size) {
> if (size > XATTR_SIZE_MAX)
> @@ -711,10 +704,25 @@ getxattr(struct user_namespace *mnt_userns, struct dentry *d,
> }
>
> kvfree(kvalue);
> -
> return error;
> }
>
> +static ssize_t
> +getxattr(struct user_namespace *mnt_userns, struct dentry *d,
> + const char __user *name, void __user *value, size_t size)
> +{
> + ssize_t error;
> + struct xattr_name kname;
> +
> + error = strncpy_from_user(kname.name, name, sizeof(kname.name));
> + if (error == 0 || error == sizeof(kname.name))
> + error = -ERANGE;
> + if (error < 0)
> + return error;
> +
> + return do_getxattr(mnt_userns, d, kname.name, value, size);
> +}
Fwiw, this could have the same signature as do_setxattr(). So sm along
the lines of (completely untested):
Subject: [PATCH] UNTESTED
---
fs/internal.h | 8 ++------
fs/xattr.c | 36 ++++++++++++++++++++++--------------
2 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h
index 942b2005a2be..d2332496724b 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -220,12 +220,8 @@ struct xattr_ctx {
unsigned int flags;
};
-
-ssize_t do_getxattr(struct user_namespace *mnt_userns,
- struct dentry *d,
- const char *kname,
- void __user *value,
- size_t size);
+ssize_t do_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+ struct xattr_ctx *ctx);
int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
diff --git a/fs/xattr.c b/fs/xattr.c
index 7f2b805ed56c..52bcfe149a9f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -675,35 +675,34 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
/*
* Extended attribute GET operations
*/
-ssize_t
-do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
- const char *kname, void __user *value, size_t size)
+ssize_t do_getxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+ struct xattr_ctx *ctx)
+
{
- void *kvalue = NULL;
ssize_t error;
+ char *kname = ctx->kname.name;
- if (size) {
- if (size > XATTR_SIZE_MAX)
- size = XATTR_SIZE_MAX;
- kvalue = kvzalloc(size, GFP_KERNEL);
- if (!kvalue)
+ if (ctx->size) {
+ if (ctx->size > XATTR_SIZE_MAX)
+ ctx->size = XATTR_SIZE_MAX;
+ ctx->kvalue = kvzalloc(ctx->size, GFP_KERNEL);
+ if (!ctx->kvalue)
return -ENOMEM;
}
- error = vfs_getxattr(mnt_userns, d, kname, kvalue, size);
+ error = vfs_getxattr(mnt_userns, d, kname, ctx->kvalue, ctx->size);
if (error > 0) {
if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
(strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
posix_acl_fix_xattr_to_user(mnt_userns, kvalue, error);
- if (size && copy_to_user(value, kvalue, error))
+ if (ctx->size && copy_to_user(value, kvalue, error))
error = -EFAULT;
- } else if (error == -ERANGE && size >= XATTR_SIZE_MAX) {
+ } else if (error == -ERANGE && ctx->size >= XATTR_SIZE_MAX) {
/* The file system tried to returned a value bigger
than XATTR_SIZE_MAX bytes. Not possible. */
error = -E2BIG;
}
- kvfree(kvalue);
return error;
}
@@ -713,6 +712,12 @@ getxattr(struct user_namespace *mnt_userns, struct dentry *d,
{
ssize_t error;
struct xattr_name kname;
+ struct xattr_ctx ctx = {
+ .value = value,
+ .kvalue = NULL,
+ .size = size,
+ .kname = &kname,
+ };
error = strncpy_from_user(kname.name, name, sizeof(kname.name));
if (error == 0 || error == sizeof(kname.name))
@@ -720,7 +725,10 @@ getxattr(struct user_namespace *mnt_userns, struct dentry *d,
if (error < 0)
return error;
- return do_getxattr(mnt_userns, d, kname.name, value, size);
+ error = do_getxattr(mnt_userns, d, &ctx);
+
+ kvfree(ctx.kvalue);
+ return error;
}
static ssize_t path_getxattr(const char __user *pathname,
--
2.32.0
^ permalink raw reply related
* [PATCH] apply: Avoid ambiguous pointer provenance for CHERI/Arm's Morello
From: Jessica Clarke @ 2022-01-05 13:23 UTC (permalink / raw)
To: git; +Cc: Jessica Clarke
On CHERI, and thus Arm's Morello prototype, pointers are implemented as
hardware capabilities which, as well as having a normal integer address,
have additional bounds, permissions and other metadata in a second word.
In order to preserve this metadata, uintptr_t is also implemented as a
capability, not a plain integer, which causes problems for binary
operators, as the metadata preserved in the output can only come from
one of the inputs. In most cases this is clear, as normally at least one
operand is provably a plain integer, but if both operands are uintptr_t
and have no indication they're just plain integers then it is ambiguous,
and the current implementation will arbitrarily, but deterministically,
pick the left-hand side, due to empirical evidence that it is more
likely to be correct.
In this instance, both operands are of type uintptr_t, with one being a
function argument and one being cast from a pointer, so both could be
valid pointers. Moreover, the left-hand side is not the actual pointer.
This means that, currently, the code when run on a CHERI architecture
will preserve the metadata from the integer, i.e. an invalid capability
that will trap on deference, and not the pointer.
This can be addressed by changing the type of the function argument in
order to more clearly convey intent, both to the compiler so it knows to
generate the right code but also to the developer so it's clear that the
argument is not in fact a pointer but just a plain integer (in this case
being either APPLY_SYMLINK_GOES_AWAY or APPLY_SYMLINK_IN_RESULT).
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
---
apply.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/apply.c b/apply.c
index fed195250b..7c7d56cacb 100644
--- a/apply.c
+++ b/apply.c
@@ -3814,7 +3814,7 @@ static int check_to_create(struct apply_state *state,
static uintptr_t register_symlink_changes(struct apply_state *state,
const char *path,
- uintptr_t what)
+ size_t what)
{
struct string_list_item *ent;
@@ -3823,7 +3823,7 @@ static uintptr_t register_symlink_changes(struct apply_state *state,
ent = string_list_insert(&state->symlink_changes, path);
ent->util = (void *)0;
}
- ent->util = (void *)(what | ((uintptr_t)ent->util));
+ ent->util = (void *)((uintptr_t)what | ((uintptr_t)ent->util));
return (uintptr_t)ent->util;
}
--
2.33.1
^ permalink raw reply related
* [PATCH] Properly align memory allocations and temporary buffers
From: Jessica Clarke @ 2022-01-05 13:23 UTC (permalink / raw)
To: git; +Cc: Jessica Clarke
Currently git_qsort_s allocates a buffer on the stack that has no
alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
alignment for any type.
On CHERI, and thus Arm's Morello prototype, pointers are implemented as
hardware capabilities which, as well as having a normal integer address,
have additional bounds, permissions and other metadata in a second word,
so on a 64-bit architecture they are 128-bit quantities, including their
alignment requirements. Despite being 128-bit, their integer component
is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
uintmax_t does not sufficiently align an allocation.
Moreover, these capabilities have an additional "129th" tag bit, which
tracks the validity of the capability and is cleared on any invalid
operation that doesn't trap (e.g. partially overwriting a capability
will invalidate it) which, combined with the architecture's strict
checks on capability manipulation instructions, ensures it is
architecturally impossible to construct a capability that gives more
rights than those you were given in the first place. To store these tag
bits, each capability sized and aligned word in memory gains a single
tag bit that is stored in unaddressable (to the processor) memory. This
means that it is impossible to store a capability at an unaligned
address: a normal load or store of a capability will always take an
alignment fault even if the (micro)architecture supports unaligned
loads/stores for other data types, and a memcpy will, if the destination
is not appropriately aligned, copy the byte representation but lose the
tag, meaning that if it is eventually copied back and loaded from an
aligned location any attempt to dereference it will trap with a tag
fault. Thus, even char buffers that are memcpy'ed to or from must be
properly aligned on CHERI architectures if they are to hold pointers.
Address both of these by introducing a new git_max_align type put in a
union with the on-stack buffer to force its alignment, as well as a new
GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
that gets used for mem_pool_alloc. As well as making the code work on
CHERI, the former change likely also improves performance on some
architectures by making memcpy faster (either because it can use larger
block sizes or because the microarchitecture has inefficient unaligned
accesses).
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
---
compat/qsort_s.c | 11 +++++++----
git-compat-util.h | 11 +++++++++++
mem-pool.c | 6 +++---
3 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/compat/qsort_s.c b/compat/qsort_s.c
index 52d1f0a73d..1ccdb87451 100644
--- a/compat/qsort_s.c
+++ b/compat/qsort_s.c
@@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
int (*cmp)(const void *, const void *, void *), void *ctx)
{
const size_t size = st_mult(n, s);
- char buf[1024];
+ union {
+ char buf[1024];
+ git_max_align align;
+ } u;
if (!n)
return 0;
if (!b || !cmp)
return -1;
- if (size < sizeof(buf)) {
- /* The temporary array fits on the small on-stack buffer. */
- msort_with_tmp(b, n, s, cmp, buf, ctx);
+ if (size < sizeof(u.buf)) {
+ /* The temporary array fits in the small on-stack buffer. */
+ msort_with_tmp(b, n, s, cmp, u.buf, ctx);
} else {
/* It's somewhat large, so malloc it. */
char *tmp = xmalloc(size);
diff --git a/git-compat-util.h b/git-compat-util.h
index 5fa54a7afe..28581a45c5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
#define _ALL_SOURCE 1
#endif
+typedef union {
+ uintmax_t max_align_uintmax;
+ void *max_align_pointer;
+} git_max_align;
+
+typedef struct {
+ char unalign;
+ git_max_align aligned;
+} git_max_alignment;
+#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
+
/* used on Mac OS X */
#ifdef PRECOMPOSE_UNICODE
#include "compat/precompose_utf8.h"
diff --git a/mem-pool.c b/mem-pool.c
index ccdcad2e3d..748eff925a 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
struct mp_block *p = NULL;
void *r;
- /* round up to a 'uintmax_t' alignment */
- if (len & (sizeof(uintmax_t) - 1))
- len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
+ /* round up to a 'GIT_MAX_ALIGNMENT' alignment */
+ if (len & (GIT_MAX_ALIGNMENT - 1))
+ len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
if (pool->mp_block &&
pool->mp_block->end - pool->mp_block->next_free >= len)
--
2.33.1
^ permalink raw reply related
* Re: [PATCH] bpf: allow setting mount device for bpffs
From: Daniel Borkmann @ 2022-01-05 13:24 UTC (permalink / raw)
To: Yafang Shao, ast, andrii, kafai, songliubraving, yhs,
john.fastabend, kpsingh
Cc: netdev, bpf, David Howells, viro
In-Reply-To: <20211226165649.7178-1-laoar.shao@gmail.com>
On 12/26/21 5:56 PM, Yafang Shao wrote:
> We noticed our tc ebpf tools can't start after we upgrade our in-house
> kernel version from 4.19 to 5.10. That is because of the behaviour change
> in bpffs caused by commit
> d2935de7e4fd ("vfs: Convert bpf to use the new mount API").
>
> In our tc ebpf tools, we do strict environment check. If the enrioment is
> not match, we won't allow to start the ebpf progs. One of the check is
> whether bpffs is properly mounted. The mount information of bpffs in
> kernel-4.19 and kernel-5.10 are as follows,
>
> - kenrel 4.19
> $ mount -t bpf bpffs /sys/fs/bpf
> $ mount -t bpf
> bpffs on /sys/fs/bpf type bpf (rw,relatime)
>
> - kernel 5.10
> $ mount -t bpf bpffs /sys/fs/bpf
> $ mount -t bpf
> none on /sys/fs/bpf type bpf (rw,relatime)
>
> The device name in kernel-5.10 is displayed as none instead of bpffs,
> then our environment check fails. Currently we modify the tools to adopt to
> the kernel behaviour change, but I think we'd better change the kernel code
> to keep the behavior consistent.
>
> After this change, the mount information will be displayed the same with
> the behavior in kernel-4.19, for example,
>
> $ mount -t bpf bpffs /sys/fs/bpf
> $ mount -t bpf
> bpffs on /sys/fs/bpf type bpf (rw,relatime)
>
> Fixes: d2935de7e4fd ("vfs: Convert bpf to use the new mount API")
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: David Howells <dhowells@redhat.com>
> ---
> kernel/bpf/inode.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 80da1db47c68..5a8b729afa91 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -648,12 +648,26 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> int opt;
>
> opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> - if (opt < 0)
> + if (opt < 0) {
> /* We might like to report bad mount options here, but
> * traditionally we've ignored all mount options, so we'd
> * better continue to ignore non-existing options for bpf.
> */
> - return opt == -ENOPARAM ? 0 : opt;
> + if (opt == -ENOPARAM) {
> + if (strcmp(param->key, "source") == 0) {
> + if (param->type != fs_value_is_string)
> + return 0;
> + if (fc->source)
> + return 0;
> + fc->source = param->string;
> + param->string = NULL;
> + }
> +
> + return 0;
> + }
> +
> + return opt;
> + }
I don't think we need to open code this? Couldn't we just do something like:
[...]
opt = fs_parse(fc, bpf_fs_parameters, param, &result);
if (opt == -ENOPARAM) {
opt = vfs_parse_fs_param_source(fc, param);
if (opt != -ENOPARAM)
return opt;
return 0;
}
if (opt < 0)
return opt;
[...]
See also 0858d7da8a09 ("ramfs: fix mount source show for ramfs") where they
had a similar issue.
Thanks,
Daniel
^ permalink raw reply
* [PATCH RFC] can: isotp: convert struct tpcon::{idx,len} to unsigned int
From: Marc Kleine-Budde @ 2022-01-05 13:24 UTC (permalink / raw)
To: linux-can; +Cc: Marc Kleine-Budde, Oliver Hartkopp, syzbot+4c63f36709a642f801c5
In isotp_rcv_ff() 32 bit of data received over the network is assigned
to struct tpcon::len. Later in that function the length is checked for
the maximal supported length against MAX_MSG_LENGTH.
As struct tpcon::len is an "int" this check does not work, if the
provided length overflows the "int".
Later on struct tpcon::idx is compared against struct tpcon::len.
To fix this problem this patch converts both struct tpcon::{idx,len}
to unsigned int.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
\# Cc: stable@vger.kernel.org
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Reported-by: syzbot+4c63f36709a642f801c5@syzkaller.appspotmail.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
net/can/isotp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index df6968b28bf4..02cbcb2ecf0d 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -119,8 +119,8 @@ enum {
};
struct tpcon {
- int idx;
- int len;
+ unsigned int idx;
+ unsigned int len;
u32 state;
u8 bs;
u8 sn;
--
2.34.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.