Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net: ethernet: et131x: Use GFP_KERNEL instead of GFP_ATOMIC when allocating tx_ring->tcb_ring
From: Christophe JAILLET @ 2019-07-31  7:38 UTC (permalink / raw)
  To: mark.einon, davem, willy, f.fainelli, andrew
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET

There is no good reason to use GFP_ATOMIC here. Other memory allocations
are performed with GFP_KERNEL (see other 'dma_alloc_coherent()' below and
'kzalloc()' in 'et131x_rx_dma_memory_alloc()')

Use GFP_KERNEL which should be enough.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/ethernet/agere/et131x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/agere/et131x.c b/drivers/net/ethernet/agere/et131x.c
index e43d922f043e..174344c450af 100644
--- a/drivers/net/ethernet/agere/et131x.c
+++ b/drivers/net/ethernet/agere/et131x.c
@@ -2362,7 +2362,7 @@ static int et131x_tx_dma_memory_alloc(struct et131x_adapter *adapter)
 
 	/* Allocate memory for the TCB's (Transmit Control Block) */
 	tx_ring->tcb_ring = kcalloc(NUM_TCB, sizeof(struct tcb),
-				    GFP_ATOMIC | GFP_DMA);
+				    GFP_KERNEL | GFP_DMA);
 	if (!tx_ring->tcb_ring)
 		return -ENOMEM;
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-07-31  7:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, linux-security@vger.kernel.org, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API
In-Reply-To: <CALCETrVS5FwtmTyspyg-UNoZTHes9wUNbbsvNYwQwXUUfrtaiQ@mail.gmail.com>

Hi Andy,

> On Jul 30, 2019, at 1:20 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Sat, Jul 27, 2019 at 11:20 AM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Hi Andy,
>> 
>>>>>> 
>>>>> 
>>>>> Well, yes. sys_bpf() is pretty powerful.
>>>>> 
>>>>> The goal of /dev/bpf is to enable special users to call sys_bpf(). In
>>>>> the meanwhile, such users should not take down the whole system easily
>>>>> by accident, e.g., with rm -rf /.
>>>> 
>>>> That’s easy, though — bpftool could learn to read /etc/bpfusers before allowing ruid != 0.
>>> 
>>> This is a great idea! fscaps + /etc/bpfusers should do the trick.
>> 
>> After some discussions and more thinking on this, I have some concerns
>> with the user space only approach.
>> 
>> IIUC, your proposal for user space only approach is like:
>> 
>> 1. bpftool (and other tools) check /etc/bpfusers and only do
>>   setuid for allowed users:
>> 
>>        int main()
>>        {
>>                if (/* uid in /etc/bpfusers */)
>>                        setuid(0);
>>                sys_bpf(...);
>>        }
>> 
>> 2. bpftool (and other tools) is installed with CAP_SETUID:
>> 
>>        setcap cap_setuid=e+p /bin/bpftool
>> 
> 
> You have this a bit backwards.  You wouldn't use CAP_SETUID.  You
> would use the setuid *mode* bit, i.e. chmod 4111 (or 4100 and use ACLs
> to further lock it down).  Or you could use setcap cap_sys_admin=p,
> although the details vary.  It woks a bit like this:
> 
> First, if you are running with elevated privilege due to SUID or
> fscaps, the kernel and glibc offer you a degree of protection: you are
> protected from ptrace(), LD_PRELOAD, etc.  You are *not* protected
> from yourself.  For example, you may be running in a context in which
> an attacker has malicious values in your environment variables, CWD,
> etc.  Do you need to carefully decide whether you are willing to run
> with elevated privilege on behalf of the user, which you learn like
> this:
> 
> uid_t real_uid = getuid();
> 
> Your decision may may depend on command-line arguments as well (i.e.
> you might want to allow tracing but not filtering, say).  Once you've
> made this decision, the details vary:
> 
> For SUID, you either continue to run with euid == 0, or you drop
> privilege using something like:
> 
> if (setresuid(real_uid, real_uid, real_uid) != 0) {
> /* optionally print an error to stderr */
> exit(1);
> }
> 
> For fscaps, if you want to be privileged, you use something like
> capng_update(); capng_apply() to set CAP_SYS_ADMIN to be effective
> when you want privilege.  If you want to be unprivileged (because
> bpfusers says so, for example), you could use capng_update() to drop
> CAP_SYS_ADMIN entirely and see if the calls still work without
> privilege.  But this is a little bit awkward, since you don't directly
> know whether the user that invoked you in the first place had
> CAP_SYS_ADMIN to begin with.
> 
> In general, SUID is a bit easier to work with.

Thanks a lot for these explanations. I learned a lot today via reading
this email and Googling. 

> 
>> This approach is not ideal, because we need to trust the tool to give
>> it CAP_SETUID. A hacked tool could easily bypass /etc/bpfusers check
>> or use other root only sys calls after setuid(0).
> 
> How?  The whole SUID mechanism is designed fairly carefully to prevent
> this.  /bin/sudo is likely to be SUID on your system, but you can't
> just "hack" it to become root.

I guess "hacked" was not the right description. I should have used 
"buggy" instead. 

SUID mechanism is great for small and solid tools, like sudo, passwd,
mount, etc. The user or sys admin could trust the tool to always do the 
right thing. On the other hand, I think SUID is not ideal for complex
tools that are under heavy development. As you mentioned, SUID doesn't 
protect against oneself. Therefore, it won't protect the system from 
buggy tool that uses SUID on something it should not. 

I think SUID is good for bpftool, because it is easy to use SUID 
correctly in bpftool. But, it is not easy to use SUID correctly in 
systemd. 

systemd and similar user space daemons use sys_bpf() to manage cgroup 
bpf programs and maps (BPF_MAP_TYPE_*CGROUP*, BPF_PROG_TYPE_CGROUP_*, 
BPF_CGROUP_*). The daemon runs in the background, and handles user 
requests to attach/detach BPF programs. If the daemon uses SUID or 
similar mechanism, it has to use euid == 0 for sys_bpf(). However, 
such daemons are usually pretty complicated. It is not easy to prove 
the daemon won't misuse euid == 0. 

Does this make sense so far? I will discuss more about cgroup in the 
next email. 

Thanks,
Song

^ permalink raw reply

* Re: [PATCH net-next v4 6/6] net: mscc: PTP Hardware Clock (PHC) support
From: antoine.tenart @ 2019-07-31  7:46 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: antoine.tenart@bootlin.com, richardcochran@gmail.com,
	davem@davemloft.net, UNGLinuxDriver@microchip.com,
	alexandre.belloni@bootlin.com, netdev@vger.kernel.org,
	thomas.petazzoni@bootlin.com, allan.nielsen@microchip.com
In-Reply-To: <2b70e0fbebf1875c51272f3005271a9aec68f00f.camel@mellanox.com>

Hello Saeed,

On Fri, Jul 26, 2019 at 08:52:10PM +0000, Saeed Mahameed wrote:
> On Thu, 2019-07-25 at 16:27 +0200, Antoine Tenart wrote:
> >  
> >  	dev->stats.tx_packets++;
> >  	dev->stats.tx_bytes += skb->len;
> > -	dev_kfree_skb_any(skb);
> > +
> > +	if (ocelot->ptp && shinfo->tx_flags & SKBTX_HW_TSTAMP &&
> > +	    port->ptp_cmd == IFH_REW_OP_TWO_STEP_PTP) {
> > +		struct ocelot_skb *oskb =
> > +			kzalloc(sizeof(struct ocelot_skb), GFP_ATOMIC);
> > +
> 
> Device drivers normally pre allocate descriptor info ring array to
> avoid dynamic atomic allocations of private data on data path.

This depends on drivers. It's a good thing to do but I don't think it's
required here for a first version, we can improve this later.

> > +		oskb->skb = skb;
> > +		oskb->id = port->ts_id % 4;
> > +		port->ts_id++;
> 
> missing skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; ?
> see 3.1 Hardware Timestamping Implementation: Device Drivers
> https://www.kernel.org/doc/Documentation/networking/timestamping.txt

Right, I'll add this.

> > +void ocelot_get_hwtimestamp(struct ocelot *ocelot, struct timespec64
> > *ts)
> > +{
> > +	unsigned long flags;
> > +	u32 val;
> > +
> > +	spin_lock_irqsave(&ocelot->ptp_clock_lock, flags);
> > +
> > +	/* Read current PTP time to get seconds */
> > +	val = ocelot_read_rix(ocelot, PTP_PIN_CFG, TOD_ACC_PIN);
> > +
> > +	val &= ~(PTP_PIN_CFG_SYNC | PTP_PIN_CFG_ACTION_MASK |
> > PTP_PIN_CFG_DOM);
> > +	val |= PTP_PIN_CFG_ACTION(PTP_PIN_ACTION_SAVE);
> > +	ocelot_write_rix(ocelot, val, PTP_PIN_CFG, TOD_ACC_PIN);
> > +	ts->tv_sec = ocelot_read_rix(ocelot, PTP_PIN_TOD_SEC_LSB,
> > TOD_ACC_PIN);
> > +
> > +	/* Read packet HW timestamp from FIFO */
> > +	val = ocelot_read(ocelot, SYS_PTP_TXSTAMP);
> > +	ts->tv_nsec = SYS_PTP_TXSTAMP_PTP_TXSTAMP(val);
> > +
> > +	/* Sec has incremented since the ts was registered */
> > +	if ((ts->tv_sec & 0x1) != !!(val &
> > SYS_PTP_TXSTAMP_PTP_TXSTAMP_SEC))
> > +		ts->tv_sec--;
> > +
> > +	spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags);
> > +}
> > +EXPORT_SYMBOL(ocelot_get_hwtimestamp);
> > +
> 
> Why EXPORT_SYMBOL? this is the last patch and it is touching one
> driver.

Because the function is used in ocelot_board.c, which can be part of
another module (the mscc/ driver provides 2 modules). You can find
other examples of this in this driver.

> > @@ -98,7 +100,11 @@ static irqreturn_t ocelot_xtr_irq_handler(int
> > irq, void *arg)
> >  		int sz, len, buf_len;
> >  		u32 ifh[4];
> >  		u32 val;
> > -		struct frame_info info;
> > +		struct frame_info info = {};
> > +		struct timespec64 ts;
> > +		struct skb_shared_hwtstamps *shhwtstamps;
> > +		u64 tod_in_ns;
> > +		u64 full_ts_in_ns;
> 
> reverse xmas tree.

OK.

> > @@ -145,6 +151,22 @@ static irqreturn_t ocelot_xtr_irq_handler(int
> > irq, void *arg)
> >  			break;
> >  		}
> >  
> > +		if (ocelot->ptp) {
> > +			ocelot_ptp_gettime64(&ocelot->ptp_info, &ts);
> > +
> > +			tod_in_ns = ktime_set(ts.tv_sec, ts.tv_nsec);
> > +			if ((tod_in_ns & 0xffffffff) < info.timestamp)
> > +				full_ts_in_ns = (((tod_in_ns >> 32) -
> > 1) << 32) |
> > +						info.timestamp;
> > +			else
> > +				full_ts_in_ns = (tod_in_ns &
> > GENMASK_ULL(63, 32)) |
> > +						info.timestamp;
> > +
> > +			shhwtstamps = skb_hwtstamps(skb);
> > +			memset(shhwtstamps, 0, sizeof(struct
> > skb_shared_hwtstamps));
> > +			shhwtstamps->hwtstamp = full_ts_in_ns;
> 
> the right way to set the timestamp is by calling: 
> skb_tstamp_tx(skb, &tstamp);

I'll fix this.

> > +static irqreturn_t ocelot_ptp_rdy_irq_handler(int irq, void *arg)
> > +{
> > +	int budget = OCELOT_PTP_QUEUE_SZ;
> > +	struct ocelot *ocelot = arg;
> > +
> > +	do {
> > +		struct skb_shared_hwtstamps shhwtstamps;
> > +		struct list_head *pos, *tmp;
> > +		struct sk_buff *skb = NULL;
> > +		struct ocelot_skb *entry;
> > +		struct ocelot_port *port;
> > +		struct timespec64 ts;
> > +		u32 val, id, txport;
> > +
> > +		/* Prevent from infinite loop */
> > +		if (unlikely(!--budget))
> > +			break;
> 
> when budget gets to 1 you break, while you still have 1 to go :)
> 
> I assume OCELOT_PTP_QUEUE_SZ > 0, just make this the loop condition and
> avoid infinite loops by design.

That's right, I'll fix it :)

> >  static int mscc_ocelot_probe(struct platform_device *pdev)
> >  {
> > -	int err, irq;
> >  	unsigned int i;
> > +	int err, irq_xtr, irq_ptp_rdy;
> >  	struct device_node *np = pdev->dev.of_node;
> >  	struct device_node *ports, *portnp;
> >  	struct ocelot *ocelot;
> 
> reverse xmas tree

Well, the xmas tree isn't there before my addition. Do you want me to
rework the whole variable definitions in this function (which is
completely unrelated to this patch)?

Thanks,
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-31  8:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel
In-Reply-To: <20190731033156.GE9523@lunn.ch>

The 07/31/2019 05:31, Andrew Lunn wrote:
> The PDF you linked to suggests this as well. But i think you will need
> to make some core changes to the bridge. At the moment, STP is a
> bridge level property. But you are going to need it to be a per-port
> option. You can then use DLR on the ring ports, and optionally STP on
> the other ports.
I think you are right. I have not looked into the details of this. Our plan was
to not implement this to begin with. This will mean that it cannot act as a
gateway until this is done. But it still has good value to be able to implement
a DLR ring node.

> > But what we are looking at here, is to offload a
> > non-aware-(DLR|MRP)-switch which happens to be placed in a network
> > with these protocols running.
> 
> So we need to think about why we are passing traffic to the CPU port,
> and under what conditions can it be blocked.
> 
> 1) The interface is not part of a bridge. In this case, we only need
> the switch to pass to the CPU port MC addresses which have been set
> via set_rx_mode().
Yes. In our HW we are using the MAC table to do this, but this is an
implementation detail.

> I think this case does not apply for what you want. You have two ports
> bridges together as part of the ring.
Correct.

> 2) The interface is part of a bridge. There are a few sub-cases
> 
> a) IGMP snooping is being performed. We can block multicast where
> there is no interest in the group. But this is limited to IP
> multicast.
Agree. And this is done today by installing an explicit offload rule to limit
the flooding of a specific group.

> b) IGMP snooping is not being used and all interfaces in the bridge
> are ports of the switch. IP Multicast can be blocked to the CPU.
Does it matter if IGMP snooping is used or not? A more general statement could
be:

- "All interfaces in the bridge are ports of the switch. IP Multicast can be
  blocked to the CPU."

This assumes that if br0 joins a IP multicast group, then the needed MAC
addresses is added via the set_rx_mode (which I'm pretty sure is the case.

Or much more aggressive (which is where we started this entire discussion):

- "All interfaces in the bridge are ports of the switch. All Multicast (except
  ff:ff:ff:ff:ff:ff, 33:33:xx:xx:xx:xx, 01:80:C2:xx:xx:xx) can be blocked to the
  CPU."

But then we are back to having the need of requesting additional L2-multicast
mac addresses to the CPU. This has been discussed, and I do not want to re-start
the discussion, just wanted to mention it to have the complete picture.

> c) IGMP snooping is not being used and there is a non-switch interface
> in the bridge. Multicast needed is needed, so it can be flooded out
> this port.
Yes. And L2-multicast also always needs to go to the CPU regardless of IGMP.

> d) set_rx_mode() has been called on the br0 interface, indicating
> there is interest in the packets on the host. They must be sent to the
> CPU so they can be delivered locally.
Yes - but today set_rx_mode is not doing anything on br0. This was one problem
we had a few days ago, when we did not forward all traffic to the CPU by
default.

When looking at these cases, I'm not sure it matters if IGMP snooping is running
or not.

Here is how I understand what you say:

              || Foreign interfaces   |  No Foreign interfaces
==============||======================|=========================
IGMP Snoop on || IP-Multicast must    |  IP-Multicast is not needed
              || go to CPU by default |  in the CPU (by default*)
--------------||----------------------|-------------------------
IGMP Snoop off|| IP-Multicast must    |  IP-Multicast is not needed 
              || go to CPU by default |  in the CPU (by default*)

* This assumes that set_rx_mode starts to do something for the br0 interface
  (which does not seem to be the case right now - see br_dev_set_multicast_list).


> e) ????
e) Another case to consider is similar to d), but with VLANs. Assume that
br0.100 joins a IP-Multicast group, which will cause set_rx_mode() to be called.
As I read the code (I have not tried it, so I might have gotten this wrong),
then we loose the VLAN information, because the mc list is not VLAN aware.

This means that if br0.100 joins a group, then we will get that group for all
VLANs.

Maybe it is better (for now) to hold on to the exiting behavioral and let all
multicast go to the CPU by default, leave br_dev_set_multicast_list empty, use
IGMP snooping to optimize the flooding, and hopefully we will have a way to
limit the L2-multicast flooding as an optimization which can be applied to
"noisy" l2 multicast streams.

> Does the Multicast MAC address being used by DLR also map to an IP
> mmulticast address?
No.

> 01:21:6C:00:00:0[123] appear to be the MAC addresses used by DLR.
Yes.

> IPv4 multicast MAC addresses are 01:00:5E:XX:XX:XX. IPv6 multicast MAC
> addresses are 33:33:XX:XX:XX:XX.
Yes, and there are no overlap.

> So one possibility here is to teach the SW bridge about non-IP
> multicast addresses. Initially the switch should forward all MAC
> multicast frames to the CPU. If the frame is not an IPv4 or IPv6
> frame, and there has not been a call to set_rx_mode() for the MAC
> address on the br0 interface, and the bridge only contains switch
> ports, switchdev could be used to block the multicast to the CPU
> frame, but forward it out all other ports of the bridge.
Close, but not exactly (due to the arguments above).

Here is how I see it:

Teach the SW bridge about non-IP multicast addresses. Initially the switch
should forward all MAC multicast frames to the CPU. Today MDB rules can be
installed (either static or dynamic by IGMP), which limit the flooding of IPv4/6
multicast streams. In the same way, we should have a way to install a rule
(FDM/ or MDB) to limit the flooding of L2 multicast frames.

If foreign interfaces (or br0 it self) is part of the destination list, then
traffic also needs to go to the CPU.

By doing this, we can for explicitly configured dst mac address:
- limit the flooding on the on the SW bridge interfaces
- limit the flooding on the on the HW bridge interfaces
- prevent them to go to the CPU if they are not needed


-- 
/Allan

^ permalink raw reply

* [PATCH 0/2] net: ag71xx: 2 small patches for 'ag71xx_rings_init()'
From: Christophe JAILLET @ 2019-07-31  8:06 UTC (permalink / raw)
  To: jcliburn, davem, chris.snook
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET

Christophe JAILLET (2):
  net: ag71xx: Slighly simplify code in 'ag71xx_rings_init()'
  net: ag71xx: Use GFP_KERNEL instead of GFP_ATOMIC in
    'ag71xx_rings_init()'

 drivers/net/ethernet/atheros/ag71xx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.20.1


^ permalink raw reply

* [PATCH 1/2] net: ag71xx: Slighly simplify code in 'ag71xx_rings_init()'
From: Christophe JAILLET @ 2019-07-31  8:06 UTC (permalink / raw)
  To: jcliburn, davem, chris.snook
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET
In-Reply-To: <cover.1564560130.git.christophe.jaillet@wanadoo.fr>

A few lines above, we have:
   tx_size = BIT(tx->order);

So use 'tx_size' directly to be consistent with the way 'rx->descs_cpu' and
'rx->descs_dma' are computed below.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/ethernet/atheros/ag71xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 77542bd1e5bd..40a8717f51b1 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -1148,7 +1148,7 @@ static int ag71xx_rings_init(struct ag71xx *ag)
 		return -ENOMEM;
 	}
 
-	rx->buf = &tx->buf[BIT(tx->order)];
+	rx->buf = &tx->buf[tx_size];
 	rx->descs_cpu = ((void *)tx->descs_cpu) + tx_size * AG71XX_DESC_SIZE;
 	rx->descs_dma = tx->descs_dma + tx_size * AG71XX_DESC_SIZE;
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH 2/2] net: ag71xx: Use GFP_KERNEL instead of GFP_ATOMIC in 'ag71xx_rings_init()'
From: Christophe JAILLET @ 2019-07-31  8:06 UTC (permalink / raw)
  To: jcliburn, davem, chris.snook
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET
In-Reply-To: <cover.1564560130.git.christophe.jaillet@wanadoo.fr>

There is no need to use GFP_ATOMIC here, GFP_KERNEL should be enough.
The 'kcalloc()' just a few lines above, already uses GFP_KERNEL.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
I've done my best to check if a spinlock can be hold when reaching this
code. Apparently it is never the case.
But double check to be sure that it is not the kcalloc that should use
GFP_ATOMIC.
---
 drivers/net/ethernet/atheros/ag71xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 40a8717f51b1..7548247455d7 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -1141,7 +1141,7 @@ static int ag71xx_rings_init(struct ag71xx *ag)
 
 	tx->descs_cpu = dma_alloc_coherent(&ag->pdev->dev,
 					   ring_size * AG71XX_DESC_SIZE,
-					   &tx->descs_dma, GFP_ATOMIC);
+					   &tx->descs_dma, GFP_KERNEL);
 	if (!tx->descs_cpu) {
 		kfree(tx->buf);
 		tx->buf = NULL;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-07-31  8:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <CALCETrUpVMrk7aaf0trfg9AfZ4fy279uJgZH7V+gZzjFw=hUxA@mail.gmail.com>



> On Jul 30, 2019, at 1:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Mon, Jul 29, 2019 at 10:07 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Hi Andy,
>> 
>>> On Jul 27, 2019, at 11:20 AM, Song Liu <songliubraving@fb.com> wrote:
>>> 
>>> Hi Andy,
>>> 
>>> 

[...]

>>> 
>> 
>> I would like more comments on this.
>> 
>> Currently, bpf permission is more or less "root or nothing", which we
>> would like to change.
>> 
>> The short term goal is to separate bpf from root, in other words, it is
>> "all or nothing". Special user space utilities, such as systemd, would
>> benefit from this. Once this is implemented, systemd can call sys_bpf()
>> when it is not running as root.
> 
> As generally nasty as Linux capabilities are, this sounds like a good
> use for CAP_BPF_ADMIN.

I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make 
existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.

> 
> But what do you have in mind?  Isn't non-root systemd mostly just the
> user systemd session?  That should *not* have bpf() privileges until
> bpf() is improved such that you can't use it to compromise the system.

cgroup bpf is the major use case here. A less important use case is to 
run bpf selftests without being root. 

> 
>> 
>> In longer term, it may be useful to provide finer grain permission of
>> sys_bpf(). For example, sys_bpf() should be aware of containers; and
>> user may only have access to certain bpf maps. Let's call this
>> "fine grain" capability.
>> 
>> 
>> Since we are seeing new use cases every year, we will need many
>> iterations to implement the fine grain permission. I think we need an
>> API that is flexible enough to cover different types of permission
>> control.
>> 
>> For example, bpf_with_cap() can be flexible:
>> 
>>        bpf_with_cap(cmd, attr, size, perm_fd);
>> 
>> We can get different types of permission via different combinations of
>> arguments:
>> 
>>    A perm_fd to /dev/bpf gives access to all sys_bpf() commands, so
>>    this is "all or nothing" permission.
>> 
>>    A perm_fd to /sys/fs/cgroup/.../bpf.xxx would only allow some
>>    commands to this specific cgroup.
>> 
> 
> I don't see why you need to invent a whole new mechanism for this.
> The entire cgroup ecosystem outside bpf() does just fine using the
> write permission on files in cgroupfs to control access.  Why can't
> bpf() do the same thing?

It is easier to use write permission for BPF_PROG_ATTACH. But it is 
not easy to do the same for other bpf commands: BPF_PROG_LOAD and 
BPF_MAP_*. A lot of these commands don't have target concept. Maybe 
we should have target concept for all these commands. But that is a 
much bigger project. OTOH, "all or nothing" model allows all these 
commands at once.

Well, that being said, I will look more into using write permission 
in cgroupfs. 

Thanks again for all these comments and suggestions. Please let us 
know your future thoughts and insights. 

Song

^ permalink raw reply

* [PATCH net-next 0/6] flow_offload: add indr-block in nf_table_offload
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev

From: wenxu <wenxu@ucloud.cn>

This series patch make nftables offload support the vlan and
tunnel device offload through indr-block architecture.

The first four patches mv tc indr block to flow offload and
rename to flow-indr-block.
Because the new flow-indr-block can't get the tcf_block
directly. The fifthe patch provide a callback list to get 
flow_block of each subsystem immediately when the device
register and contain a block.
The last patch make nf_tables_offload support flow-indr-block.

wenxu (6):
  cls_api: modify the tc_indr_block_ing_cmd parameters.
  cls_api: replace block with flow_block in tc_indr_block_dev
  cls_api: add flow_indr_block_call function
  flow_offload: move tc indirect block to flow offload
  flow_offload: support get flow_block immediately
  netfilter: nf_tables_offload: support indr block call

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    |  11 +-
 include/net/flow_offload.h                         |  48 ++++
 include/net/netfilter/nf_tables_offload.h          |   2 +
 include/net/pkt_cls.h                              |  35 ---
 include/net/sch_generic.h                          |   3 -
 net/core/flow_offload.c                            | 251 ++++++++++++++++++++
 net/netfilter/nf_tables_api.c                      |   7 +
 net/netfilter/nf_tables_offload.c                  | 156 +++++++++++--
 net/sched/cls_api.c                                | 255 ++++-----------------
 10 files changed, 497 insertions(+), 281 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next 4/6] flow_offload: move tc indirect block to flow offload
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev
In-Reply-To: <1564560753-32603-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  10 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    |  11 +-
 include/net/flow_offload.h                         |  31 +++
 include/net/pkt_cls.h                              |  35 ---
 include/net/sch_generic.h                          |   3 -
 net/core/flow_offload.c                            | 218 +++++++++++++++++++
 net/sched/cls_api.c                                | 241 +--------------------
 7 files changed, 265 insertions(+), 284 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..074573b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 {
 	int err;
 
-	err = __tc_indr_block_cb_register(netdev, rpriv,
-					  mlx5e_rep_indr_setup_tc_cb,
-					  rpriv);
+	err = __flow_indr_block_cb_register(netdev, rpriv,
+					    mlx5e_rep_indr_setup_tc_cb,
+					    rpriv);
 	if (err) {
 		struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
 
@@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
 static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
 					    struct net_device *netdev)
 {
-	__tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
-				      rpriv);
+	__flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
+					rpriv);
 }
 
 static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f15..7b490db 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1479,16 +1479,17 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
 		return NOTIFY_OK;
 
 	if (event == NETDEV_REGISTER) {
-		err = __tc_indr_block_cb_register(netdev, app,
-						  nfp_flower_indr_setup_tc_cb,
-						  app);
+		err = __flow_indr_block_cb_register(netdev, app,
+						    nfp_flower_indr_setup_tc_cb,
+						    app);
 		if (err)
 			nfp_flower_cmsg_warn(app,
 					     "Indirect block reg failed - %s\n",
 					     netdev->name);
 	} else if (event == NETDEV_UNREGISTER) {
-		__tc_indr_block_cb_unregister(netdev,
-					      nfp_flower_indr_setup_tc_cb, app);
+		__flow_indr_block_cb_unregister(netdev,
+						nfp_flower_indr_setup_tc_cb,
+						app);
 	}
 
 	return NOTIFY_OK;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00b9aab..c8d60a6 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <net/flow_dissector.h>
+#include <linux/rhashtable.h>
 
 struct flow_match {
 	struct flow_dissector	*dissector;
@@ -366,4 +367,34 @@ static inline void flow_block_init(struct flow_block *flow_block)
 	INIT_LIST_HEAD(&flow_block->cb_list);
 }
 
+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+				      enum tc_setup_type type, void *type_data);
+
+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
+				       struct flow_block *flow_block,
+				       flow_indr_block_bind_cb_t *cb,
+				       void *cb_priv,
+				       enum flow_block_command command);
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				  flow_indr_block_bind_cb_t *cb,
+				  void *cb_ident);
+
+void __flow_indr_block_cb_unregister(struct net_device *dev,
+				     flow_indr_block_bind_cb_t *cb,
+				     void *cb_ident);
+
+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+void flow_indr_block_cb_unregister(struct net_device *dev,
+				   flow_indr_block_bind_cb_t *cb,
+				   void *cb_ident);
+
+void flow_indr_block_call(struct flow_block *flow_block,
+			  struct net_device *dev,
+			  flow_indr_block_ing_cmd_t *ing_cmd_cb,
+			  struct flow_block_offload *bo,
+			  enum flow_block_command command);
+
 #endif /* _NET_FLOW_OFFLOAD_H */
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e429809..0790a4e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -70,15 +70,6 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 	return block->q;
 }
 
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-				tc_indr_block_bind_cb_t *cb, void *cb_ident);
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-			      tc_indr_block_bind_cb_t *cb, void *cb_ident);
-void __tc_indr_block_cb_unregister(struct net_device *dev,
-				   tc_indr_block_bind_cb_t *cb, void *cb_ident);
-void tc_indr_block_cb_unregister(struct net_device *dev,
-				 tc_indr_block_bind_cb_t *cb, void *cb_ident);
-
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
 
@@ -137,32 +128,6 @@ void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
 {
 }
 
-static inline
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-				tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	return 0;
-}
-
-static inline
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-			      tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	return 0;
-}
-
-static inline
-void __tc_indr_block_cb_unregister(struct net_device *dev,
-				   tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-}
-
-static inline
-void tc_indr_block_cb_unregister(struct net_device *dev,
-				 tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-}
-
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			       struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6b6b012..d9f359a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -23,9 +23,6 @@
 struct module;
 struct bpf_flow_keys;
 
-typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
-				    enum tc_setup_type type, void *type_data);
-
 struct qdisc_rate_table {
 	struct tc_ratespec rate;
 	u32		data[256];
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index d63b970..a1fdfa4 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -2,6 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <net/flow_offload.h>
+#include <linux/rtnetlink.h>
 
 struct flow_rule *flow_rule_alloc(unsigned int num_actions)
 {
@@ -280,3 +281,220 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 	}
 }
 EXPORT_SYMBOL(flow_block_cb_setup_simple);
+
+static struct rhashtable indr_setup_block_ht;
+
+struct flow_indr_block_cb {
+	struct list_head list;
+	void *cb_priv;
+	flow_indr_block_bind_cb_t *cb;
+	void *cb_ident;
+};
+
+struct flow_indr_block_dev {
+	struct rhash_head ht_node;
+	struct net_device *dev;
+	unsigned int refcnt;
+	struct list_head cb_list;
+	flow_indr_block_ing_cmd_t *ing_cmd_cb;
+	struct flow_block *flow_block;
+};
+
+static const struct rhashtable_params flow_indr_setup_block_ht_params = {
+	.key_offset	= offsetof(struct flow_indr_block_dev, dev),
+	.head_offset	= offsetof(struct flow_indr_block_dev, ht_node),
+	.key_len	= sizeof(struct net_device *),
+};
+
+static struct flow_indr_block_dev *
+flow_indr_block_dev_lookup(struct net_device *dev)
+{
+	return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
+				      flow_indr_setup_block_ht_params);
+}
+
+static struct flow_indr_block_dev *
+flow_indr_block_dev_get(struct net_device *dev)
+{
+	struct flow_indr_block_dev *indr_dev;
+
+	indr_dev = flow_indr_block_dev_lookup(dev);
+	if (indr_dev)
+		goto inc_ref;
+
+	indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
+	if (!indr_dev)
+		return NULL;
+
+	INIT_LIST_HEAD(&indr_dev->cb_list);
+	indr_dev->dev = dev;
+	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+				   flow_indr_setup_block_ht_params)) {
+		kfree(indr_dev);
+		return NULL;
+	}
+
+inc_ref:
+	indr_dev->refcnt++;
+	return indr_dev;
+}
+
+static void flow_indr_block_dev_put(struct flow_indr_block_dev *indr_dev)
+{
+	if (--indr_dev->refcnt)
+		return;
+
+	rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+			       flow_indr_setup_block_ht_params);
+	kfree(indr_dev);
+}
+
+static struct flow_indr_block_cb *
+flow_indr_block_cb_lookup(struct flow_indr_block_dev *indr_dev,
+			  flow_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		if (indr_block_cb->cb == cb &&
+		    indr_block_cb->cb_ident == cb_ident)
+			return indr_block_cb;
+	return NULL;
+}
+
+static struct flow_indr_block_cb *
+flow_indr_block_cb_add(struct flow_indr_block_dev *indr_dev, void *cb_priv,
+		       flow_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+
+	indr_block_cb = flow_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	if (indr_block_cb)
+		return ERR_PTR(-EEXIST);
+
+	indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
+	if (!indr_block_cb)
+		return ERR_PTR(-ENOMEM);
+
+	indr_block_cb->cb_priv = cb_priv;
+	indr_block_cb->cb = cb;
+	indr_block_cb->cb_ident = cb_ident;
+	list_add(&indr_block_cb->list, &indr_dev->cb_list);
+
+	return indr_block_cb;
+}
+
+static void flow_indr_block_cb_del(struct flow_indr_block_cb *indr_block_cb)
+{
+	list_del(&indr_block_cb->list);
+	kfree(indr_block_cb);
+}
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				  flow_indr_block_bind_cb_t *cb,
+				  void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
+	int err;
+
+	indr_dev = flow_indr_block_dev_get(dev);
+	if (!indr_dev)
+		return -ENOMEM;
+
+	indr_block_cb = flow_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
+	err = PTR_ERR_OR_ZERO(indr_block_cb);
+	if (err)
+		goto err_dev_put;
+
+	if (indr_dev->ing_cmd_cb)
+		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
+				     indr_block_cb->cb, indr_block_cb->cb_priv,
+				     FLOW_BLOCK_BIND);
+
+	return 0;
+
+err_dev_put:
+	flow_indr_block_dev_put(indr_dev);
+	return err;
+}
+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_register);
+
+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+				flow_indr_block_bind_cb_t *cb,
+				void *cb_ident)
+{
+	int err;
+
+	rtnl_lock();
+	err = __flow_indr_block_cb_register(dev, cb_priv, cb, cb_ident);
+	rtnl_unlock();
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_cb_register);
+
+void __flow_indr_block_cb_unregister(struct net_device *dev,
+				     flow_indr_block_bind_cb_t *cb,
+				     void *cb_ident)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
+
+	indr_dev = flow_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return;
+
+	indr_block_cb = flow_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	if (!indr_block_cb)
+		return;
+
+	/* Send unbind message if required to free any block cbs. */
+	if (indr_dev->ing_cmd_cb)
+		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
+				     indr_block_cb->cb, indr_block_cb->cb_priv,
+				     FLOW_BLOCK_UNBIND);
+
+	flow_indr_block_cb_del(indr_block_cb);
+	flow_indr_block_dev_put(indr_dev);
+}
+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_unregister);
+
+void flow_indr_block_cb_unregister(struct net_device *dev,
+				   flow_indr_block_bind_cb_t *cb,
+				   void *cb_ident)
+{
+	rtnl_lock();
+	__flow_indr_block_cb_unregister(dev, cb, cb_ident);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_cb_unregister);
+
+void flow_indr_block_call(struct flow_block *flow_block,
+			  struct net_device *dev,
+			  flow_indr_block_ing_cmd_t *ing_cmd_cb,
+			  struct flow_block_offload *bo,
+			  enum flow_block_command command)
+{
+	struct flow_indr_block_cb *indr_block_cb;
+	struct flow_indr_block_dev *indr_dev;
+
+	indr_dev = flow_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return;
+
+	indr_dev->flow_block = command == FLOW_BLOCK_BIND ? flow_block : NULL;
+	indr_dev->ing_cmd_cb = command == FLOW_BLOCK_BIND ? ing_cmd_cb : NULL;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+				  bo);
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_call);
+
+static int __init init_flow_indr_rhashtable(void)
+{
+	return rhashtable_init(&indr_setup_block_ht,
+			       &flow_indr_setup_block_ht_params);
+}
+subsys_initcall(init_flow_indr_rhashtable);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 617b098..bd5e591 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -37,6 +37,7 @@
 #include <net/tc_act/tc_skbedit.h>
 #include <net/tc_act/tc_ct.h>
 #include <net/tc_act/tc_mpls.h>
+#include <net/flow_offload.h>
 
 extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
 
@@ -545,149 +546,12 @@ static void tcf_chain_flush(struct tcf_chain *chain, bool rtnl_held)
 	}
 }
 
-static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
-{
-	const struct Qdisc_class_ops *cops;
-	struct Qdisc *qdisc;
-
-	if (!dev_ingress_queue(dev))
-		return NULL;
-
-	qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
-	if (!qdisc)
-		return NULL;
-
-	cops = qdisc->ops->cl_ops;
-	if (!cops)
-		return NULL;
-
-	if (!cops->tcf_block)
-		return NULL;
-
-	return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
-}
-
-static struct rhashtable indr_setup_block_ht;
-
-struct tc_indr_block_dev {
-	struct rhash_head ht_node;
-	struct net_device *dev;
-	unsigned int refcnt;
-	struct list_head cb_list;
-	struct flow_block *flow_block;
-};
-
-struct tc_indr_block_cb {
-	struct list_head list;
-	void *cb_priv;
-	tc_indr_block_bind_cb_t *cb;
-	void *cb_ident;
-};
-
-static const struct rhashtable_params tc_indr_setup_block_ht_params = {
-	.key_offset	= offsetof(struct tc_indr_block_dev, dev),
-	.head_offset	= offsetof(struct tc_indr_block_dev, ht_node),
-	.key_len	= sizeof(struct net_device *),
-};
-
-static struct tc_indr_block_dev *
-tc_indr_block_dev_lookup(struct net_device *dev)
-{
-	return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
-				      tc_indr_setup_block_ht_params);
-}
-
-static void tc_indr_get_default_block(struct tc_indr_block_dev *indr_dev)
-{
-	struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev);
-
-	if (block)
-		indr_dev->flow_block = &block->flow_block;
-}
-
-static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
-{
-	struct tc_indr_block_dev *indr_dev;
-
-	indr_dev = tc_indr_block_dev_lookup(dev);
-	if (indr_dev)
-		goto inc_ref;
-
-	indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
-	if (!indr_dev)
-		return NULL;
-
-	INIT_LIST_HEAD(&indr_dev->cb_list);
-	indr_dev->dev = dev;
-	tc_indr_get_default_block(indr_dev);
-	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
-				   tc_indr_setup_block_ht_params)) {
-		kfree(indr_dev);
-		return NULL;
-	}
-
-inc_ref:
-	indr_dev->refcnt++;
-	return indr_dev;
-}
-
-static void tc_indr_block_dev_put(struct tc_indr_block_dev *indr_dev)
-{
-	if (--indr_dev->refcnt)
-		return;
-
-	rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
-			       tc_indr_setup_block_ht_params);
-	kfree(indr_dev);
-}
-
-static struct tc_indr_block_cb *
-tc_indr_block_cb_lookup(struct tc_indr_block_dev *indr_dev,
-			tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-
-	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
-		if (indr_block_cb->cb == cb &&
-		    indr_block_cb->cb_ident == cb_ident)
-			return indr_block_cb;
-	return NULL;
-}
-
-static struct tc_indr_block_cb *
-tc_indr_block_cb_add(struct tc_indr_block_dev *indr_dev, void *cb_priv,
-		     tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-
-	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
-	if (indr_block_cb)
-		return ERR_PTR(-EEXIST);
-
-	indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
-	if (!indr_block_cb)
-		return ERR_PTR(-ENOMEM);
-
-	indr_block_cb->cb_priv = cb_priv;
-	indr_block_cb->cb = cb;
-	indr_block_cb->cb_ident = cb_ident;
-	list_add(&indr_block_cb->list, &indr_dev->cb_list);
-
-	return indr_block_cb;
-}
-
-static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb)
-{
-	list_del(&indr_block_cb->list);
-	kfree(indr_block_cb);
-}
-
 static int tcf_block_setup(struct tcf_block *block,
 			   struct flow_block_offload *bo);
 
 static void tc_indr_block_ing_cmd(struct net_device *dev,
 				  struct flow_block *flow_block,
-				  tc_indr_block_bind_cb_t *cb,
+				  flow_indr_block_bind_cb_t *cb,
 				  void *cb_priv,
 				  enum flow_block_command command)
 {
@@ -712,96 +576,6 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
 	tcf_block_setup(block, &bo);
 }
 
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-				tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
-	int err;
-
-	indr_dev = tc_indr_block_dev_get(dev);
-	if (!indr_dev)
-		return -ENOMEM;
-
-	indr_block_cb = tc_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
-	err = PTR_ERR_OR_ZERO(indr_block_cb);
-	if (err)
-		goto err_dev_put;
-
-	tc_indr_block_ing_cmd(dev, indr_dev->flow_block, cb, cb_priv,
-			      FLOW_BLOCK_BIND);
-	return 0;
-
-err_dev_put:
-	tc_indr_block_dev_put(indr_dev);
-	return err;
-}
-EXPORT_SYMBOL_GPL(__tc_indr_block_cb_register);
-
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
-			      tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	int err;
-
-	rtnl_lock();
-	err = __tc_indr_block_cb_register(dev, cb_priv, cb, cb_ident);
-	rtnl_unlock();
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(tc_indr_block_cb_register);
-
-void __tc_indr_block_cb_unregister(struct net_device *dev,
-				   tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
-
-	indr_dev = tc_indr_block_dev_lookup(dev);
-	if (!indr_dev)
-		return;
-
-	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, indr_block_cb->cb,
-						indr_block_cb->cb_ident);
-	if (!indr_block_cb)
-		return;
-
-	/* Send unbind message if required to free any block cbs. */
-	tc_indr_block_ing_cmd(dev, indr_dev->flow_block, indr_block_cb->cb,
-			      indr_block_cb->cb_priv, FLOW_BLOCK_UNBIND);
-	tc_indr_block_cb_del(indr_block_cb);
-	tc_indr_block_dev_put(indr_dev);
-}
-EXPORT_SYMBOL_GPL(__tc_indr_block_cb_unregister);
-
-void tc_indr_block_cb_unregister(struct net_device *dev,
-				 tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-	rtnl_lock();
-	__tc_indr_block_cb_unregister(dev, cb, cb_ident);
-	rtnl_unlock();
-}
-EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister);
-
-static void flow_indr_block_call(struct flow_block *flow_block,
-				 struct net_device *dev,
-				 struct flow_block_offload *bo,
-				 enum flow_block_command command)
-{
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
-
-	indr_dev = tc_indr_block_dev_lookup(dev);
-	if (!indr_dev)
-		return;
-
-	indr_dev->flow_block = command == FLOW_BLOCK_BIND ? flow_block : NULL;
-
-	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
-		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
-				  bo);
-}
-
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 			       struct tcf_block_ext_info *ei,
 			       enum flow_block_command command,
@@ -817,7 +591,9 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 	};
 	INIT_LIST_HEAD(&bo.cb_list);
 
-	flow_indr_block_call(&block->flow_block, dev, &bo, command);
+	flow_indr_block_call(&block->flow_block, dev, tc_indr_block_ing_cmd,
+			     &bo, command);
+
 	tcf_block_setup(block, &bo);
 }
 
@@ -3382,11 +3158,6 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
-	err = rhashtable_init(&indr_setup_block_ht,
-			      &tc_indr_setup_block_ht_params);
-	if (err)
-		goto err_rhash_setup_block_ht;
-
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
 		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
@@ -3400,8 +3171,6 @@ static int __init tc_filter_init(void)
 
 	return 0;
 
-err_rhash_setup_block_ht:
-	unregister_pernet_subsys(&tcf_net_ops);
 err_register_pernet_subsys:
 	destroy_workqueue(tc_filter_wq);
 	return err;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 6/6] netfilter: nf_tables_offload: support indr block call
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev
In-Reply-To: <1564560753-32603-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

nftable support indr-block call. It makes nftable an offload vlan
and tunnel device.

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 include/net/netfilter/nf_tables_offload.h |   2 +
 net/netfilter/nf_tables_api.c             |   7 ++
 net/netfilter/nf_tables_offload.c         | 156 +++++++++++++++++++++++++-----
 3 files changed, 141 insertions(+), 24 deletions(-)

diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h
index 3196663..ac69087 100644
--- a/include/net/netfilter/nf_tables_offload.h
+++ b/include/net/netfilter/nf_tables_offload.h
@@ -63,6 +63,8 @@ struct nft_flow_rule {
 struct nft_flow_rule *nft_flow_rule_create(const struct nft_rule *rule);
 void nft_flow_rule_destroy(struct nft_flow_rule *flow);
 int nft_flow_rule_offload_commit(struct net *net);
+bool nft_indr_get_default_block(struct net_device *dev,
+				struct flow_indr_block_info *info);
 
 #define NFT_OFFLOAD_MATCH(__key, __base, __field, __len, __reg)		\
 	(__reg)->base_offset	=					\
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 605a7cf..6a1d0b2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7593,6 +7593,11 @@ static void __net_exit nf_tables_exit_net(struct net *net)
 	.exit	= nf_tables_exit_net,
 };
 
+static struct flow_indr_get_block_entry get_block_entry = {
+	.get_block_cb = nft_indr_get_default_block,
+	.list = LIST_HEAD_INIT(get_block_entry.list),
+};
+
 static int __init nf_tables_module_init(void)
 {
 	int err;
@@ -7624,6 +7629,7 @@ static int __init nf_tables_module_init(void)
 		goto err5;
 
 	nft_chain_route_init();
+	flow_indr_add_default_block_cb(&get_block_entry);
 	return err;
 err5:
 	rhltable_destroy(&nft_objname_ht);
@@ -7640,6 +7646,7 @@ static int __init nf_tables_module_init(void)
 
 static void __exit nf_tables_module_exit(void)
 {
+	flow_indr_del_default_block_cb(&get_block_entry);
 	nfnetlink_subsys_unregister(&nf_tables_subsys);
 	unregister_netdevice_notifier(&nf_tables_flowtable_notifier);
 	nft_chain_filter_fini();
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5..59c9629 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -171,24 +171,114 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
 	return 0;
 }
 
+static int nft_block_setup(struct nft_base_chain *basechain,
+			   struct flow_block_offload *bo,
+			   enum flow_block_command cmd)
+{
+	int err;
+
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		err = nft_flow_offload_bind(bo, basechain);
+		break;
+	case FLOW_BLOCK_UNBIND:
+		err = nft_flow_offload_unbind(bo, basechain);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		err = -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+				 struct net_device *dev,
+				 enum flow_block_command cmd)
+{
+	struct netlink_ext_ack extack = {};
+	struct flow_block_offload bo = {};
+	int err;
+
+	bo.net = dev_net(dev);
+	bo.block = &chain->flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+	if (err < 0)
+		return err;
+
+	return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev,
+				   struct flow_block *flow_block,
+				   flow_indr_block_bind_cb_t *cb,
+				   void *cb_priv,
+				   enum flow_block_command cmd)
+{
+	struct netlink_ext_ack extack = {};
+	struct flow_block_offload bo = {};
+	struct nft_base_chain *chain;
+
+	if (flow_block)
+		return;
+
+	chain = container_of(flow_block, struct nft_base_chain, flow_block);
+
+	bo.net = dev_net(dev);
+	bo.block = flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
+
+	nft_block_setup(chain, &bo, cmd);
+}
+
+static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
+				      struct net_device *dev,
+				      enum flow_block_command cmd)
+{
+	struct flow_block_offload bo = {};
+	struct netlink_ext_ack extack = {};
+
+	bo.net = dev_net(dev);
+	bo.block = &chain->flow_block;
+	bo.command = cmd;
+	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+	bo.extack = &extack;
+	INIT_LIST_HEAD(&bo.cb_list);
+
+	flow_indr_block_call(&chain->flow_block, dev, nft_indr_block_ing_cmd,
+			     &bo, cmd);
+
+	if (list_empty(&bo.cb_list))
+		return -EOPNOTSUPP;
+
+	return nft_block_setup(chain, &bo, cmd);
+}
+
 #define FLOW_SETUP_BLOCK TC_SETUP_BLOCK
 
 static int nft_flow_offload_chain(struct nft_trans *trans,
 				  enum flow_block_command cmd)
 {
 	struct nft_chain *chain = trans->ctx.chain;
-	struct netlink_ext_ack extack = {};
-	struct flow_block_offload bo = {};
 	struct nft_base_chain *basechain;
 	struct net_device *dev;
-	int err;
 
 	if (!nft_is_base_chain(chain))
 		return -EOPNOTSUPP;
 
 	basechain = nft_base_chain(chain);
 	dev = basechain->ops.dev;
-	if (!dev || !dev->netdev_ops->ndo_setup_tc)
+	if (!dev)
 		return -EOPNOTSUPP;
 
 	/* Only default policy to accept is supported for now. */
@@ -197,26 +287,10 @@ static int nft_flow_offload_chain(struct nft_trans *trans,
 	    nft_trans_chain_policy(trans) != NF_ACCEPT)
 		return -EOPNOTSUPP;
 
-	bo.command = cmd;
-	bo.block = &basechain->flow_block;
-	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
-	bo.extack = &extack;
-	INIT_LIST_HEAD(&bo.cb_list);
-
-	err = dev->netdev_ops->ndo_setup_tc(dev, FLOW_SETUP_BLOCK, &bo);
-	if (err < 0)
-		return err;
-
-	switch (cmd) {
-	case FLOW_BLOCK_BIND:
-		err = nft_flow_offload_bind(&bo, basechain);
-		break;
-	case FLOW_BLOCK_UNBIND:
-		err = nft_flow_offload_unbind(&bo, basechain);
-		break;
-	}
-
-	return err;
+	if (dev->netdev_ops->ndo_setup_tc)
+		return nft_block_offload_cmd(basechain, dev, cmd);
+	else
+		return nft_indr_block_offload_cmd(basechain, dev, cmd);
 }
 
 int nft_flow_rule_offload_commit(struct net *net)
@@ -266,3 +340,37 @@ int nft_flow_rule_offload_commit(struct net *net)
 
 	return err;
 }
+
+bool nft_indr_get_default_block(struct net_device *dev,
+				struct flow_indr_block_info *info)
+{
+	struct net *net = dev_net(dev);
+	const struct nft_table *table;
+	const struct nft_chain *chain;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(table, &net->nft.tables, list) {
+		if (table->family != NFPROTO_NETDEV)
+			continue;
+
+		list_for_each_entry_rcu(chain, &table->chains, list) {
+			if (nft_is_base_chain(chain)) {
+				struct nft_base_chain *basechain;
+
+				basechain = nft_base_chain(chain);
+				if (!strncmp(basechain->dev_name, dev->name,
+					     IFNAMSIZ)) {
+					info->flow_block = &basechain->flow_block;
+					info->ing_cmd_cb = nft_indr_block_ing_cmd;
+					rcu_read_unlock();
+					return true;
+				}
+			}
+		}
+	}
+
+	rcu_read_unlock();
+
+	return false;
+}
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 2/6] cls_api: replace block with flow_block in tc_indr_block_dev
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev
In-Reply-To: <1564560753-32603-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

This patch make tc_indr_block_dev can separate from tc subsystem

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/sched/cls_api.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2e3b58d..f9643fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -574,7 +574,7 @@ struct tc_indr_block_dev {
 	struct net_device *dev;
 	unsigned int refcnt;
 	struct list_head cb_list;
-	struct tcf_block *block;
+	struct flow_block *flow_block;
 };
 
 struct tc_indr_block_cb {
@@ -597,6 +597,14 @@ struct tc_indr_block_cb {
 				      tc_indr_setup_block_ht_params);
 }
 
+static void tc_indr_get_default_block(struct tc_indr_block_dev *indr_dev)
+{
+	struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev);
+
+	if (block)
+		indr_dev->flow_block = &block->flow_block;
+}
+
 static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
 {
 	struct tc_indr_block_dev *indr_dev;
@@ -611,7 +619,7 @@ static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
 
 	INIT_LIST_HEAD(&indr_dev->cb_list);
 	indr_dev->dev = dev;
-	indr_dev->block = tc_dev_ingress_block(dev);
+	tc_indr_get_default_block(indr_dev);
 	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
 				   tc_indr_setup_block_ht_params)) {
 		kfree(indr_dev);
@@ -678,11 +686,14 @@ static int tcf_block_setup(struct tcf_block *block,
 			   struct flow_block_offload *bo);
 
 static void tc_indr_block_ing_cmd(struct net_device *dev,
-				  struct tcf_block *block,
+				  struct flow_block *flow_block,
 				  tc_indr_block_bind_cb_t *cb,
 				  void *cb_priv,
 				  enum flow_block_command command)
 {
+	struct tcf_block *block = flow_block ? container_of(flow_block,
+							    struct tcf_block,
+							    flow_block) : NULL;
 	struct flow_block_offload bo = {
 		.command	= command,
 		.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
@@ -694,7 +705,7 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
 	if (!block)
 		return;
 
-	bo.block = &block->flow_block;
+	bo.block = flow_block;
 
 	cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
 
@@ -717,7 +728,7 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
 	if (err)
 		goto err_dev_put;
 
-	tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv,
+	tc_indr_block_ing_cmd(dev, indr_dev->flow_block, cb, cb_priv,
 			      FLOW_BLOCK_BIND);
 	return 0;
 
@@ -750,13 +761,14 @@ void __tc_indr_block_cb_unregister(struct net_device *dev,
 	if (!indr_dev)
 		return;
 
-	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+	indr_block_cb = tc_indr_block_cb_lookup(indr_dev, indr_block_cb->cb,
+						indr_block_cb->cb_ident);
 	if (!indr_block_cb)
 		return;
 
 	/* Send unbind message if required to free any block cbs. */
-	tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv,
-			      FLOW_BLOCK_UNBIND);
+	tc_indr_block_ing_cmd(dev, indr_dev->flow_block, indr_block_cb->cb,
+			      indr_block_cb->cb_priv, FLOW_BLOCK_UNBIND);
 	tc_indr_block_cb_del(indr_block_cb);
 	tc_indr_block_dev_put(indr_dev);
 }
@@ -792,7 +804,8 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 	if (!indr_dev)
 		return;
 
-	indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL;
+	indr_dev->flow_block = command == FLOW_BLOCK_BIND ?
+					  &block->flow_block : NULL;
 
 	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
 		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 1/6] cls_api: modify the tc_indr_block_ing_cmd parameters.
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev
In-Reply-To: <1564560753-32603-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/sched/cls_api.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3565d9a..2e3b58d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -677,26 +677,28 @@ static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb)
 static int tcf_block_setup(struct tcf_block *block,
 			   struct flow_block_offload *bo);
 
-static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
-				  struct tc_indr_block_cb *indr_block_cb,
+static void tc_indr_block_ing_cmd(struct net_device *dev,
+				  struct tcf_block *block,
+				  tc_indr_block_bind_cb_t *cb,
+				  void *cb_priv,
 				  enum flow_block_command command)
 {
 	struct flow_block_offload bo = {
 		.command	= command,
 		.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
-		.net		= dev_net(indr_dev->dev),
-		.block_shared	= tcf_block_non_null_shared(indr_dev->block),
+		.net		= dev_net(dev),
+		.block_shared	= tcf_block_non_null_shared(block),
 	};
 	INIT_LIST_HEAD(&bo.cb_list);
 
-	if (!indr_dev->block)
+	if (!block)
 		return;
 
-	bo.block = &indr_dev->block->flow_block;
+	bo.block = &block->flow_block;
 
-	indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
-			  &bo);
-	tcf_block_setup(indr_dev->block, &bo);
+	cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
+
+	tcf_block_setup(block, &bo);
 }
 
 int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
@@ -715,7 +717,8 @@ int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
 	if (err)
 		goto err_dev_put;
 
-	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND);
+	tc_indr_block_ing_cmd(dev, indr_dev->block, cb, cb_priv,
+			      FLOW_BLOCK_BIND);
 	return 0;
 
 err_dev_put:
@@ -752,7 +755,8 @@ void __tc_indr_block_cb_unregister(struct net_device *dev,
 		return;
 
 	/* Send unbind message if required to free any block cbs. */
-	tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND);
+	tc_indr_block_ing_cmd(dev, indr_dev->block, cb, indr_block_cb->cb_priv,
+			      FLOW_BLOCK_UNBIND);
 	tc_indr_block_cb_del(indr_block_cb);
 	tc_indr_block_dev_put(indr_dev);
 }
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 3/6] cls_api: add flow_indr_block_call function
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev
In-Reply-To: <1564560753-32603-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/sched/cls_api.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f9643fa..617b098 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -783,13 +783,30 @@ void tc_indr_block_cb_unregister(struct net_device *dev,
 }
 EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister);
 
+static void flow_indr_block_call(struct flow_block *flow_block,
+				 struct net_device *dev,
+				 struct flow_block_offload *bo,
+				 enum flow_block_command command)
+{
+	struct tc_indr_block_cb *indr_block_cb;
+	struct tc_indr_block_dev *indr_dev;
+
+	indr_dev = tc_indr_block_dev_lookup(dev);
+	if (!indr_dev)
+		return;
+
+	indr_dev->flow_block = command == FLOW_BLOCK_BIND ? flow_block : NULL;
+
+	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+				  bo);
+}
+
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 			       struct tcf_block_ext_info *ei,
 			       enum flow_block_command command,
 			       struct netlink_ext_ack *extack)
 {
-	struct tc_indr_block_cb *indr_block_cb;
-	struct tc_indr_block_dev *indr_dev;
 	struct flow_block_offload bo = {
 		.command	= command,
 		.binder_type	= ei->binder_type,
@@ -800,17 +817,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 	};
 	INIT_LIST_HEAD(&bo.cb_list);
 
-	indr_dev = tc_indr_block_dev_lookup(dev);
-	if (!indr_dev)
-		return;
-
-	indr_dev->flow_block = command == FLOW_BLOCK_BIND ?
-					  &block->flow_block : NULL;
-
-	list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
-		indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
-				  &bo);
-
+	flow_indr_block_call(&block->flow_block, dev, &bo, command);
 	tcf_block_setup(block, &bo);
 }
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 5/6] flow_offload: support get flow_block immediately
From: wenxu @ 2019-07-31  8:12 UTC (permalink / raw)
  To: pablo, fw, jakub.kicinski, jiri; +Cc: netfilter-devel, netdev
In-Reply-To: <1564560753-32603-1-git-send-email-wenxu@ucloud.cn>

From: wenxu <wenxu@ucloud.cn>

The new flow-indr-block can't get the tcf_block
directly. It provide a callback list to find the flow_block immediately
when the device register and contain a ingress block.

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 include/net/flow_offload.h | 17 +++++++++++++++++
 net/core/flow_offload.c    | 33 +++++++++++++++++++++++++++++++++
 net/sched/cls_api.c        | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index c8d60a6..db04e3f 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -376,6 +376,23 @@ typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
 				       void *cb_priv,
 				       enum flow_block_command command);
 
+struct flow_indr_block_info {
+	struct flow_block *flow_block;
+	flow_indr_block_ing_cmd_t *ing_cmd_cb;
+};
+
+typedef bool flow_indr_get_default_block_t(struct net_device *dev,
+					    struct flow_indr_block_info *info);
+
+struct flow_indr_get_block_entry {
+	flow_indr_get_default_block_t *get_block_cb;
+	struct list_head	list;
+};
+
+void flow_indr_add_default_block_cb(struct flow_indr_get_block_entry *entry);
+
+void flow_indr_del_default_block_cb(struct flow_indr_get_block_entry *entry);
+
 int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
 				  flow_indr_block_bind_cb_t *cb,
 				  void *cb_ident);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index a1fdfa4..8ff7a75b 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -282,6 +282,8 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 }
 EXPORT_SYMBOL(flow_block_cb_setup_simple);
 
+static LIST_HEAD(get_default_block_cb_list);
+
 static struct rhashtable indr_setup_block_ht;
 
 struct flow_indr_block_cb {
@@ -313,6 +315,24 @@ struct flow_indr_block_dev {
 				      flow_indr_setup_block_ht_params);
 }
 
+static void flow_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+	struct flow_indr_get_block_entry *entry_cb;
+	struct flow_indr_block_info info;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(entry_cb, &get_default_block_cb_list, list) {
+		if (entry_cb->get_block_cb(indr_dev->dev, &info)) {
+			indr_dev->flow_block = info.flow_block;
+			indr_dev->ing_cmd_cb = info.ing_cmd_cb;
+			break;
+		}
+	}
+
+	rcu_read_unlock();
+}
+
 static struct flow_indr_block_dev *
 flow_indr_block_dev_get(struct net_device *dev)
 {
@@ -328,6 +348,7 @@ struct flow_indr_block_dev {
 
 	INIT_LIST_HEAD(&indr_dev->cb_list);
 	indr_dev->dev = dev;
+	flow_get_default_block(indr_dev);
 	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
 				   flow_indr_setup_block_ht_params)) {
 		kfree(indr_dev);
@@ -492,6 +513,18 @@ void flow_indr_block_call(struct flow_block *flow_block,
 }
 EXPORT_SYMBOL_GPL(flow_indr_block_call);
 
+void flow_indr_add_default_block_cb(struct flow_indr_get_block_entry *entry)
+{
+	list_add_tail_rcu(&entry->list, &get_default_block_cb_list);
+}
+EXPORT_SYMBOL_GPL(flow_indr_add_default_block_cb);
+
+void flow_indr_del_default_block_cb(struct flow_indr_get_block_entry *entry)
+{
+	list_del_rcu(&entry->list);
+}
+EXPORT_SYMBOL_GPL(flow_indr_del_default_block_cb);
+
 static int __init init_flow_indr_rhashtable(void)
 {
 	return rhashtable_init(&indr_setup_block_ht,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bd5e591..8bf918c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -576,6 +576,43 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
 	tcf_block_setup(block, &bo);
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+	const struct Qdisc_class_ops *cops;
+	struct Qdisc *qdisc;
+
+	if (!dev_ingress_queue(dev))
+		return NULL;
+
+	qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+	if (!qdisc)
+		return NULL;
+
+	cops = qdisc->ops->cl_ops;
+	if (!cops)
+		return NULL;
+
+	if (!cops->tcf_block)
+		return NULL;
+
+	return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static bool tc_indr_get_default_block(struct net_device *dev,
+				      struct flow_indr_block_info *info)
+{
+	struct tcf_block *block = tc_dev_ingress_block(dev);
+
+	if (block) {
+		info->flow_block = &block->flow_block;
+		info->ing_cmd_cb = tc_indr_block_ing_cmd;
+
+		return true;
+	}
+
+	return false;
+}
+
 static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 			       struct tcf_block_ext_info *ei,
 			       enum flow_block_command command,
@@ -3146,6 +3183,11 @@ static void __net_exit tcf_net_exit(struct net *net)
 	.size = sizeof(struct tcf_net),
 };
 
+static struct flow_indr_get_block_entry get_block_entry = {
+	.get_block_cb = tc_indr_get_default_block,
+	.list = LIST_HEAD_INIT(get_block_entry.list),
+};
+
 static int __init tc_filter_init(void)
 {
 	int err;
@@ -3158,6 +3200,8 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
+	flow_indr_add_default_block_cb(&get_block_entry);
+
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
 		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH] net/ethernet/qlogic/qed: force the string buffer NULL-terminated
From: Wang Xiayang @ 2019-07-31  8:15 UTC (permalink / raw)
  Cc: aelior, GR-everest-linux-l2, netdev, Wang Xiayang

strncpy() does not ensure NULL-termination when the input string
size equals to the destination buffer size 30.
The output string is passed to qed_int_deassertion_aeu_bit()
which calls DP_INFO() and relies NULL-termination.

Use strlcpy instead. The other conditional branch above strncpy()
needs no fix as snprintf() ensures NULL-termination.

This issue is identified by a Coccinelle script.

Signed-off-by: Wang Xiayang <xywang.sjtu@sjtu.edu.cn>
---
 drivers/net/ethernet/qlogic/qed/qed_int.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_int.c b/drivers/net/ethernet/qlogic/qed/qed_int.c
index 4e8118a08654..9f5113639eaf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_int.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_int.c
@@ -1093,7 +1093,7 @@ static int qed_int_deassertion(struct qed_hwfn  *p_hwfn,
 						snprintf(bit_name, 30,
 							 p_aeu->bit_name, num);
 					else
-						strncpy(bit_name,
+						strlcpy(bit_name,
 							p_aeu->bit_name, 30);
 
 					/* We now need to pass bitmask in its
-- 
2.11.0


^ permalink raw reply related

* [PATCH net-next v2 0/6] net: dsa: mv88e6xxx: add support for MV88E6220
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Rasmus Villemoes

This patch series adds support for the MV88E6220 chip to the mv88e6xxx driver.
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins.

Furthermore, PTP support is added to the MV88E6250 family.

v2:
 - insert all 6220 entries in correct numerical order
 - introduce invalid_port_mask
 - move ptp_cc_mult* to ptp_ops and restored original ptp_adjfine code
 - added Andrews Reviewed-By to patch 2 and 4

Hubert Feurstein (6):
  net: dsa: mv88e6xxx: add support for MV88E6220
  dt-bindings: net: dsa: marvell: add 6220 model to the 6250 family
  net: dsa: mv88e6xxx: introduce invalid_port_mask in mv88e6xxx_info
  net: dsa: mv88e6xxx: setup message port is not supported in the 6250
    familiy
  net: dsa: mv88e6xxx: order ptp structs numerically ascending
  net: dsa: mv88e6xxx: add PTP support for MV88E6250 family

 drivers/net/dsa/mv88e6xxx/chip.c |  49 ++++++++++++--
 drivers/net/dsa/mv88e6xxx/chip.h |  15 +++++
 drivers/net/dsa/mv88e6xxx/ptp.c  | 106 ++++++++++++++++++++++++-------
 drivers/net/dsa/mv88e6xxx/ptp.h  |   6 +-
 4 files changed, 147 insertions(+), 29 deletions(-)

-- 
2.22.0


^ permalink raw reply

* [PATCH net-next v2 5/6] net: dsa: mv88e6xxx: order ptp structs numerically ascending
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Rasmus Villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

As it is done for all the other structs within this driver.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/ptp.c | 32 ++++++++++++++++----------------
 drivers/net/dsa/mv88e6xxx/ptp.h |  4 ++--
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 768d256f7c9f..a1ff182c8737 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -310,6 +310,22 @@ static int mv88e6352_ptp_verify(struct ptp_clock_info *ptp, unsigned int pin,
 	return 0;
 }
 
+const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops = {
+	.clock_read = mv88e6165_ptp_clock_read,
+	.global_enable = mv88e6165_global_enable,
+	.global_disable = mv88e6165_global_disable,
+	.arr0_sts_reg = MV88E6165_PORT_PTP_ARR0_STS,
+	.arr1_sts_reg = MV88E6165_PORT_PTP_ARR1_STS,
+	.dep_sts_reg = MV88E6165_PORT_PTP_DEP_STS,
+	.rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L2_EVENT) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L2_SYNC) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_EVENT) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_SYNC) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_DELAY_REQ),
+};
+
 const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {
 	.clock_read = mv88e6352_ptp_clock_read,
 	.ptp_enable = mv88e6352_ptp_enable,
@@ -333,22 +349,6 @@ const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {
 		(1 << HWTSTAMP_FILTER_PTP_V2_DELAY_REQ),
 };
 
-const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops = {
-	.clock_read = mv88e6165_ptp_clock_read,
-	.global_enable = mv88e6165_global_enable,
-	.global_disable = mv88e6165_global_disable,
-	.arr0_sts_reg = MV88E6165_PORT_PTP_ARR0_STS,
-	.arr1_sts_reg = MV88E6165_PORT_PTP_ARR1_STS,
-	.dep_sts_reg = MV88E6165_PORT_PTP_DEP_STS,
-	.rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
-		(1 << HWTSTAMP_FILTER_PTP_V2_L2_EVENT) |
-		(1 << HWTSTAMP_FILTER_PTP_V2_L2_SYNC) |
-		(1 << HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ) |
-		(1 << HWTSTAMP_FILTER_PTP_V2_EVENT) |
-		(1 << HWTSTAMP_FILTER_PTP_V2_SYNC) |
-		(1 << HWTSTAMP_FILTER_PTP_V2_DELAY_REQ),
-};
-
 static u64 mv88e6xxx_ptp_clock_read(const struct cyclecounter *cc)
 {
 	struct mv88e6xxx_chip *chip = cc_to_chip(cc);
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.h b/drivers/net/dsa/mv88e6xxx/ptp.h
index 0a1f8de8f062..58cbd21d58f6 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.h
+++ b/drivers/net/dsa/mv88e6xxx/ptp.h
@@ -148,8 +148,8 @@ void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip);
 #define ptp_to_chip(ptp) container_of(ptp, struct mv88e6xxx_chip,	\
 				      ptp_clock_info)
 
-extern const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops;
 extern const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops;
+extern const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops;
 
 #else /* !CONFIG_NET_DSA_MV88E6XXX_PTP */
 
@@ -167,8 +167,8 @@ static inline void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip)
 {
 }
 
-static const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {};
 static const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops = {};
+static const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {};
 
 #endif /* CONFIG_NET_DSA_MV88E6XXX_PTP */
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next v2 6/6] net: dsa: mv88e6xxx: add PTP support for MV88E6250 family
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Richard Cochran, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, David S. Miller, Rasmus Villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

This adds PTP support for the MV88E6250 family.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c |  4 ++
 drivers/net/dsa/mv88e6xxx/chip.h |  4 ++
 drivers/net/dsa/mv88e6xxx/ptp.c  | 79 +++++++++++++++++++++++++++-----
 drivers/net/dsa/mv88e6xxx/ptp.h  |  2 +
 4 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 75cfa1f2060b..4a79e389eb8b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3522,6 +3522,8 @@ static const struct mv88e6xxx_ops mv88e6250_ops = {
 	.reset = mv88e6250_g1_reset,
 	.vtu_getnext = mv88e6250_g1_vtu_getnext,
 	.vtu_loadpurge = mv88e6250_g1_vtu_loadpurge,
+	.avb_ops = &mv88e6352_avb_ops,
+	.ptp_ops = &mv88e6250_ptp_ops,
 	.phylink_validate = mv88e6065_phylink_validate,
 };
 
@@ -4318,6 +4320,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.atu_move_port_mask = 0xf,
 		.dual_chip = true,
 		.tag_protocol = DSA_TAG_PROTO_DSA,
+		.ptp_support = true,
 		.ops = &mv88e6250_ops,
 	},
 
@@ -4363,6 +4366,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.atu_move_port_mask = 0xf,
 		.dual_chip = true,
 		.tag_protocol = DSA_TAG_PROTO_DSA,
+		.ptp_support = true,
 		.ops = &mv88e6250_ops,
 	},
 
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index e7b88e9643b9..8c6d3c906197 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -539,6 +539,10 @@ struct mv88e6xxx_ptp_ops {
 	int arr1_sts_reg;
 	int dep_sts_reg;
 	u32 rx_filters;
+	u32 cc_shift;
+	u32 cc_mult;
+	u32 cc_mult_num;
+	u32 cc_mult_dem;
 };
 
 #define STATS_TYPE_PORT		BIT(0)
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index a1ff182c8737..073cbd0bb91b 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -15,11 +15,31 @@
 #include "hwtstamp.h"
 #include "ptp.h"
 
-/* Raw timestamps are in units of 8-ns clock periods. */
-#define CC_SHIFT	28
-#define CC_MULT		(8 << CC_SHIFT)
-#define CC_MULT_NUM	(1 << 9)
-#define CC_MULT_DEM	15625ULL
+#define MV88E6XXX_MAX_ADJ_PPB	1000000
+
+/* Family MV88E6250:
+ * Raw timestamps are in units of 10-ns clock periods.
+ *
+ * clkadj = scaled_ppm * 10*2^28 / (10^6 * 2^16)
+ * simplifies to
+ * clkadj = scaled_ppm * 2^7 / 5^5
+ */
+#define MV88E6250_CC_SHIFT	28
+#define MV88E6250_CC_MULT	(10 << MV88E6250_CC_SHIFT)
+#define MV88E6250_CC_MULT_NUM	(1 << 7)
+#define MV88E6250_CC_MULT_DEM	3125ULL
+
+/* Other families:
+ * Raw timestamps are in units of 8-ns clock periods.
+ *
+ * clkadj = scaled_ppm * 8*2^28 / (10^6 * 2^16)
+ * simplifies to
+ * clkadj = scaled_ppm * 2^9 / 5^6
+ */
+#define MV88E6XXX_CC_SHIFT	28
+#define MV88E6XXX_CC_MULT	(8 << MV88E6XXX_CC_SHIFT)
+#define MV88E6XXX_CC_MULT_NUM	(1 << 9)
+#define MV88E6XXX_CC_MULT_DEM	15625ULL
 
 #define TAI_EVENT_WORK_INTERVAL msecs_to_jiffies(100)
 
@@ -179,6 +199,7 @@ static void mv88e6352_tai_event_work(struct work_struct *ugly)
 static int mv88e6xxx_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 {
 	struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
+	const struct mv88e6xxx_ptp_ops *ptp_ops = chip->info->ops->ptp_ops;
 	int neg_adj = 0;
 	u32 diff, mult;
 	u64 adj;
@@ -187,10 +208,11 @@ static int mv88e6xxx_ptp_adjfine(struct ptp_clock_info *ptp, long scaled_ppm)
 		neg_adj = 1;
 		scaled_ppm = -scaled_ppm;
 	}
-	mult = CC_MULT;
-	adj = CC_MULT_NUM;
+
+	mult = ptp_ops->cc_mult;
+	adj = ptp_ops->cc_mult_num;
 	adj *= scaled_ppm;
-	diff = div_u64(adj, CC_MULT_DEM);
+	diff = div_u64(adj, ptp_ops->cc_mult_dem);
 
 	mv88e6xxx_reg_lock(chip);
 
@@ -324,6 +346,37 @@ const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops = {
 		(1 << HWTSTAMP_FILTER_PTP_V2_EVENT) |
 		(1 << HWTSTAMP_FILTER_PTP_V2_SYNC) |
 		(1 << HWTSTAMP_FILTER_PTP_V2_DELAY_REQ),
+	.cc_shift = MV88E6XXX_CC_SHIFT,
+	.cc_mult = MV88E6XXX_CC_MULT,
+	.cc_mult_num = MV88E6XXX_CC_MULT_NUM,
+	.cc_mult_dem = MV88E6XXX_CC_MULT_DEM,
+};
+
+const struct mv88e6xxx_ptp_ops mv88e6250_ptp_ops = {
+	.clock_read = mv88e6352_ptp_clock_read,
+	.ptp_enable = mv88e6352_ptp_enable,
+	.ptp_verify = mv88e6352_ptp_verify,
+	.event_work = mv88e6352_tai_event_work,
+	.port_enable = mv88e6352_hwtstamp_port_enable,
+	.port_disable = mv88e6352_hwtstamp_port_disable,
+	.n_ext_ts = 1,
+	.arr0_sts_reg = MV88E6XXX_PORT_PTP_ARR0_STS,
+	.arr1_sts_reg = MV88E6XXX_PORT_PTP_ARR1_STS,
+	.dep_sts_reg = MV88E6XXX_PORT_PTP_DEP_STS,
+	.rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L4_EVENT) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L4_SYNC) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L2_EVENT) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L2_SYNC) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_EVENT) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_SYNC) |
+		(1 << HWTSTAMP_FILTER_PTP_V2_DELAY_REQ),
+	.cc_shift = MV88E6250_CC_SHIFT,
+	.cc_mult = MV88E6250_CC_MULT,
+	.cc_mult_num = MV88E6250_CC_MULT_NUM,
+	.cc_mult_dem = MV88E6250_CC_MULT_DEM,
 };
 
 const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {
@@ -347,6 +400,10 @@ const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {
 		(1 << HWTSTAMP_FILTER_PTP_V2_EVENT) |
 		(1 << HWTSTAMP_FILTER_PTP_V2_SYNC) |
 		(1 << HWTSTAMP_FILTER_PTP_V2_DELAY_REQ),
+	.cc_shift = MV88E6XXX_CC_SHIFT,
+	.cc_mult = MV88E6XXX_CC_MULT,
+	.cc_mult_num = MV88E6XXX_CC_MULT_NUM,
+	.cc_mult_dem = MV88E6XXX_CC_MULT_DEM,
 };
 
 static u64 mv88e6xxx_ptp_clock_read(const struct cyclecounter *cc)
@@ -384,8 +441,8 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	memset(&chip->tstamp_cc, 0, sizeof(chip->tstamp_cc));
 	chip->tstamp_cc.read	= mv88e6xxx_ptp_clock_read;
 	chip->tstamp_cc.mask	= CYCLECOUNTER_MASK(32);
-	chip->tstamp_cc.mult	= CC_MULT;
-	chip->tstamp_cc.shift	= CC_SHIFT;
+	chip->tstamp_cc.mult	= ptp_ops->cc_mult;
+	chip->tstamp_cc.shift	= ptp_ops->cc_shift;
 
 	timecounter_init(&chip->tstamp_tc, &chip->tstamp_cc,
 			 ktime_to_ns(ktime_get_real()));
@@ -397,7 +454,6 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	chip->ptp_clock_info.owner = THIS_MODULE;
 	snprintf(chip->ptp_clock_info.name, sizeof(chip->ptp_clock_info.name),
 		 "%s", dev_name(chip->dev));
-	chip->ptp_clock_info.max_adj	= 1000000;
 
 	chip->ptp_clock_info.n_ext_ts	= ptp_ops->n_ext_ts;
 	chip->ptp_clock_info.n_per_out	= 0;
@@ -413,6 +469,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	}
 	chip->ptp_clock_info.pin_config = chip->pin_config;
 
+	chip->ptp_clock_info.max_adj    = MV88E6XXX_MAX_ADJ_PPB;
 	chip->ptp_clock_info.adjfine	= mv88e6xxx_ptp_adjfine;
 	chip->ptp_clock_info.adjtime	= mv88e6xxx_ptp_adjtime;
 	chip->ptp_clock_info.gettime64	= mv88e6xxx_ptp_gettime;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.h b/drivers/net/dsa/mv88e6xxx/ptp.h
index 58cbd21d58f6..269d5d16a466 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.h
+++ b/drivers/net/dsa/mv88e6xxx/ptp.h
@@ -149,6 +149,7 @@ void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip);
 				      ptp_clock_info)
 
 extern const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops;
+extern const struct mv88e6xxx_ptp_ops mv88e6250_ptp_ops;
 extern const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops;
 
 #else /* !CONFIG_NET_DSA_MV88E6XXX_PTP */
@@ -168,6 +169,7 @@ static inline void mv88e6xxx_ptp_free(struct mv88e6xxx_chip *chip)
 }
 
 static const struct mv88e6xxx_ptp_ops mv88e6165_ptp_ops = {};
+static const struct mv88e6xxx_ptp_ops mv88e6250_ptp_ops = {};
 static const struct mv88e6xxx_ptp_ops mv88e6352_ptp_ops = {};
 
 #endif /* CONFIG_NET_DSA_MV88E6XXX_PTP */
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next v2 4/6] net: dsa: mv88e6xxx: setup message port is not supported in the 6250 familiy
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Rasmus Villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

The MV88E6250 family doesn't support the MV88E6XXX_PORT_CTL1_MESSAGE_PORT
bit.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 34 +++++++++++++++++++++++++++++---
 drivers/net/dsa/mv88e6xxx/chip.h |  1 +
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9fdcc21f0858..75cfa1f2060b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2252,9 +2252,11 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
 			return err;
 	}
 
-	err = mv88e6xxx_setup_message_port(chip, port);
-	if (err)
-		return err;
+	if (chip->info->ops->port_setup_message_port) {
+		err = chip->info->ops->port_setup_message_port(chip, port);
+		if (err)
+			return err;
+	}
 
 	/* Port based VLAN map: give each port the same default address
 	 * database, and allow bidirectional communication between the
@@ -2805,6 +2807,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -2839,6 +2842,7 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.port_set_upstream_port = mv88e6095_port_set_upstream_port,
 	.port_link_state = mv88e6185_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -2875,6 +2879,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -2909,6 +2914,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -2946,6 +2952,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
 	.port_set_pause = mv88e6185_port_set_pause,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -2990,6 +2997,7 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3030,6 +3038,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3063,6 +3072,7 @@ static const struct mv88e6xxx_ops mv88e6165_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3104,6 +3114,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3145,6 +3156,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3187,6 +3199,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3228,6 +3241,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3266,6 +3280,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
 	.port_set_pause = mv88e6185_port_set_pause,
 	.port_link_state = mv88e6185_port_link_state,
 	.port_get_cmode = mv88e6185_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3308,6 +3323,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = {
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
 	.port_set_cmode = mv88e6390_port_set_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3353,6 +3369,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = {
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
 	.port_set_cmode = mv88e6390x_port_set_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3398,6 +3415,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
 	.port_set_cmode = mv88e6390_port_set_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3445,6 +3463,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3530,6 +3549,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
 	.port_set_cmode = mv88e6390_port_set_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3577,6 +3597,7 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3620,6 +3641,7 @@ static const struct mv88e6xxx_ops mv88e6321_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3663,6 +3685,7 @@ static const struct mv88e6xxx_ops mv88e6341_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3706,6 +3729,7 @@ static const struct mv88e6xxx_ops mv88e6350_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3745,6 +3769,7 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3788,6 +3813,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6320_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
@@ -3840,6 +3866,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
 	.port_set_cmode = mv88e6390_port_set_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
@@ -3889,6 +3916,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
 	.port_set_cmode = mv88e6390x_port_set_cmode,
+	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6390_g1_stats_set_histogram,
 	.stats_get_sset_count = mv88e6320_stats_get_sset_count,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 359d258d7151..e7b88e9643b9 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -395,6 +395,7 @@ struct mv88e6xxx_ops {
 				u8 out);
 	int (*port_disable_learn_limit)(struct mv88e6xxx_chip *chip, int port);
 	int (*port_disable_pri_override)(struct mv88e6xxx_chip *chip, int port);
+	int (*port_setup_message_port)(struct mv88e6xxx_chip *chip, int port);
 
 	/* CMODE control what PHY mode the MAC will use, eg. SGMII, RGMII, etc.
 	 * Some chips allow this to be configured on specific ports.
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next v2 3/6] net: dsa: mv88e6xxx: introduce invalid_port_mask in mv88e6xxx_info
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Rasmus Villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

With this it is possible to mark certain chip ports as invalid. This is
required for example for the MV88E6220 (which is in general a MV88E6250
with 7 ports) but the ports 2-4 are not routed to pins.

If a user configures an invalid port, an error is returned.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c |  9 +++++++++
 drivers/net/dsa/mv88e6xxx/chip.h | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c8c176da0f1c..9fdcc21f0858 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2469,6 +2469,14 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
 
 	/* Setup Switch Port Registers */
 	for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
+		/* Prevent the use of an invalid port. */
+		if (mv88e6xxx_is_invalid_port(chip, i) &&
+		    !dsa_is_unused_port(ds, i)) {
+			dev_err(chip->dev, "port %d is invalid\n", i);
+			err = -EINVAL;
+			goto unlock;
+		}
+
 		if (dsa_is_unused_port(ds, i)) {
 			err = mv88e6xxx_port_set_state(chip, i,
 						       BR_STATE_DISABLED);
@@ -4270,6 +4278,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		 */
 		.num_ports = 7,
 		.num_internal_phys = 2,
+		.invalid_port_mask = BIT(2) | BIT(3) | BIT(4),
 		.max_vid = 4095,
 		.port_base_addr = 0x08,
 		.phy_base_addr = 0x00,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 2cc508a1cc32..359d258d7151 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -106,6 +106,11 @@ struct mv88e6xxx_info {
 	unsigned int g2_irqs;
 	bool pvt;
 
+	/* Mark certain ports as invalid. This is required for example for the
+	 * MV88E6220 (which is in general a MV88E6250 with 7 ports) but the
+	 * ports 2-4 are not routet to pins.
+	 */
+	unsigned int invalid_port_mask;
 	/* Multi-chip Addressing Mode.
 	 * Some chips respond to only 2 registers of its own SMI device address
 	 * when it is non-zero, and use indirect access to internal registers.
@@ -571,6 +576,11 @@ static inline unsigned int mv88e6xxx_num_gpio(struct mv88e6xxx_chip *chip)
 	return chip->info->num_gpio;
 }
 
+static inline bool mv88e6xxx_is_invalid_port(struct mv88e6xxx_chip *chip, int port)
+{
+	return (chip->info->invalid_port_mask & BIT(port)) != 0;
+}
+
 int mv88e6xxx_read(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val);
 int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val);
 int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg,
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next v2 1/6] net: dsa: mv88e6xxx: add support for MV88E6220
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Rasmus Villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins. So the usable ports are 0, 1, 5 and 6.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 25 +++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/chip.h |  3 ++-
 drivers/net/dsa/mv88e6xxx/port.h |  1 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6b17cd961d06..c8c176da0f1c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4259,6 +4259,31 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.ops = &mv88e6191_ops,
 	},
 
+	[MV88E6220] = {
+		.prod_num = MV88E6XXX_PORT_SWITCH_ID_PROD_6220,
+		.family = MV88E6XXX_FAMILY_6250,
+		.name = "Marvell 88E6220",
+		.num_databases = 64,
+
+		/* Ports 2-4 are not routed to pins
+		 * => usable ports 0, 1, 5, 6
+		 */
+		.num_ports = 7,
+		.num_internal_phys = 2,
+		.max_vid = 4095,
+		.port_base_addr = 0x08,
+		.phy_base_addr = 0x00,
+		.global1_addr = 0x0f,
+		.global2_addr = 0x07,
+		.age_time_coeff = 15000,
+		.g1_irqs = 9,
+		.g2_irqs = 10,
+		.atu_move_port_mask = 0xf,
+		.dual_chip = true,
+		.tag_protocol = DSA_TAG_PROTO_DSA,
+		.ops = &mv88e6250_ops,
+	},
+
 	[MV88E6240] = {
 		.prod_num = MV88E6XXX_PORT_SWITCH_ID_PROD_6240,
 		.family = MV88E6XXX_FAMILY_6352,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 4646e46d47f2..2cc508a1cc32 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -57,6 +57,7 @@ enum mv88e6xxx_model {
 	MV88E6190,
 	MV88E6190X,
 	MV88E6191,
+	MV88E6220,
 	MV88E6240,
 	MV88E6250,
 	MV88E6290,
@@ -77,7 +78,7 @@ enum mv88e6xxx_family {
 	MV88E6XXX_FAMILY_6097,	/* 6046 6085 6096 6097 */
 	MV88E6XXX_FAMILY_6165,	/* 6123 6161 6165 */
 	MV88E6XXX_FAMILY_6185,	/* 6108 6121 6122 6131 6152 6155 6182 6185 */
-	MV88E6XXX_FAMILY_6250,	/* 6250 */
+	MV88E6XXX_FAMILY_6250,	/* 6220 6250 */
 	MV88E6XXX_FAMILY_6320,	/* 6320 6321 */
 	MV88E6XXX_FAMILY_6341,	/* 6141 6341 */
 	MV88E6XXX_FAMILY_6351,	/* 6171 6175 6350 6351 */
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index 8d5a6cd6fb19..ceec771f8bfc 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -117,6 +117,7 @@
 #define MV88E6XXX_PORT_SWITCH_ID_PROD_6190	0x1900
 #define MV88E6XXX_PORT_SWITCH_ID_PROD_6191	0x1910
 #define MV88E6XXX_PORT_SWITCH_ID_PROD_6185	0x1a70
+#define MV88E6XXX_PORT_SWITCH_ID_PROD_6220	0x2200
 #define MV88E6XXX_PORT_SWITCH_ID_PROD_6240	0x2400
 #define MV88E6XXX_PORT_SWITCH_ID_PROD_6250	0x2500
 #define MV88E6XXX_PORT_SWITCH_ID_PROD_6290	0x2900
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next v2 2/6] dt-bindings: net: dsa: marvell: add 6220 model to the 6250 family
From: Hubert Feurstein @ 2019-07-31  8:23 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, Rasmus Villemoes
In-Reply-To: <20190731082351.3157-1-h.feurstein@gmail.com>

The MV88E6220 is part of the MV88E6250 family.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 Documentation/devicetree/bindings/net/dsa/marvell.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt
index 6f9538974bb9..30c11fea491b 100644
--- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
+++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
@@ -22,7 +22,7 @@ which is at a different MDIO base address in different switch families.
 - "marvell,mv88e6190"	: Switch has base address 0x00. Use with models:
 			  6190, 6190X, 6191, 6290, 6390, 6390X
 - "marvell,mv88e6250"	: Switch has base address 0x08 or 0x18. Use with model:
-			  6250
+			  6220, 6250
 
 Required properties:
 - compatible		: Should be one of "marvell,mv88e6085",
-- 
2.22.0


^ permalink raw reply related

* KASAN: invalid-free in tls_sk_proto_cleanup
From: syzbot @ 2019-07-31  8:29 UTC (permalink / raw)
  To: aviadye, borisp, daniel, davejwatson, davem, jakub.kicinski,
	john.fastabend, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    fde50b96 Add linux-next specific files for 20190726
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15ea7f3fa00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4b58274564b354c1
dashboard link: https://syzkaller.appspot.com/bug?extid=f5731e2256eb5130dbd6
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f5731e2256eb5130dbd6@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: double-free or invalid-free in tls_sk_proto_cleanup+0x216/0x3e0  
net/tls/tls_main.c:300

CPU: 0 PID: 19107 Comm: syz-executor.4 Not tainted 5.3.0-rc1-next-20190726  
#53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
  kasan_report_invalid_free+0x65/0xa0 mm/kasan/report.c:444
  __kasan_slab_free+0x13a/0x150 mm/kasan/common.c:427
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:456
  __cache_free mm/slab.c:3425 [inline]
  kfree+0x10a/0x2c0 mm/slab.c:3756
  tls_sk_proto_cleanup+0x216/0x3e0 net/tls/tls_main.c:300
  tls_sk_proto_unhash+0x90/0x3f0 net/tls/tls_main.c:330
  tcp_set_state+0x5b9/0x7d0 net/ipv4/tcp.c:2235
  tcp_done+0xe2/0x320 net/ipv4/tcp.c:3824
  tcp_reset+0x132/0x500 net/ipv4/tcp_input.c:4080
  tcp_validate_incoming+0xa2d/0x1660 net/ipv4/tcp_input.c:5440
  tcp_rcv_established+0x6b5/0x1e70 net/ipv4/tcp_input.c:5648
  tcp_v6_do_rcv+0x41e/0x12c0 net/ipv6/tcp_ipv6.c:1356
  sk_backlog_rcv include/net/sock.h:945 [inline]
  __release_sock+0x129/0x390 net/core/sock.c:2418
  release_sock+0x59/0x1c0 net/core/sock.c:2934
  sk_stream_wait_memory+0x65a/0xfc0 net/core/stream.c:149
  tls_sw_sendmsg+0x673/0x17b0 net/tls/tls_sw.c:1054
  inet6_sendmsg+0x9e/0xe0 net/ipv6/af_inet6.c:576
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg+0xd7/0x130 net/socket.c:657
  __sys_sendto+0x262/0x380 net/socket.c:1952
  __do_sys_sendto net/socket.c:1964 [inline]
  __se_sys_sendto net/socket.c:1960 [inline]
  __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1960
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fcee5e02c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000459829
RDX: ffffffffffffffc1 RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 1201000000003618
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fcee5e036d4
R13: 00000000004c77c1 R14: 00000000004dcf38 R15: 00000000ffffffff

Allocated by task 19107:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc mm/kasan/common.c:486 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:459
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:500
  __do_kmalloc mm/slab.c:3655 [inline]
  __kmalloc_track_caller+0x15f/0x760 mm/slab.c:3670
  kmemdup+0x27/0x60 mm/util.c:120
  kmemdup include/linux/string.h:477 [inline]
  tls_set_sw_offload+0xb3a/0x1567 net/tls/tls_sw.c:2361
  do_tls_setsockopt_conf net/tls/tls_main.c:581 [inline]
  do_tls_setsockopt net/tls/tls_main.c:630 [inline]
  tls_setsockopt+0x4d5/0x8d0 net/tls/tls_main.c:649
  sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3130
  __sys_setsockopt+0x261/0x4c0 net/socket.c:2084
  __do_sys_setsockopt net/socket.c:2100 [inline]
  __se_sys_setsockopt net/socket.c:2097 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:2097
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19107:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:448
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:456
  __cache_free mm/slab.c:3425 [inline]
  kfree+0x10a/0x2c0 mm/slab.c:3756
  tls_sk_proto_cleanup+0x216/0x3e0 net/tls/tls_main.c:300
  tls_sk_proto_unhash+0x90/0x3f0 net/tls/tls_main.c:330
  tcp_set_state+0x5b9/0x7d0 net/ipv4/tcp.c:2235
  tcp_done+0xe2/0x320 net/ipv4/tcp.c:3824
  tcp_reset+0x132/0x500 net/ipv4/tcp_input.c:4080
  tcp_validate_incoming+0xa2d/0x1660 net/ipv4/tcp_input.c:5440
  tcp_rcv_established+0x6b5/0x1e70 net/ipv4/tcp_input.c:5648
  tcp_v6_do_rcv+0x41e/0x12c0 net/ipv6/tcp_ipv6.c:1356
  sk_backlog_rcv include/net/sock.h:945 [inline]
  __release_sock+0x129/0x390 net/core/sock.c:2418
  release_sock+0x59/0x1c0 net/core/sock.c:2934
  wait_on_pending_writer+0x20f/0x420 net/tls/tls_main.c:91
  tls_sk_proto_cleanup+0x2c5/0x3e0 net/tls/tls_main.c:295
  tls_sk_proto_unhash+0x90/0x3f0 net/tls/tls_main.c:330
  tcp_set_state+0x5b9/0x7d0 net/ipv4/tcp.c:2235
  tcp_done+0xe2/0x320 net/ipv4/tcp.c:3824
  tcp_reset+0x132/0x500 net/ipv4/tcp_input.c:4080
  tcp_validate_incoming+0xa2d/0x1660 net/ipv4/tcp_input.c:5440
  tcp_rcv_established+0x6b5/0x1e70 net/ipv4/tcp_input.c:5648
  tcp_v6_do_rcv+0x41e/0x12c0 net/ipv6/tcp_ipv6.c:1356
  sk_backlog_rcv include/net/sock.h:945 [inline]
  __release_sock+0x129/0x390 net/core/sock.c:2418
  release_sock+0x59/0x1c0 net/core/sock.c:2934
  sk_stream_wait_memory+0x65a/0xfc0 net/core/stream.c:149
  tls_sw_sendmsg+0x673/0x17b0 net/tls/tls_sw.c:1054
  inet6_sendmsg+0x9e/0xe0 net/ipv6/af_inet6.c:576
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg+0xd7/0x130 net/socket.c:657
  __sys_sendto+0x262/0x380 net/socket.c:1952
  __do_sys_sendto net/socket.c:1964 [inline]
  __se_sys_sendto net/socket.c:1960 [inline]
  __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1960
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88808ff1c500
  which belongs to the cache kmalloc-32 of size 32
The buggy address is located 0 bytes inside of
  32-byte region [ffff88808ff1c500, ffff88808ff1c520)
The buggy address belongs to the page:
page:ffffea00023fc700 refcount:1 mapcount:0 mapping:ffff8880aa4001c0  
index:0xffff88808ff1cfc1
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea0002a12308 ffffea0002912208 ffff8880aa4001c0
raw: ffff88808ff1cfc1 ffff88808ff1c000 0000000100000024 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88808ff1c400: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
  ffff88808ff1c480: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
> ffff88808ff1c500: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
                    ^
  ffff88808ff1c580: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
  ffff88808ff1c600: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH v2 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm
From: Song Liu @ 2019-07-31  8:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, Yonghong Song, Kernel Team
In-Reply-To: <CAEf4Bza3cAoZJE+24_MBiv-8yYtAaTkAez5xq1v12cLW1-RGcw@mail.gmail.com>



> On Jul 30, 2019, at 11:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Tue, Jul 30, 2019 at 10:19 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Jul 30, 2019, at 6:00 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Tue, Jul 30, 2019 at 5:39 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jul 30, 2019, at 12:53 PM, Andrii Nakryiko <andriin@fb.com> wrote:
>>>>> 
>>>>> This patch implements the core logic for BPF CO-RE offsets relocations.
>>>>> Every instruction that needs to be relocated has corresponding
>>>>> bpf_offset_reloc as part of BTF.ext. Relocations are performed by trying
>>>>> to match recorded "local" relocation spec against potentially many
>>>>> compatible "target" types, creating corresponding spec. Details of the
>>>>> algorithm are noted in corresponding comments in the code.
>>>>> 
>>>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

[...]

>>>> 
>>> 
>>> I just picked the most succinct and non-repetitive form. It's
>>> immediately apparent which type it's implicitly converted to, so I
>>> felt there is no need to repeat it. Also, just (void *) is much
>>> shorter. :)
>> 
>> _All_ other code in btf.c converts the pointer to the target type.
> 
> Most in libbpf.c doesn't, though. Also, I try to preserve pointer
> constness for uses that don't modify BTF types (pretty much all of
> them in libbpf), so it becomes really verbose, despite extremely short
> variable names:
> 
> const struct btf_member *m = (const struct btf_member *)(t + 1);

I don't think being verbose is a big problem here. Overusing 
(void *) feels like a bigger problem. 

> 
> Add one or two levels of nestedness and you are wrapping this line.
> 
>> In some cases, it is not apparent which type it is converted to,
>> for example:
>> 
>> +       m = (void *)(targ_type + 1);
>> 
>> I would suggest we do implicit conversion whenever possible.
> 
> Implicit conversion (`m = targ_type + 1;`) is a compilation error,
> that won't work.

I misused "implicit" here. I actually meant to say

	m = ((const struct btf_member *)(t + 1);




^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox