* Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Jesus Sanchez-Palencia @ 2018-03-24 0:34 UTC
To: Thomas Gleixner
Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes, richardcochran,
anna-maria, henrik, John Stultz, levi.pearson, edumazet, willemb,
mlichvar
In-Reply-To: <alpine.DEB.2.21.1803222312061.1489@nanos.tec.linutronix.de>
Hi,
On 03/22/2018 03:52 PM, Thomas Gleixner wrote:
> On Thu, 22 Mar 2018, Jesus Sanchez-Palencia wrote:
>> Our plan was to work directly with the Qbv-like scheduling (per-port) just after
>> the cbs qdisc (Qav), but the feedback here and offline was that there were use
>> cases for a more simplistic launchtime approach (per-queue) as well. We've
>> decided to invest in it first (and postpone the 'taprio' qdisc until there was
>> a NIC available with HW support for it, basically).
>
> I missed that discussion due to other urgent stuff on my plate. Just
> skimmed through it. More below.
>
>> You are right, and we agree, that using tbs for a per-port schedule of any sort
>> will require a SW scheduler to be developed on top of it, but we've never said
>> the contrary either. Our vision has always been that these are separate
>> mechanisms with different use-cases, so we do see the value for the kernel to
>> provide both.
>>
>> In other words, tbs is not the final solution for Qbv, and we agree that a 'TAS'
>> qdisc is still necessary. And due to the wide range of applications and hw being
>> used for those out there, we need both, especially given that one does not block
>> the other.
>
> So what's the plan for this? Having TAS as a separate entity or TAS feeding
> into the proposed 'basic' time transmission thing?
The second one, I guess. To elaborate, the plan is to eventually have TAS as a
separate entity, but one which can use tbs for one of its classes (and cbs for
another, and strict priority for everything else, etc).
Basically, the design would be something along the lines of 'taprio': a root qdisc
that is both time and priority aware, and capable of running a schedule for the
port. That schedule can run inside the kernel with hrtimers, or just be
offloaded into the controller if Qbv is supported in HW.
Because it would expose the inner traffic classes in a mq / mqprio / prio style,
it would allow other per-queue qdiscs to be attached to it. On a system
using the i210, for instance, we could then have tbs installed on traffic class
0 with just hw offload enabled. The Qbv schedule would be running in SW on the TAS
entity (i.e. 'taprio'), which would be setting the packets' txtime before
dequeueing them on a fast path -> tbs -> NIC.
Similarly, other qdiscs, like cbs, could be installed if all that a traffic class
requires is traffic shaping once its 'gate' allows it to execute the selected
tx algorithm attached to it.
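A rough user-space sketch of the split described here, where the root qdisc evaluates a Qbv-style gate schedule and derives a txtime that a per-class qdisc such as tbs then enforces. Everything in it is an assumption for illustration: the names (struct gate_entry, struct gate_sched, sched_gates_at(), sched_next_open()) are hypothetical and not part of taprio or tbs, times are plain nanosecond integers rather than kernel clocks, and a real implementation would be driven by hrtimers or offloaded to hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gate control list entry: which traffic classes may
 * transmit (one bit per class) and for how long, in nanoseconds. */
struct gate_entry {
	uint32_t gate_mask;
	uint64_t interval_ns;
};

/* Hypothetical per-port schedule: the list repeats every cycle,
 * starting at base_ns. */
struct gate_sched {
	const struct gate_entry *entries;
	size_t num_entries;
	uint64_t base_ns;
};

static uint64_t sched_cycle_ns(const struct gate_sched *s)
{
	uint64_t cycle = 0;

	for (size_t i = 0; i < s->num_entries; i++)
		cycle += s->entries[i].interval_ns;
	return cycle;
}

/* Gate mask in effect at now_ns (requires now_ns >= base_ns). */
static uint32_t sched_gates_at(const struct gate_sched *s, uint64_t now_ns)
{
	uint64_t cycle = sched_cycle_ns(s);
	uint64_t offset;

	assert(cycle > 0 && now_ns >= s->base_ns);
	offset = (now_ns - s->base_ns) % cycle;
	for (size_t i = 0; i < s->num_entries; i++) {
		if (offset < s->entries[i].interval_ns)
			return s->entries[i].gate_mask;
		offset -= s->entries[i].interval_ns;
	}
	return 0; /* not reached: offset is always < cycle */
}

/* Earliest time >= now_ns at which traffic class tc's gate is open,
 * i.e. a candidate txtime the root qdisc could stamp on a packet
 * before handing it to the per-class qdisc. UINT64_MAX if tc never
 * opens. */
static uint64_t sched_next_open(const struct gate_sched *s, uint64_t now_ns,
				unsigned int tc)
{
	uint64_t cycle = sched_cycle_ns(s);
	uint64_t cycle_start, t;

	assert(cycle > 0 && now_ns >= s->base_ns);
	cycle_start = now_ns - (now_ns - s->base_ns) % cycle;
	/* Scanning the current and the next cycle covers every entry. */
	for (int c = 0; c < 2; c++) {
		t = cycle_start + (uint64_t)c * cycle;
		for (size_t i = 0; i < s->num_entries; i++) {
			uint64_t end = t + s->entries[i].interval_ns;

			if ((s->entries[i].gate_mask & (1u << tc)) && end > now_ns)
				return t > now_ns ? t : now_ns;
			t = end;
		}
	}
	return UINT64_MAX;
}
```

Stamping sched_next_open() on a packet only guarantees that its gate is open at launch; a real scheduler would also have to check that the frame fits in the remaining window.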
>
> The general objection I have with the current approach is that it creates
> the playground for all flavours of misdesigned user space implementations
> and just replaces the home-brewed and ugly user mode network adapter
> drivers.
>
> But that's not helping the cause at all. There is enough crappy stuff out
> there already and I'd rather see a properly designed slice management which can
> be utilized and improved by all involved parties.
>
> All variants which utilize the basic time driven packet transmission are
> based on periodic explicit plan scheduling with (local) network wide time
> slice assignment.
>
> It does not matter whether you feed VLAN traffic into a time slice, where
> the VLAN itself does not even have to know about it, or if you have aware
> applications feeding packets to a designated timeslot. The basic principle
> of this is always the same.
>
> So coming back to last year's discussion. It totally went into the wrong
> direction because it turned from an approach (the patches) which came from
> the big picture to a single use case and application centric view. That's
> just wrong and I regret that I didn't have the time to pay attention back
> then.
>
> You always need to look at the big picture first and design from there, not
> the other way round. There will always be the argument:
>
> But my application is special and needs X
>
> It's easy to fall for that. From a long experience I know that none of
> these claims ever held. These arguments are made because the people making
> them have either never looked at the big picture or are simply refusing to
> do so because it would cause them work.
>
> If you start from the use case and application centric view and ignore the
> big picture then you end up in a gazillion of extra magic features over
> time which could have been completely avoided if you had put your foot down
> and made everyone agree on a proper and versatile design in the first
> place.
>
> The more low level access you hand out in the beginning the less commonly
> used, improved and maintained infrastructure you will get in the end. That
> has happened before in other areas and it will happen here as well. You
> create a user space ABI which you can't get rid of, and before you come out
> with the proper interface after that a large number of involved parties
> have gone off and implemented on top of the low level ABI and they will
> never look back.
>
> In the (not so) long run this will create a lot more issues than it
> solves. A simple example is that you cannot run two applications which
> easily could share the network in parallel without major surgery because
> both require to be the management authority.
>
> I've not yet seen a convincing argument why this low level stuff with all
> of its weird flavours is superior to something which reflects the basic
> operating principle of TSN.
As you know, not all TSN systems are designed the same. Take AVB systems, for
example. These are not always running on networks that are aware of any time
schedule, or at least not quite like what is described by Qbv.
On those systems there is usually a certain number of streams with different
priorities that care mostly about having their bandwidth reserved along the
network. The applications running on such systems are usually based on AVTP,
thus they already have to calculate and set the "avtp presentation time"
per-packet themselves. A Qbv scheduler would probably provide very little
benefit to this domain, IMHO. For "talkers" of these AVB systems, shaping
traffic using txtime (i.e. tbs) can provide a low-jitter alternative to cbs, for
instance.
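To illustrate the talker case, a minimal sketch of how per-packet launch times for a fixed-rate stream could be computed, assuming (hypothetically) one frame per 125 us class A interval, i.e. 8000 frames per second. stream_txtime() is an illustrative helper, not a kernel or AVTP API, and how the resulting txtime is handed to the socket and to tbs is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: launch time of the n-th frame of a stream that
 * sends frames_per_sec frames per second, starting at start_ns.
 * Multiplying before dividing keeps non-integer periods from drifting
 * (n * NSEC_PER_SEC overflows only after roughly 1.8e10 frames). */
static uint64_t stream_txtime(uint64_t start_ns, uint32_t frames_per_sec,
			      uint64_t n)
{
	assert(frames_per_sec > 0);
	return start_ns + (n * NSEC_PER_SEC) / frames_per_sec;
}
```

Because each frame's launch time is derived from the stream start rather than from the previous frame, rounding errors do not accumulate over time.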
Thanks,
Jesus
>
> Thanks,
>
> tglx
* [PATCH net-next 09/13] liquidio: Removed one line function wake_q
From: Felix Manlunas @ 2018-03-24 0:37 UTC
To: davem
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
intiyaz.basha
In-Reply-To: <20180324003618.GA6457@felix-thinkpad.cavium.com>
From: Intiyaz Basha <intiyaz.basha@cavium.com>
Remove the one-line function wake_q() and call netif_wake_subqueue() directly.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
drivers/net/ethernet/cavium/liquidio/lio_main.c | 14 ++------------
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 12 +-----------
2 files changed, 3 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index be16a1c..78f6794 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -509,16 +509,6 @@ static void liquidio_deinit_pci(void)
}
/**
- * \brief Wake a queue
- * @param netdev network device
- * @param q which queue to wake
- */
-static inline void wake_q(struct net_device *netdev, int q)
-{
- netif_wake_subqueue(netdev, q);
-}
-
-/**
* \brief Check Tx queue status, and take appropriate action
* @param lio per-network private data
* @returns 0 if full, number of queues woken up otherwise
@@ -536,7 +526,7 @@ static inline int check_txq_status(struct lio *lio)
if (octnet_iq_is_full(lio->oct_dev, iq))
continue;
if (__netif_subqueue_stopped(lio->netdev, q)) {
- wake_q(lio->netdev, q);
+ netif_wake_subqueue(lio->netdev, q);
INCR_INSTRQUEUE_PKT_COUNT(lio->oct_dev, iq,
tx_restart, 1);
ret_val++;
@@ -1656,7 +1646,7 @@ static inline int check_txq_state(struct lio *lio, struct sk_buff *skb)
if (__netif_subqueue_stopped(lio->netdev, q)) {
INCR_INSTRQUEUE_PKT_COUNT(lio->oct_dev, iq, tx_restart, 1);
- wake_q(lio->netdev, q);
+ netif_wake_subqueue(lio->netdev, q);
}
return 1;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 3120aed..5ab0831 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -285,16 +285,6 @@ static pci_ers_result_t liquidio_pcie_error_detected(struct pci_dev *pdev,
};
/**
- * \brief Wake a queue
- * @param netdev network device
- * @param q which queue to wake
- */
-static void wake_q(struct net_device *netdev, int q)
-{
- netif_wake_subqueue(netdev, q);
-}
-
-/**
* Remove the node at the head of the list. The list would be empty at
* the end of this call if there are no more nodes in the list.
*/
@@ -980,7 +970,7 @@ static int check_txq_state(struct lio *lio, struct sk_buff *skb)
if (__netif_subqueue_stopped(lio->netdev, q)) {
INCR_INSTRQUEUE_PKT_COUNT(lio->oct_dev, iq, tx_restart, 1);
- wake_q(lio->netdev, q);
+ netif_wake_subqueue(lio->netdev, q);
}
return 1;
--
1.8.3.1
* [PATCH net-next 10/13] liquidio: Function call skb_iq for deriving queue from skb
From: Felix Manlunas @ 2018-03-24 0:37 UTC
To: davem
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
intiyaz.basha
In-Reply-To: <20180324003618.GA6457@felix-thinkpad.cavium.com>
From: Intiyaz Basha <intiyaz.basha@cavium.com>
Use the skb_iq() helper to derive the Tx queue index from the skb instead of
open-coding skb->queue_mapping % num_txpciq.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
drivers/net/ethernet/cavium/liquidio/lio_main.c | 3 +--
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 78f6794..2558a94 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2528,8 +2528,7 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
lio = GET_LIO(netdev);
oct = lio->oct_dev;
- q_idx = skb->queue_mapping;
- q_idx = (q_idx % (lio->linfo.num_txpciq));
+ q_idx = skb_iq(lio, skb);
tag = q_idx;
iq_no = lio->linfo.txpciq[q_idx].s.q_no;
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 5ab0831..478c20a 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -1604,8 +1604,7 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
lio = GET_LIO(netdev);
oct = lio->oct_dev;
- q_idx = skb->queue_mapping;
- q_idx = (q_idx % (lio->linfo.num_txpciq));
+ q_idx = skb_iq(lio, skb);
tag = q_idx;
iq_no = lio->linfo.txpciq[q_idx].s.q_no;
--
1.8.3.1
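The definition of skb_iq() is not part of this patch. A sketch of what it presumably computes, assuming it matches the two removed lines (queue_mapping modulo num_txpciq); the struct definitions are minimal stand-ins for the real sk_buff and lio, reduced to the fields used here.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel/driver structures (assumed layout,
 * only the fields this sketch needs). */
struct sk_buff { uint16_t queue_mapping; };
struct lio_linfo { int num_txpciq; };
struct lio { struct lio_linfo linfo; };

/* Same computation as the open-coded lines removed by the patch. */
static inline int skb_iq(struct lio *lio, struct sk_buff *skb)
{
	assert(lio->linfo.num_txpciq > 0); /* sketch-only division guard */
	return skb->queue_mapping % lio->linfo.num_txpciq;
}
```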
* [PATCH net-next 11/13] liquidio: Renamed txqs_wake to wake_txqs
From: Felix Manlunas @ 2018-03-24 0:37 UTC
To: davem
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
intiyaz.basha
In-Reply-To: <20180324003618.GA6457@felix-thinkpad.cavium.com>
From: Intiyaz Basha <intiyaz.basha@cavium.com>
For consistency, rename txqs_wake to wake_txqs.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
drivers/net/ethernet/cavium/liquidio/lio_main.c | 4 ++--
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 4 ++--
drivers/net/ethernet/cavium/liquidio/octeon_network.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 2558a94..8b0a080 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -777,7 +777,7 @@ static inline void update_link_status(struct net_device *netdev,
if (lio->linfo.link.s.link_up) {
dev_dbg(&oct->pci_dev->dev, "%s: link_up", __func__);
netif_carrier_on(netdev);
- txqs_wake(netdev);
+ wake_txqs(netdev);
} else {
dev_dbg(&oct->pci_dev->dev, "%s: link_off", __func__);
netif_carrier_off(netdev);
@@ -2763,7 +2763,7 @@ static void liquidio_tx_timeout(struct net_device *netdev)
"Transmit timeout tx_dropped:%ld, waking up queues now!!\n",
netdev->stats.tx_dropped);
netif_trans_update(netdev);
- txqs_wake(netdev);
+ wake_txqs(netdev);
}
static int liquidio_vlan_rx_add_vid(struct net_device *netdev,
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 478c20a..288096b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -515,7 +515,7 @@ static void update_link_status(struct net_device *netdev,
if (lio->linfo.link.s.link_up) {
netif_carrier_on(netdev);
- txqs_wake(netdev);
+ wake_txqs(netdev);
} else {
netif_carrier_off(netdev);
txqs_stop(netdev);
@@ -1822,7 +1822,7 @@ static void liquidio_tx_timeout(struct net_device *netdev)
"Transmit timeout tx_dropped:%ld, waking up queues now!!\n",
netdev->stats.tx_dropped);
netif_trans_update(netdev);
- txqs_wake(netdev);
+ wake_txqs(netdev);
}
static int
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index 7922a69..3cbc65a 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -522,7 +522,7 @@ static inline void txqs_stop(struct net_device *netdev)
* \brief Wake Tx queues
* @param netdev network device
*/
-static inline void txqs_wake(struct net_device *netdev)
+static inline void wake_txqs(struct net_device *netdev)
{
struct lio *lio = GET_LIO(netdev);
int i, qno;
--
1.8.3.1
* [PATCH net-next 12/13] liquidio: Renamed txqs_stop to stop_txqs
From: Felix Manlunas @ 2018-03-24 0:37 UTC
To: davem
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
intiyaz.basha
In-Reply-To: <20180324003618.GA6457@felix-thinkpad.cavium.com>
From: Intiyaz Basha <intiyaz.basha@cavium.com>
For consistency, rename txqs_stop to stop_txqs.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
drivers/net/ethernet/cavium/liquidio/lio_main.c | 2 +-
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 4 ++--
drivers/net/ethernet/cavium/liquidio/octeon_network.h | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 8b0a080..54fd315 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -781,7 +781,7 @@ static inline void update_link_status(struct net_device *netdev,
} else {
dev_dbg(&oct->pci_dev->dev, "%s: link_off", __func__);
netif_carrier_off(netdev);
- txqs_stop(netdev);
+ stop_txqs(netdev);
}
if (lio->linfo.link.s.mtu != current_max_mtu) {
netif_info(lio, probe, lio->netdev, "Max MTU changed from %d to %d\n",
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 288096b..4d7a0ae 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -518,7 +518,7 @@ static void update_link_status(struct net_device *netdev,
wake_txqs(netdev);
} else {
netif_carrier_off(netdev);
- txqs_stop(netdev);
+ stop_txqs(netdev);
}
if (lio->linfo.link.s.mtu != current_max_mtu) {
@@ -1186,7 +1186,7 @@ static int liquidio_stop(struct net_device *netdev)
ifstate_reset(lio, LIO_IFSTATE_RUNNING);
- txqs_stop(netdev);
+ stop_txqs(netdev);
dev_info(&oct->pci_dev->dev, "%s interface is stopped\n", netdev->name);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index 3cbc65a..1b4c85a 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -510,7 +510,7 @@ static inline int wait_for_pending_requests(struct octeon_device *oct)
* \brief Stop Tx queues
* @param netdev network device
*/
-static inline void txqs_stop(struct net_device *netdev)
+static inline void stop_txqs(struct net_device *netdev)
{
int i;
--
1.8.3.1
* [PATCH net-next 13/13] liquidio: Renamed txqs_start to start_txqs
From: Felix Manlunas @ 2018-03-24 0:37 UTC
To: davem
Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
intiyaz.basha
In-Reply-To: <20180324003618.GA6457@felix-thinkpad.cavium.com>
From: Intiyaz Basha <intiyaz.basha@cavium.com>
For consistency, rename txqs_start to start_txqs.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
---
drivers/net/ethernet/cavium/liquidio/lio_main.c | 2 +-
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 2 +-
drivers/net/ethernet/cavium/liquidio/octeon_network.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 54fd315..ba3ca02 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2086,7 +2086,7 @@ static int liquidio_open(struct net_device *netdev)
return -1;
}
- txqs_start(netdev);
+ start_txqs(netdev);
/* tell Octeon to start forwarding packets to host */
send_rx_ctrl_cmd(lio, 1);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 4d7a0ae..d5f5c9a 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -1144,7 +1144,7 @@ static int liquidio_open(struct net_device *netdev)
lio->intf_open = 1;
netif_info(lio, ifup, lio->netdev, "Interface Open, ready for traffic\n");
- txqs_start(netdev);
+ start_txqs(netdev);
/* tell Octeon to start forwarding packets to host */
send_rx_ctrl_cmd(lio, 1);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index 1b4c85a..8782206 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -542,7 +542,7 @@ static inline void wake_txqs(struct net_device *netdev)
* \brief Start Tx queues
* @param netdev network device
*/
-static inline void txqs_start(struct net_device *netdev)
+static inline void start_txqs(struct net_device *netdev)
{
struct lio *lio = GET_LIO(netdev);
int i;
--
1.8.3.1
* Re: [PATCH v2 bpf-next 5/8] bpf: introduce BPF_RAW_TRACEPOINT
From: Alexei Starovoitov @ 2018-03-24 0:58 UTC
To: Daniel Borkmann, davem
Cc: torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <eb46ee44-3010-4c4e-1020-9b4fbdd34101@iogearbox.net>
On 3/23/18 4:13 PM, Daniel Borkmann wrote:
> On 03/22/2018 04:41 PM, Alexei Starovoitov wrote:
>> On 3/22/18 2:43 AM, Daniel Borkmann wrote:
>>> On 03/21/2018 07:54 PM, Alexei Starovoitov wrote:
>>> [...]
>>>> @@ -546,6 +556,53 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>>>> void perf_trace_buf_update(void *record, u16 type);
>>>> void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>>>>
>>>> +void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>>>> +void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
>>>> +void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3);
>>>> +void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4);
>>>> +void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5);
>>>> +void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6);
>>>> +void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
>>>> +void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8);
>>>> +void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9);
>>>> +void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10);
>>>> +void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11);
>>>> +void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
>>>> +void bpf_trace_run13(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>> + u64 arg13);
>>>> +void bpf_trace_run14(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>> + u64 arg13, u64 arg14);
>>>> +void bpf_trace_run15(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>> + u64 arg13, u64 arg14, u64 arg15);
>>>> +void bpf_trace_run16(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>> + u64 arg13, u64 arg14, u64 arg15, u64 arg16);
>>>> +void bpf_trace_run17(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>> + u64 arg13, u64 arg14, u64 arg15, u64 arg16, u64 arg17);
>>>> void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
>>>> struct trace_event_call *call, u64 count,
>>>> struct pt_regs *regs, struct hlist_head *head,
>>> [...]
>>>> @@ -896,3 +976,206 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
>>>>
>>>> return ret;
>>>> }
>>>> +
>>>> +static __always_inline
>>>> +void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>>>> +{
>>>> + rcu_read_lock();
>>>> + preempt_disable();
>>>> + (void) BPF_PROG_RUN(prog, args);
>>>> + preempt_enable();
>>>> + rcu_read_unlock();
>>>> +}
>>>> +
>>>> +#define EVAL1(FN, X) FN(X)
>>>> +#define EVAL2(FN, X, Y...) FN(X) EVAL1(FN, Y)
>>>> +#define EVAL3(FN, X, Y...) FN(X) EVAL2(FN, Y)
>>>> +#define EVAL4(FN, X, Y...) FN(X) EVAL3(FN, Y)
>>>> +#define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
>>>> +#define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
>>>> +
>>>> +#define COPY(X) args[X - 1] = arg##X;
>>>> +
>>>> +void bpf_trace_run1(struct bpf_prog *prog, u64 arg1)
>>>> +{
>>>> + u64 args[1];
>>>> +
>>>> + EVAL1(COPY, 1);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run1);
>>>> +void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2)
>>>> +{
>>>> + u64 args[2];
>>>> +
>>>> + EVAL2(COPY, 1, 2);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run2);
>>>> +void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3)
>>>> +{
>>>> + u64 args[3];
>>>> +
>>>> + EVAL3(COPY, 1, 2, 3);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run3);
>>>> +void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4)
>>>> +{
>>>> + u64 args[4];
>>>> +
>>>> + EVAL4(COPY, 1, 2, 3, 4);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run4);
>>>> +void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5)
>>>> +{
>>>> + u64 args[5];
>>>> +
>>>> + EVAL5(COPY, 1, 2, 3, 4, 5);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run5);
>>>> +void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6)
>>>> +{
>>>> + u64 args[6];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run6);
>>>> +void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7)
>>>> +{
>>>> + u64 args[7];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL1(COPY, 7);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run7);
>>>> +void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8)
>>>> +{
>>>> + u64 args[8];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL2(COPY, 7, 8);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run8);
>>>> +void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9)
>>>> +{
>>>> + u64 args[9];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL3(COPY, 7, 8, 9);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run9);
>>>> +void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10)
>>>> +{
>>>> + u64 args[10];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL4(COPY, 7, 8, 9, 10);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run10);
>>>> +void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11)
>>>> +{
>>>> + u64 args[11];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL5(COPY, 7, 8, 9, 10, 11);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run11);
>>>> +void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12)
>>>> +{
>>>> + u64 args[12];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL6(COPY, 7, 8, 9, 10, 11, 12);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run12);
>>>> +void bpf_trace_run17(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>> + u64 arg13, u64 arg14, u64 arg15, u64 arg16, u64 arg17)
>>>> +{
>>>> + u64 args[17];
>>>> +
>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>> + EVAL6(COPY, 7, 8, 9, 10, 11, 12);
>>>> + EVAL5(COPY, 13, 14, 15, 16, 17);
>>>> + __bpf_trace_run(prog, args);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run17);
>>>
>>> Would be nice if we could generate all these above via macro, e.g. when we define
>>> a hard upper limit for max number of tracepoint args anyway, so this gets automatically
>>> adjusted as well. Maybe some of the logic from BPF_CALL_*() macros could be borrowed
>>> for this purpose.
>>
>> I've thought about it, but couldn't figure out how to do it.
>> Suggestions are welcome.
>> The preprocessor cannot expand a constant N into N statements.
>> There has to be something like:
>> ...
>> #define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
>> #define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
>> for whatever maximum we will pick.
>
> Right.
>
>> I picked 6 as a good compromise and used it twice in bpf_trace_run1x().
>> A similar thing is possible for u64 arg1, u64 arg2, ...
>> but it will be harder to read.
>> Looking forward to what you can come up with.
>
> Just took a quick look; the below would work for generating the
> signature and function. I did it up to 9 here:
>
> #define UNPACK(...) __VA_ARGS__
> #define REPEAT_1(FN, DL, X, ...) FN(X)
> #define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
> #define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
> #define REPEAT_4(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
> #define REPEAT_5(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
> #define REPEAT_6(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
> #define REPEAT_7(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
> #define REPEAT_8(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
> #define REPEAT_9(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
> #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
>
> #define SARG(X) u64 arg##X
> #define COPY(X) args[X] = arg##X
>
> #define __DL_COM (,)
> #define __DL_SEM (;)
>
> #define __SEQ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
>
> #define BPF_TRACE_DECL_x(x) \
> void bpf_trace_run##x(struct bpf_prog *prog, \
> REPEAT(x, SARG, __DL_COM, __SEQ))
> #define BPF_TRACE_DEFN_x(x) \
> void bpf_trace_run##x(struct bpf_prog *prog, \
> REPEAT(x, SARG, __DL_COM, __SEQ)) \
> { \
> u64 args[x]; \
> REPEAT(x, COPY, __DL_SEM, __SEQ); \
> __bpf_trace_run(prog, args); \
> } \
> EXPORT_SYMBOL_GPL(bpf_trace_run##x)
>
> So doing a ...
>
> BPF_TRACE_DECL_x(5);
> BPF_TRACE_DEFN_x(5);
Interestingly, in addition to the above, defining
#define __REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
to allow recursive expansion, and then doing
__REPEAT(12, BPF_TRACE_DECL_x, __DL_SEM, __SEQ_1_12);
almost works...
I'm guessing it's hitting a preprocessor internal limit on the
number of expressions to expand.
It expands 1-6 nicely, but 7-12 are only partially expanded :)
I guess I have to use
BPF_TRACE_DECL_x(1);
BPF_TRACE_DECL_x(2);
BPF_TRACE_DECL_x(3);
BPF_TRACE_DECL_x(4);
...
BPF_TRACE_DECL_x(12);
which doesn't look any better than open-coding them.
It's probably worth it only for BPF_TRACE_DEFN_x.
> ... will generate in kernel/trace/bpf_trace.i:
>
> void bpf_foo_trace_run5(struct bpf_prog *prog, u64 arg0 , u64 arg1 , u64 arg2 , u64 arg3 , u64 arg4);
> void bpf_foo_trace_run5(struct bpf_prog *prog, u64 arg0 , u64 arg1 , u64 arg2 , u64 arg3 , u64 arg4)
> {
> u64 args[5];
> args[0] = arg0 ;
> args[1] = arg1 ;
> args[2] = arg2 ;
> args[3] = arg3 ;
> args[4] = arg4;
> __bpf_trace_run(prog, args);
> } [...]
>
> Meaning, the EVALx() macros could be removed from there, too. Potentially, the
> REPEAT() macro could sit in its own include/linux/ header for others to reuse
> or such.
It feels too specific to this use case. I'd wait for a second user before
moving it to include/linux/kernel.h.
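A user-space sketch of the compromise discussed in this thread: keep the REPEAT() helpers local to the file and instantiate only the definitions explicitly, one line per arity. It is cut down to four arguments; the delimiter and sequence macros are renamed without leading underscores because it builds outside the kernel, and fake_trace_run() is a hypothetical stand-in for __bpf_trace_run() (the real functions also take a struct bpf_prog *).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Daniel's REPEAT helpers, limited to 4 repetitions. */
#define UNPACK(...) __VA_ARGS__
#define REPEAT_1(FN, DL, X, ...) FN(X)
#define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
#define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
#define REPEAT_4(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
#define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)

#define SARG(X) u64 arg##X
#define COPY(X) args[X] = arg##X

#define DL_COM (,)
#define DL_SEM (;)
/* One element longer than the largest REPEAT_n, so the innermost
 * REPEAT_1 always gets a non-empty variadic argument list. */
#define SEQ 0, 1, 2, 3, 4

/* Hypothetical stand-in for __bpf_trace_run(): records the args. */
static u64 last_args[4];
static int last_nargs;

static void fake_trace_run(int nargs, const u64 *args)
{
	assert(nargs <= 4);
	for (int i = 0; i < nargs; i++)
		last_args[i] = args[i];
	last_nargs = nargs;
}

/* Generates trace_runX(u64 arg0, ..., u64 argX-1). */
#define TRACE_DEFN_x(x)						\
	static void trace_run##x(REPEAT(x, SARG, DL_COM, SEQ))	\
	{							\
		u64 args[x];					\
		REPEAT(x, COPY, DL_SEM, SEQ);			\
		fake_trace_run(x, args);			\
	}

/* One explicit instantiation per arity, as open-coded declarations would. */
TRACE_DEFN_x(1)
TRACE_DEFN_x(2)
TRACE_DEFN_x(3)
TRACE_DEFN_x(4)
```

Each instantiation is spelled out on its own line, which sidesteps nesting a REPEAT_n expansion inside another one.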
* Re: [PATCH v2 bpf-next 5/8] bpf: introduce BPF_RAW_TRACEPOINT
From: Steven Rostedt @ 2018-03-24 1:39 UTC
To: Daniel Borkmann
Cc: Alexei Starovoitov, davem, torvalds, peterz, netdev, kernel-team,
linux-api
In-Reply-To: <eb46ee44-3010-4c4e-1020-9b4fbdd34101@iogearbox.net>
On Sat, 24 Mar 2018 00:13:28 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:
> #define UNPACK(...) __VA_ARGS__
> #define REPEAT_1(FN, DL, X, ...) FN(X)
> #define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
> #define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
> #define REPEAT_4(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
> #define REPEAT_5(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
> #define REPEAT_6(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
> #define REPEAT_7(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
> #define REPEAT_8(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
> #define REPEAT_9(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
> #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
>
> #define SARG(X) u64 arg##X
> #define COPY(X) args[X] = arg##X
>
> #define __DL_COM (,)
> #define __DL_SEM (;)
>
> #define __SEQ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
>
> #define BPF_TRACE_DECL_x(x) \
> void bpf_trace_run##x(struct bpf_prog *prog, \
> REPEAT(x, SARG, __DL_COM, __SEQ))
> #define BPF_TRACE_DEFN_x(x) \
> void bpf_trace_run##x(struct bpf_prog *prog, \
> REPEAT(x, SARG, __DL_COM, __SEQ)) \
> { \
> u64 args[x]; \
> REPEAT(x, COPY, __DL_SEM, __SEQ); \
> __bpf_trace_run(prog, args); \
> } \
> EXPORT_SYMBOL_GPL(bpf_trace_run##x)
>
> So doing a ...
>
> BPF_TRACE_DECL_x(5);
> BPF_TRACE_DEFN_x(5);
>
> ... will generate in kernel/trace/bpf_trace.i:
>
> void bpf_foo_trace_run5(struct bpf_prog *prog, u64 arg0 , u64 arg1 , u64 arg2 , u64 arg3 , u64 arg4);
> void bpf_foo_trace_run5(struct bpf_prog *prog, u64 arg0 , u64 arg1 , u64 arg2 , u64 arg3 , u64 arg4)
> {
> u64 args[5];
> args[0] = arg0 ;
> args[1] = arg1 ;
> args[2] = arg2 ;
> args[3] = arg3 ;
> args[4] = arg4;
> __bpf_trace_run(prog, args);
> } [...]
>
> Meaning, the EVALx() macros could be removed from there, too. Potentially, the
> REPEAT() macro could sit in its own include/linux/ header for others to reuse
> or such.
And people think my macro magic in include/trace/ftrace_event.h is
funky. Now I know who stole my MACRO MAGIC HAT.
-- Steve
* Re: [PATCH v2 bpf-next 5/8] bpf: introduce BPF_RAW_TRACEPOINT
From: Alexei Starovoitov @ 2018-03-24 1:43 UTC (permalink / raw)
To: Daniel Borkmann, davem
Cc: torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <50520d4a-1c07-9cca-068c-9bc737c7785f@fb.com>
On 3/23/18 5:58 PM, Alexei Starovoitov wrote:
> On 3/23/18 4:13 PM, Daniel Borkmann wrote:
>> On 03/22/2018 04:41 PM, Alexei Starovoitov wrote:
>>> On 3/22/18 2:43 AM, Daniel Borkmann wrote:
>>>> On 03/21/2018 07:54 PM, Alexei Starovoitov wrote:
>>>> [...]
>>>>> @@ -546,6 +556,53 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
>>>>> void perf_trace_buf_update(void *record, u16 type);
>>>>> void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>>>>>
>>>>> +void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
>>>>> +void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
>>>>> +void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3);
>>>>> +void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4);
>>>>> +void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5);
>>>>> +void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6);
>>>>> +void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
>>>>> +void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8);
>>>>> +void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9);
>>>>> +void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10);
>>>>> +void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11);
>>>>> +void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
>>>>> +void bpf_trace_run13(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>>> + u64 arg13);
>>>>> +void bpf_trace_run14(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>>> + u64 arg13, u64 arg14);
>>>>> +void bpf_trace_run15(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>>> + u64 arg13, u64 arg14, u64 arg15);
>>>>> +void bpf_trace_run16(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>>> + u64 arg13, u64 arg14, u64 arg15, u64 arg16);
>>>>> +void bpf_trace_run17(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>>> + u64 arg13, u64 arg14, u64 arg15, u64 arg16, u64 arg17);
>>>>> void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
>>>>> struct trace_event_call *call, u64 count,
>>>>> struct pt_regs *regs, struct hlist_head *head,
>>>> [...]
>>>>> @@ -896,3 +976,206 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
>>>>>
>>>>> return ret;
>>>>> }
>>>>> +
>>>>> +static __always_inline
>>>>> +void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>>>>> +{
>>>>> + rcu_read_lock();
>>>>> + preempt_disable();
>>>>> + (void) BPF_PROG_RUN(prog, args);
>>>>> + preempt_enable();
>>>>> + rcu_read_unlock();
>>>>> +}
>>>>> +
>>>>> +#define EVAL1(FN, X) FN(X)
>>>>> +#define EVAL2(FN, X, Y...) FN(X) EVAL1(FN, Y)
>>>>> +#define EVAL3(FN, X, Y...) FN(X) EVAL2(FN, Y)
>>>>> +#define EVAL4(FN, X, Y...) FN(X) EVAL3(FN, Y)
>>>>> +#define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
>>>>> +#define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
>>>>> +
>>>>> +#define COPY(X) args[X - 1] = arg##X;
>>>>> +
>>>>> +void bpf_trace_run1(struct bpf_prog *prog, u64 arg1)
>>>>> +{
>>>>> + u64 args[1];
>>>>> +
>>>>> + EVAL1(COPY, 1);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run1);
>>>>> +void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2)
>>>>> +{
>>>>> + u64 args[2];
>>>>> +
>>>>> + EVAL2(COPY, 1, 2);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run2);
>>>>> +void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3)
>>>>> +{
>>>>> + u64 args[3];
>>>>> +
>>>>> + EVAL3(COPY, 1, 2, 3);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run3);
>>>>> +void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4)
>>>>> +{
>>>>> + u64 args[4];
>>>>> +
>>>>> + EVAL4(COPY, 1, 2, 3, 4);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run4);
>>>>> +void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5)
>>>>> +{
>>>>> + u64 args[5];
>>>>> +
>>>>> + EVAL5(COPY, 1, 2, 3, 4, 5);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run5);
>>>>> +void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6)
>>>>> +{
>>>>> + u64 args[6];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run6);
>>>>> +void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7)
>>>>> +{
>>>>> + u64 args[7];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL1(COPY, 7);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run7);
>>>>> +void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8)
>>>>> +{
>>>>> + u64 args[8];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL2(COPY, 7, 8);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run8);
>>>>> +void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9)
>>>>> +{
>>>>> + u64 args[9];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL3(COPY, 7, 8, 9);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run9);
>>>>> +void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10)
>>>>> +{
>>>>> + u64 args[10];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL4(COPY, 7, 8, 9, 10);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run10);
>>>>> +void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11)
>>>>> +{
>>>>> + u64 args[11];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL5(COPY, 7, 8, 9, 10, 11);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run11);
>>>>> +void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12)
>>>>> +{
>>>>> + u64 args[12];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL6(COPY, 7, 8, 9, 10, 11, 12);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run12);
>>>>> +void bpf_trace_run17(struct bpf_prog *prog, u64 arg1, u64 arg2,
>>>>> + u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>>>>> + u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12,
>>>>> + u64 arg13, u64 arg14, u64 arg15, u64 arg16, u64 arg17)
>>>>> +{
>>>>> + u64 args[17];
>>>>> +
>>>>> + EVAL6(COPY, 1, 2, 3, 4, 5, 6);
>>>>> + EVAL6(COPY, 7, 8, 9, 10, 11, 12);
>>>>> + EVAL5(COPY, 13, 14, 15, 16, 17);
>>>>> + __bpf_trace_run(prog, args);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(bpf_trace_run17);
>>>>
>>>> Would be nice if we could generate all these above via macro, e.g. when we define
>>>> a hard upper limit for max number of tracepoint args anyway, so this gets automatically
>>>> adjusted as well. Maybe some of the logic from BPF_CALL_*() macros could be borrowed
>>>> for this purpose.
>>>
>>> I've thought about it, but couldn't figure out how to do it.
>>> Suggestions are welcome.
>>> The preprocessor cannot expand a constant N into N statements.
>>> There gotta be something like:
>>> ...
>>> #define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
>>> #define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
>>> for whatever maximum we will pick.
>>
>> Right.
>>
>>> I picked 6 as a good compromise and used it twice in bpf_trace_run1x()
>>> Similar thing possible for u64 arg1, u64 arg2, ...
>>> but it will be harder to read.
>>> Looking forward what you can come up with.
>>
>> Just took a quick look, so the below one would work for generating the
>> signature and function. I did till 9 here:
>>
>> #define UNPACK(...) __VA_ARGS__
>> #define REPEAT_1(FN, DL, X, ...) FN(X)
>> #define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
>> #define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
>> #define REPEAT_4(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
>> #define REPEAT_5(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
>> #define REPEAT_6(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
>> #define REPEAT_7(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
>> #define REPEAT_8(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
>> #define REPEAT_9(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
>> #define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
>>
>> #define SARG(X) u64 arg##X
>> #define COPY(X) args[X] = arg##X
>>
>> #define __DL_COM (,)
>> #define __DL_SEM (;)
>>
>> #define __SEQ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
>>
>> #define BPF_TRACE_DECL_x(x) \
>> void bpf_trace_run##x(struct bpf_prog *prog, \
>> REPEAT(x, SARG, __DL_COM, __SEQ))
>> #define BPF_TRACE_DEFN_x(x) \
>> void bpf_trace_run##x(struct bpf_prog *prog, \
>> REPEAT(x, SARG, __DL_COM, __SEQ)) \
>> { \
>> u64 args[x]; \
>> REPEAT(x, COPY, __DL_SEM, __SEQ); \
>> __bpf_trace_run(prog, args); \
>> } \
>> EXPORT_SYMBOL_GPL(bpf_trace_run##x)
>>
>> So doing a ...
>>
>> BPF_TRACE_DECL_x(5);
>> BPF_TRACE_DEFN_x(5);
>
> interestingly that in addition to above defining
> #define __REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
> to allow recursive expansion and doing
> __REPEAT(12, BPF_TRACE_DECL_x, __DL_SEM, __SEQ_1_12);
> almost works...
> I'm guessing it's hitting preprocessor internal limit on
> number of expressions to expand.
> It expands 1-6 nicely and 7-12 are partially expanded :)
It's not a limit I'm hitting, but a self-referential expansion issue.
Exactly half gets expanded.
I don't think there is an easy workaround other
than duplicating the whole chain of REPEAT macros
under a slightly different name.
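A minimal sketch of the duplicated-chain workaround, assuming GCC or Clang. REPEAT_n, SARG and COPY follow Daniel's macros (trimmed to four steps); OUTER_n, TRACE_DEFN_x, run_packed() and the DL_* and SEQ names are hypothetical, for illustration only. While the preprocessor rescans a macro's expansion, that macro is not expanded again. So once an outer REPEAT_12 chain has reached REPEAT_6 (the step handling item 7), REPEAT_6..REPEAT_12 are all disabled, and the inner REPEAT_7 that item 7 needs stays unexpanded. That matches "exactly half". Giving the outer chain its own names avoids the problem:

```c
#include <stdint.h>

typedef uint64_t u64;

/* Strip the parentheses from a delimiter such as (,) or (;). */
#define UNPACK(...) __VA_ARGS__

/* Inner chain: apply FN to the first N list items, separated by DL. */
#define REPEAT_1(FN, DL, X, ...) FN(X)
#define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
#define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
#define REPEAT_4(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
#define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)

/* Outer chain: same shape, different names, so expanding it does not
 * disable the inner REPEAT_n macros. */
#define OUTER_1(FN, DL, X, ...) FN(X)
#define OUTER_2(FN, DL, X, ...) FN(X) UNPACK DL OUTER_1(FN, DL, __VA_ARGS__)
#define OUTER_3(FN, DL, X, ...) FN(X) UNPACK DL OUTER_2(FN, DL, __VA_ARGS__)
#define OUTER_4(FN, DL, X, ...) FN(X) UNPACK DL OUTER_3(FN, DL, __VA_ARGS__)
#define OUTER(X, FN, DL, ...) OUTER_##X(FN, DL, __VA_ARGS__)

#define SARG(X) u64 arg##X
#define COPY(X) args[X] = arg##X

#define DL_COM  (,)
#define DL_SEM  (;)
#define DL_NONE ()
/* One spare element, so REPEAT_1's "..." is never empty. */
#define SEQ 0, 1, 2, 3, 4

/* Stand-in for __bpf_trace_run(): here it just sums the packed array. */
static u64 run_packed(const u64 *args, int n)
{
	u64 sum = 0;

	for (int i = 0; i < n; i++)
		sum += args[i];
	return sum;
}

/* One function per arity: trace_runN(u64 arg0, ..., u64 argN-1). */
#define TRACE_DEFN_x(x)                                \
	u64 trace_run##x(REPEAT(x, SARG, DL_COM, SEQ)) \
	{                                              \
		u64 args[x];                           \
		REPEAT(x, COPY, DL_SEM, SEQ);          \
		return run_packed(args, x);            \
	}

/* Defines trace_run1() .. trace_run4(); the trailing 0 is a spare element. */
OUTER(4, TRACE_DEFN_x, DL_NONE, 1, 2, 3, 4, 0)
```

With this, trace_run4(1, 2, 3, 4) copies its arguments into args[0..3] and returns 10. The cost is one extra copy of the chain for each level of nesting.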
* Re: [PATCH net-next] mlxsw: spectrum_span: Prevent duplicate mirrors
From: David Miller @ 2018-03-24 1:51 UTC (permalink / raw)
To: idosch; +Cc: netdev, petrm, jiri, mlxsw
In-Reply-To: <20180323180358.30667-1-idosch@mellanox.com>
From: Ido Schimmel <idosch@mellanox.com>
Date: Fri, 23 Mar 2018 21:03:58 +0300
> In net commit 8175f7c4736f ("mlxsw: spectrum: Prevent duplicate
> mirrors") we prevented the user from mirroring more than once from a
> single binding point (port-direction pair).
>
> The fix was essentially reverted in a merge conflict resolution when net
> was merged into net-next. Restore it.
>
> Fixes: 03fe2debbb27 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Applied, thanks for fixing this up.
* Re: [PATCH net-next] net/sched: remove tcf_idr_cleanup()
From: David Miller @ 2018-03-24 1:52 UTC (permalink / raw)
To: dcaratti; +Cc: xiyou.wangcong, netdev
In-Reply-To: <a513dacd0b91d9b5c4b78bbca8f5033fcf676892.1521828248.git.dcaratti@redhat.com>
From: Davide Caratti <dcaratti@redhat.com>
Date: Fri, 23 Mar 2018 19:09:39 +0100
> tcf_idr_cleanup() is no more used, so remove it.
>
> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Applied, thank you.
* Re: [PATCH net-next] net/sched: act_vlan: declare push_vid with host byte order
From: David Miller @ 2018-03-24 1:54 UTC (permalink / raw)
To: dcaratti; +Cc: jiri, netdev
In-Reply-To: <65876774014962556e191e2a88e6eb02e0ece5a1.1521829106.git.dcaratti@redhat.com>
From: Davide Caratti <dcaratti@redhat.com>
Date: Fri, 23 Mar 2018 19:31:30 +0100
> use u16 in place of __be16 to suppress the following sparse warnings:
>
> net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types)
> net/sched/act_vlan.c:150:26: expected restricted __be16 [usertype] push_vid
> net/sched/act_vlan.c:150:26: got unsigned short
> net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer
> net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types)
> net/sched/act_vlan.c:208:26: expected unsigned short [unsigned] [usertype] tcfv_push_vid
> net/sched/act_vlan.c:208:26: got restricted __be16 [usertype] push_vid
>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Also applied, thanks Davide.
* Re: [PATCH v2 bpf-next 5/8] bpf: introduce BPF_RAW_TRACEPOINT
From: Linus Torvalds @ 2018-03-24 2:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Daniel Borkmann, David Miller, Peter Zijlstra, Steven Rostedt,
Network Development, kernel-team, Linux API
In-Reply-To: <cea19b0f-0420-0620-2240-954a81e22659@fb.com>
On Fri, Mar 23, 2018 at 6:43 PM, Alexei Starovoitov <ast@fb.com> wrote:
>
> it's not the limit I'm hitting, but self referential issue.
> Exactly half gets expanded.
> I don't think there is an easy workaround other
> than duplicating the whole chain of REPEAT macro twice
> with slightly different name.
Take a look at the __MAP() macro in include/linux/syscalls.h.
It basically takes a "transformation" as its argument, and does it <n>
times, where 'n' is the first argument (but could be self-counting).
Maybe it will give you some ideas.
... and maybe it will just drive you mad and make you gouge out your
eyes with a spoon. Don't blame the messenger.
Linus
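For reference, a hedged sketch of the __MAP()-style approach Linus points to. Each MAPn step consumes one (type, name) pair, so a single chain can produce both a parameter list and the matching list of names. MAP0..MAP3, DECL, NAME and DEFINE_SUM() are local, hypothetical names modeled on that pattern, not a copy of include/linux/syscalls.h. It assumes GCC or Clang, since MAP1 may be reached with nothing left for its "...".

```c
/* Each MAPn step consumes one (type, name) pair and applies m to it. */
#define MAP0(m, ...)
#define MAP1(m, t, a, ...) m(t, a)
#define MAP2(m, t, a, ...) m(t, a), MAP1(m, __VA_ARGS__)
#define MAP3(m, t, a, ...) m(t, a), MAP2(m, __VA_ARGS__)
#define MAP(n, ...) MAP##n(__VA_ARGS__)

#define DECL(t, a) t a	/* (int, x) -> int x */
#define NAME(t, a) a	/* (int, x) -> x */

/* Hypothetical generator: the same pairs yield the prototype and the body. */
#define DEFINE_SUM(n, ...)						\
	long long sum##n(MAP(n, DECL, __VA_ARGS__))			\
	{								\
		long long vals[] = { MAP(n, NAME, __VA_ARGS__) };	\
		long long total = 0;					\
									\
		for (int i = 0; i < n; i++)				\
			total += vals[i];				\
		return total;						\
	}

/* long long sum1(int a) and long long sum3(int a, long b, short c) */
DEFINE_SUM(1, int, a)
DEFINE_SUM(3, int, a, long, b, short, c)
```

Here the count n is passed explicitly; a self-counting variant would derive it from the argument list instead.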
* [PATCH v5 bpf-next 02/10] net/mediatek: disambiguate mt76 vs mt7601u trace events
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Two trace events are defined with the same name, and both are unused.
They conflict in an allyesconfig build, so rename one of them.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/net/wireless/mediatek/mt7601u/trace.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/mediatek/mt7601u/trace.h b/drivers/net/wireless/mediatek/mt7601u/trace.h
index 289897300ef0..82c8898b9076 100644
--- a/drivers/net/wireless/mediatek/mt7601u/trace.h
+++ b/drivers/net/wireless/mediatek/mt7601u/trace.h
@@ -34,7 +34,7 @@
#define REG_PR_FMT "%04x=%08x"
#define REG_PR_ARG __entry->reg, __entry->val
-DECLARE_EVENT_CLASS(dev_reg_evt,
+DECLARE_EVENT_CLASS(dev_reg_evtu,
TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
TP_ARGS(dev, reg, val),
TP_STRUCT__entry(
@@ -51,12 +51,12 @@ DECLARE_EVENT_CLASS(dev_reg_evt,
)
);
-DEFINE_EVENT(dev_reg_evt, reg_read,
+DEFINE_EVENT(dev_reg_evtu, reg_read,
TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
TP_ARGS(dev, reg, val)
);
-DEFINE_EVENT(dev_reg_evt, reg_write,
+DEFINE_EVENT(dev_reg_evtu, reg_write,
TP_PROTO(struct mt7601u_dev *dev, u32 reg, u32 val),
TP_ARGS(dev, reg, val)
);
--
2.9.5
* [PATCH v5 bpf-next 09/10] samples/bpf: raw tracepoint test
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Add an empty raw_tracepoint bpf program to test overhead, similar
to the existing kprobe and traditional tracepoint tests.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
samples/bpf/Makefile | 1 +
samples/bpf/bpf_load.c | 14 ++++++++++++++
samples/bpf/test_overhead_raw_tp_kern.c | 17 +++++++++++++++++
samples/bpf/test_overhead_user.c | 12 ++++++++++++
4 files changed, 44 insertions(+)
create mode 100644 samples/bpf/test_overhead_raw_tp_kern.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 2c2a587e0942..4d6a6edd4bf6 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -119,6 +119,7 @@ always += offwaketime_kern.o
always += spintest_kern.o
always += map_perf_test_kern.o
always += test_overhead_tp_kern.o
+always += test_overhead_raw_tp_kern.o
always += test_overhead_kprobe_kern.o
always += parse_varlen.o parse_simple.o parse_ldabs.o
always += test_cgrp2_tc_kern.o
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index b1a310c3ae89..bebe4188b4b3 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -61,6 +61,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+ bool is_raw_tracepoint = strncmp(event, "raw_tracepoint/", 15) == 0;
bool is_xdp = strncmp(event, "xdp", 3) == 0;
bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
bool is_cgroup_skb = strncmp(event, "cgroup/skb", 10) == 0;
@@ -85,6 +86,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_type = BPF_PROG_TYPE_KPROBE;
} else if (is_tracepoint) {
prog_type = BPF_PROG_TYPE_TRACEPOINT;
+ } else if (is_raw_tracepoint) {
+ prog_type = BPF_PROG_TYPE_RAW_TRACEPOINT;
} else if (is_xdp) {
prog_type = BPF_PROG_TYPE_XDP;
} else if (is_perf_event) {
@@ -131,6 +134,16 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
return populate_prog_array(event, fd);
}
+ if (is_raw_tracepoint) {
+ efd = bpf_raw_tracepoint_open(event + 15, fd);
+ if (efd < 0) {
+ printf("tracepoint %s %s\n", event + 15, strerror(errno));
+ return -1;
+ }
+ event_fd[prog_cnt - 1] = efd;
+ return 0;
+ }
+
if (is_kprobe || is_kretprobe) {
if (is_kprobe)
event += 7;
@@ -587,6 +600,7 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
if (memcmp(shname, "kprobe/", 7) == 0 ||
memcmp(shname, "kretprobe/", 10) == 0 ||
memcmp(shname, "tracepoint/", 11) == 0 ||
+ memcmp(shname, "raw_tracepoint/", 15) == 0 ||
memcmp(shname, "xdp", 3) == 0 ||
memcmp(shname, "perf_event", 10) == 0 ||
memcmp(shname, "socket", 6) == 0 ||
diff --git a/samples/bpf/test_overhead_raw_tp_kern.c b/samples/bpf/test_overhead_raw_tp_kern.c
new file mode 100644
index 000000000000..d2af8bc1c805
--- /dev/null
+++ b/samples/bpf/test_overhead_raw_tp_kern.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Facebook */
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+SEC("raw_tracepoint/task_rename")
+int prog(struct bpf_raw_tracepoint_args *ctx)
+{
+ return 0;
+}
+
+SEC("raw_tracepoint/urandom_read")
+int prog2(struct bpf_raw_tracepoint_args *ctx)
+{
+ return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/test_overhead_user.c b/samples/bpf/test_overhead_user.c
index d291167fd3c7..e1d35e07a10e 100644
--- a/samples/bpf/test_overhead_user.c
+++ b/samples/bpf/test_overhead_user.c
@@ -158,5 +158,17 @@ int main(int argc, char **argv)
unload_progs();
}
+ if (test_flags & 0xC0) {
+ snprintf(filename, sizeof(filename),
+ "%s_raw_tp_kern.o", argv[0]);
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+ printf("w/RAW_TRACEPOINT\n");
+ run_perf_test(num_cpu, test_flags >> 6);
+ unload_progs();
+ }
+
return 0;
}
--
2.9.5
* [PATCH v5 bpf-next 05/10] macro: introduce COUNT_ARGS() macro
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Move the COUNT_ARGS() macro from apparmor to a generic header and extend it
to count up to twelve.
COUNT() was an alternative name for this logic, but it's already used for a
different purpose in many other places.
The same applies to the CONCATENATE() macro.
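A short usage sketch of the moved macros, assuming GCC or Clang in GNU C mode. The named variadic "X..." and the ", ##X" comma elision are GNU extensions, and in strict ISO mode COUNT_ARGS() with no arguments may not yield 0, so the example only counts non-empty lists. __CONCAT is renamed CONCAT_ here because glibc's <sys/cdefs.h> already defines __CONCAT; sum1()..sum3() and SUM() are hypothetical.

```c
/* From the patch: counts up to 12 (GNU C needed for "X..." and ", ##X"). */
#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n
#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

/* Renamed from __CONCAT, which glibc's <sys/cdefs.h> already defines. */
#define CONCAT_(a, b) a ## b
/* The extra level lets an argument such as COUNT_ARGS(...) expand first. */
#define CONCATENATE(a, b) CONCAT_(a, b)

/* Hypothetical fixed-arity helpers, selected by argument count. */
static inline int sum1(int a) { return a; }
static inline int sum2(int a, int b) { return a + b; }
static inline int sum3(int a, int b, int c) { return a + b + c; }

/* SUM(4, 5) -> CONCATENATE(sum, 2)(4, 5) -> sum2(4, 5) */
#define SUM(X...) CONCATENATE(sum, COUNT_ARGS(X))(X)
```

This is the same dispatch that apparmor's EVAL() performs with CONCATENATE(EVAL, COUNT_ARGS(X)).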
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/kernel.h | 7 +++++++
security/apparmor/include/path.h | 7 +------
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..293fa0677fba 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -919,6 +919,13 @@ static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
#define swap(a, b) \
do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
+/* This counts to 12. Any more, it will return 13th argument. */
+#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n
+#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+#define __CONCAT(a, b) a ## b
+#define CONCATENATE(a, b) __CONCAT(a, b)
+
/**
* container_of - cast a member of a structure out to the containing structure
* @ptr: the pointer to the member.
diff --git a/security/apparmor/include/path.h b/security/apparmor/include/path.h
index 05fb3305671e..e042b994f2b8 100644
--- a/security/apparmor/include/path.h
+++ b/security/apparmor/include/path.h
@@ -43,15 +43,10 @@ struct aa_buffers {
DECLARE_PER_CPU(struct aa_buffers, aa_buffers);
-#define COUNT_ARGS(X...) COUNT_ARGS_HELPER(, ##X, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
-#define COUNT_ARGS_HELPER(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, n, X...) n
-#define CONCAT(X, Y) X ## Y
-#define CONCAT_AFTER(X, Y) CONCAT(X, Y)
-
#define ASSIGN(FN, X, N) ((X) = FN(N))
#define EVAL1(FN, X) ASSIGN(FN, X, 0) /*X = FN(0)*/
#define EVAL2(FN, X, Y...) do { ASSIGN(FN, X, 1); EVAL1(FN, Y); } while (0)
-#define EVAL(FN, X...) CONCAT_AFTER(EVAL, COUNT_ARGS(X))(FN, X)
+#define EVAL(FN, X...) CONCATENATE(EVAL, COUNT_ARGS(X))(FN, X)
#define for_each_cpu_buffer(I) for ((I) = 0; (I) < MAX_PATH_BUFFERS; (I)++)
--
2.9.5
* [PATCH v5 bpf-next 07/10] bpf: introduce BPF_RAW_TRACEPOINT
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.
From the bpf program's point of view, access to the arguments looks like:
struct bpf_raw_tracepoint_args {
__u64 args[0];
};
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
// program can read args[N] where N depends on tracepoint
// and statically verified at program load+attach time
}
kprobe+bpf infrastructure allows programs to access function arguments.
This feature allows programs to access raw tracepoint arguments.
Similar to the proposed 'dynamic ftrace events', there are no ABI guarantees
as to what the tracepoint arguments are and what their meaning is.
The program needs to type cast args properly and use the bpf_probe_read()
helper to access struct fields when an argument is a pointer.
For every tracepoint, a __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
0xffffffff81132080 <+0>: mov %ecx,%ecx
0xffffffff81132082 <+2>: jmpq 0xffffffff811231f0 <bpf_trace_run3>
where
TRACE_EVENT(xdp_exception,
TP_PROTO(const struct net_device *dev,
const struct bpf_prog *xdp, u32 act),
The above assembler snippet casts the 32-bit 'act' argument to 'u64'
to pass it into bpf_trace_run3(), while the 'dev' and 'xdp' args are passed as-is.
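To illustrate the widening step, a userspace sketch, assuming GCC or Clang. CAST_TO_U64() and UINTTYPE() are adapted from the __CAST_TO_U64() helper this patch adds to include/trace/bpf_probe.h, with typeof spelled __typeof__ so it also builds in strict ISO mode. struct net_device, struct raw_tp_ctx, trace_run3(), fake_prog() and probe_xdp_exception() are hypothetical stand-ins. The stand-in program only casts the raw values back; a real bpf program would need bpf_probe_read() to read through the dev pointer.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Pick an unsigned type with the same size as the argument. */
#define UINTTYPE(size) \
	__typeof__(__builtin_choose_expr(size == 1, (u8)1, \
		   __builtin_choose_expr(size == 2, (u16)2, \
		   __builtin_choose_expr(size == 4, (u32)3, \
		   __builtin_choose_expr(size == 8, (u64)4, \
					 (void)5)))))

/* Copy the argument's bytes into that type, then widen it to u64. */
#define CAST_TO_U64(x) ({ \
	__typeof__(x) __src = (x); \
	UINTTYPE(sizeof(x)) __dst; \
	memcpy(&__dst, &__src, sizeof(__dst)); \
	(u64)__dst; })

/* Hypothetical stand-ins for the kernel types in TP_PROTO(). */
struct net_device { int ifindex; };
struct bpf_prog;

/* Stand-in for the context a raw tracepoint program receives. */
struct raw_tp_ctx { u64 args[3]; };

static const struct net_device *seen_dev;
static u64 seen_act;

/* Stand-in for an attached program: it casts the raw u64 values back. */
static void fake_prog(const struct raw_tp_ctx *ctx)
{
	seen_dev = (const struct net_device *)(uintptr_t)ctx->args[0];
	seen_act = ctx->args[2];
}

/* Stand-in for bpf_trace_run3(): pack the arguments and run the program. */
static void trace_run3(u64 arg0, u64 arg1, u64 arg2)
{
	struct raw_tp_ctx ctx = { { arg0, arg1, arg2 } };

	fake_prog(&ctx);
}

/* Stand-in for __bpf_trace_xdp_exception(): widen each argument. */
static void probe_xdp_exception(const struct net_device *dev,
				const struct bpf_prog *xdp, u32 act)
{
	trace_run3(CAST_TO_U64(dev), CAST_TO_U64(xdp), CAST_TO_U64(act));
}
```

Because each value is copied into an unsigned type of the same size, a 32-bit argument arrives zero-extended, consistent with the mov %ecx,%ecx in the disassembly: CAST_TO_U64((int)-1) is 0xffffffff, so a program reading a signed 32-bit argument has to cast it back itself.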
All ~500 __bpf_trace_*() functions are only 5-10 bytes long,
and in total this approach adds 7k bytes to .text and 8k bytes
to .rodata, since the probe funcs need to appear in kallsyms.
The alternative to having __bpf_trace_##call be global in kallsyms
would have been to keep them static and add another pointer to these
static functions to 'struct trace_event_class' and 'struct trace_event_call',
but keeping them global simplifies the implementation and keeps it independent
from the tracing side.
This approach also gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at rates of 1M+ events per second,
this is a very valuable optimization.
Since the ftrace and perf sides are not involved, a new
BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns an anon_inode FD of a 'bpf-raw-tracepoint' object.
The user space side looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
Ctrl-C of a tracing daemon or cmdline tool that uses this feature
will automatically detach the bpf program, unload it and
unregister the tracepoint probe.
On the kernel side, for_each_kernel_tracepoint() is used
to find a tracepoint with the "xdp_exception" name
(that would be the __tracepoint_xdp_exception record).
Then kallsyms_lookup_name() is used to find the address
of the __bpf_trace_xdp_exception() probe function.
Finally, tracepoint_probe_register() is used to connect the probe
with the tracepoint.
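The three kernel-side steps can be modeled in plain C. This is a hypothetical userspace simulation: the arrays stand in for the kernel's tracepoint list and kallsyms, and find_tracepoint(), lookup_probe() and raw_tracepoint_attach() are illustrative names, not kernel APIs. What it shows is the naming convention the lookup relies on: the probe for tracepoint <name> is the symbol __bpf_trace_<name>.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef void (*probe_fn)(void);

/* Hypothetical probe bodies; the real ones forward to bpf_trace_runN(). */
static void probe_xdp_exception(void) {}
static void probe_task_rename(void) {}

/* Stand-in for the kernel's tracepoint records. */
struct tracepoint {
	const char *name;
	probe_fn probe;		/* NULL until a probe is registered */
};

static struct tracepoint tracepoints[] = {
	{ "xdp_exception", NULL },
	{ "task_rename", NULL },
};

/* Stand-in for kallsyms: symbol name -> address. */
struct ksym {
	const char *name;
	probe_fn addr;
};

static const struct ksym kallsyms[] = {
	{ "__bpf_trace_xdp_exception", probe_xdp_exception },
	{ "__bpf_trace_task_rename", probe_task_rename },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Step 1: find the tracepoint by name (for_each_kernel_tracepoint()). */
static struct tracepoint *find_tracepoint(const char *name)
{
	for (size_t i = 0; i < ARRAY_LEN(tracepoints); i++)
		if (strcmp(tracepoints[i].name, name) == 0)
			return &tracepoints[i];
	return NULL;
}

/* Step 2: resolve "__bpf_trace_<name>" (kallsyms_lookup_name()). */
static probe_fn lookup_probe(const char *name)
{
	char sym[128];
	int n = snprintf(sym, sizeof(sym), "__bpf_trace_%s", name);

	if (n < 0 || (size_t)n >= sizeof(sym))
		return NULL;
	for (size_t i = 0; i < ARRAY_LEN(kallsyms); i++)
		if (strcmp(kallsyms[i].name, sym) == 0)
			return kallsyms[i].addr;
	return NULL;
}

/* Step 3: connect probe and tracepoint (tracepoint_probe_register()). */
static int raw_tracepoint_attach(const char *name)
{
	struct tracepoint *tp = find_tracepoint(name);
	probe_fn probe;

	if (!tp)
		return -1;
	probe = lookup_probe(name);
	if (!probe)
		return -1;
	tp->probe = probe;
	return 0;
}
```

Error handling is reduced to -1 here for brevity; the sketch says nothing about which error codes the kernel returns.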
Addition of bpf_raw_tracepoint doesn't interfere with the ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are permitted,
each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.
In the future, bpf_raw_tracepoints can be extended with
query/introspection logic.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf_types.h | 1 +
include/linux/trace_events.h | 37 +++++++++
include/trace/bpf_probe.h | 87 ++++++++++++++++++++
include/trace/define_trace.h | 1 +
include/uapi/linux/bpf.h | 11 +++
kernel/bpf/syscall.c | 87 ++++++++++++++++++++
kernel/trace/bpf_trace.c | 188 +++++++++++++++++++++++++++++++++++++++++++
7 files changed, 412 insertions(+)
create mode 100644 include/trace/bpf_probe.h
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 5e2e8a49fb21..6d7243bfb0ff 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -19,6 +19,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
+BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
#endif
#ifdef CONFIG_CGROUP_BPF
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8a1442c4e513..e37fcd7505da 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -468,6 +468,8 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
void perf_event_detach_bpf_prog(struct perf_event *event);
int perf_event_query_prog_array(struct perf_event *event, void __user *info);
+int bpf_probe_register(struct tracepoint *tp, struct bpf_prog *prog);
+int bpf_probe_unregister(struct tracepoint *tp, struct bpf_prog *prog);
#else
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
{
@@ -487,6 +489,14 @@ perf_event_query_prog_array(struct perf_event *event, void __user *info)
{
return -EOPNOTSUPP;
}
+static inline int bpf_probe_register(struct tracepoint *tp, struct bpf_prog *p)
+{
+ return -EOPNOTSUPP;
+}
+static inline int bpf_probe_unregister(struct tracepoint *tp, struct bpf_prog *p)
+{
+ return -EOPNOTSUPP;
+}
#endif
enum {
@@ -546,6 +556,33 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
void perf_trace_buf_update(void *record, u16 type);
void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
+void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
+void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
+void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3);
+void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4);
+void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5);
+void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6);
+void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
+void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+ u64 arg8);
+void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+ u64 arg8, u64 arg9);
+void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+ u64 arg8, u64 arg9, u64 arg10);
+void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+ u64 arg8, u64 arg9, u64 arg10, u64 arg11);
+void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
+ u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
+ u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
struct trace_event_call *call, u64 count,
struct pt_regs *regs, struct hlist_head *head,
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
new file mode 100644
index 000000000000..d2cc0663e618
--- /dev/null
+++ b/include/trace/bpf_probe.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#undef TRACE_SYSTEM_VAR
+
+#ifdef CONFIG_BPF_EVENTS
+
+#undef __entry
+#define __entry entry
+
+#undef __get_dynamic_array
+#define __get_dynamic_array(field) \
+ ((void *)__entry + (__entry->__data_loc_##field & 0xffff))
+
+#undef __get_dynamic_array_len
+#define __get_dynamic_array_len(field) \
+ ((__entry->__data_loc_##field >> 16) & 0xffff)
+
+#undef __get_str
+#define __get_str(field) ((char *)__get_dynamic_array(field))
+
+#undef __get_bitmask
+#define __get_bitmask(field) (char *)__get_dynamic_array(field)
+
+#undef __perf_count
+#define __perf_count(c) (c)
+
+#undef __perf_task
+#define __perf_task(t) (t)
+
+/* cast any integer, pointer, or small struct to u64 */
+#define UINTTYPE(size) \
+ __typeof__(__builtin_choose_expr(size == 1, (u8)1, \
+ __builtin_choose_expr(size == 2, (u16)2, \
+ __builtin_choose_expr(size == 4, (u32)3, \
+ __builtin_choose_expr(size == 8, (u64)4, \
+ (void)5)))))
+#define __CAST_TO_U64(x) ({ \
+ typeof(x) __src = (x); \
+ UINTTYPE(sizeof(x)) __dst; \
+ memcpy(&__dst, &__src, sizeof(__dst)); \
+ (u64)__dst; })
+
+#define __CAST1(a,...) __CAST_TO_U64(a)
+#define __CAST2(a,...) __CAST_TO_U64(a), __CAST1(__VA_ARGS__)
+#define __CAST3(a,...) __CAST_TO_U64(a), __CAST2(__VA_ARGS__)
+#define __CAST4(a,...) __CAST_TO_U64(a), __CAST3(__VA_ARGS__)
+#define __CAST5(a,...) __CAST_TO_U64(a), __CAST4(__VA_ARGS__)
+#define __CAST6(a,...) __CAST_TO_U64(a), __CAST5(__VA_ARGS__)
+#define __CAST7(a,...) __CAST_TO_U64(a), __CAST6(__VA_ARGS__)
+#define __CAST8(a,...) __CAST_TO_U64(a), __CAST7(__VA_ARGS__)
+#define __CAST9(a,...) __CAST_TO_U64(a), __CAST8(__VA_ARGS__)
+#define __CAST10(a,...) __CAST_TO_U64(a), __CAST9(__VA_ARGS__)
+#define __CAST11(a,...) __CAST_TO_U64(a), __CAST10(__VA_ARGS__)
+#define __CAST12(a,...) __CAST_TO_U64(a), __CAST11(__VA_ARGS__)
+/* tracepoints with more than 12 arguments will hit build error */
+#define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+/* no 'static' here. The bpf probe functions are global */ \
+notrace void \
+__bpf_trace_##call(void *__data, proto) \
+{ \
+ struct bpf_prog *prog = __data; \
+ \
+ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+}
+
+/*
+ * This part is compiled out, it is only here as a build time check
+ * to make sure that if the tracepoint handling changes, the
+ * bpf probe will fail to compile unless it too is updated.
+ */
+#undef DEFINE_EVENT
+#define DEFINE_EVENT(template, call, proto, args) \
+static inline void bpf_test_probe_##call(void) \
+{ \
+ check_trace_callback_type_##call(__bpf_trace_##template); \
+}
+
+
+#undef DEFINE_EVENT_PRINT
+#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
+ DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
+
+#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#endif /* CONFIG_BPF_EVENTS */
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 96b22ace9ae7..5f8216bc261f 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -95,6 +95,7 @@
#ifdef TRACEPOINTS_ENABLED
#include <trace/trace_events.h>
#include <trace/perf.h>
+#include <trace/bpf_probe.h>
#endif
#undef TRACE_EVENT
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 18b7c510c511..1878201c2d77 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_cmd {
BPF_MAP_GET_FD_BY_ID,
BPF_OBJ_GET_INFO_BY_FD,
BPF_PROG_QUERY,
+ BPF_RAW_TRACEPOINT_OPEN,
};
enum bpf_map_type {
@@ -134,6 +135,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_CGROUP_DEVICE,
BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_RAW_TRACEPOINT,
};
enum bpf_attach_type {
@@ -344,6 +346,11 @@ union bpf_attr {
__aligned_u64 prog_ids;
__u32 prog_cnt;
} query;
+
+ struct {
+ __u64 name;
+ __u32 prog_fd;
+ } raw_tracepoint;
} __attribute__((aligned(8)));
/* BPF helper function descriptions:
@@ -1152,4 +1159,8 @@ struct bpf_cgroup_dev_ctx {
__u32 minor;
};
+struct bpf_raw_tracepoint_args {
+ __u64 args[0];
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3aeb4ea2a93a..96bc45a6e7d6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1311,6 +1311,90 @@ static int bpf_obj_get(const union bpf_attr *attr)
attr->file_flags);
}
+struct bpf_raw_tracepoint {
+ struct tracepoint *tp;
+ struct bpf_prog *prog;
+};
+
+static int bpf_raw_tracepoint_release(struct inode *inode, struct file *filp)
+{
+ struct bpf_raw_tracepoint *raw_tp = filp->private_data;
+
+ if (raw_tp->prog) {
+ bpf_probe_unregister(raw_tp->tp, raw_tp->prog);
+ bpf_prog_put(raw_tp->prog);
+ }
+ kfree(raw_tp);
+ return 0;
+}
+
+static const struct file_operations bpf_raw_tp_fops = {
+ .release = bpf_raw_tracepoint_release,
+ .read = bpf_dummy_read,
+ .write = bpf_dummy_write,
+};
+
+static void *__find_tp(struct tracepoint *tp, void *priv)
+{
+ char *name = priv;
+
+ if (!strcmp(tp->name, name))
+ return tp;
+ return NULL;
+}
+
+#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
+
+static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
+{
+ struct bpf_raw_tracepoint *raw_tp;
+ struct tracepoint *tp;
+ struct bpf_prog *prog;
+ char tp_name[128];
+ int tp_fd, err;
+
+ if (strncpy_from_user(tp_name, u64_to_user_ptr(attr->raw_tracepoint.name),
+ sizeof(tp_name) - 1) < 0)
+ return -EFAULT;
+ tp_name[sizeof(tp_name) - 1] = 0;
+
+ tp = for_each_kernel_tracepoint(__find_tp, tp_name);
+ if (!tp)
+ return -ENOENT;
+
+ raw_tp = kmalloc(sizeof(*raw_tp), GFP_USER | __GFP_ZERO);
+ if (!raw_tp)
+ return -ENOMEM;
+ raw_tp->tp = tp;
+
+ prog = bpf_prog_get_type(attr->raw_tracepoint.prog_fd,
+ BPF_PROG_TYPE_RAW_TRACEPOINT);
+ if (IS_ERR(prog)) {
+ err = PTR_ERR(prog);
+ goto out_free_tp;
+ }
+
+ err = bpf_probe_register(raw_tp->tp, prog);
+ if (err)
+ goto out_put_prog;
+
+ raw_tp->prog = prog;
+ tp_fd = anon_inode_getfd("bpf-raw-tracepoint", &bpf_raw_tp_fops, raw_tp,
+ O_CLOEXEC);
+ if (tp_fd < 0) {
+ bpf_probe_unregister(raw_tp->tp, prog);
+ err = tp_fd;
+ goto out_put_prog;
+ }
+ return tp_fd;
+
+out_put_prog:
+ bpf_prog_put(prog);
+out_free_tp:
+ kfree(raw_tp);
+ return err;
+}
+
#ifdef CONFIG_CGROUP_BPF
#define BPF_PROG_ATTACH_LAST_FIELD attach_flags
@@ -1921,6 +2005,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
case BPF_OBJ_GET_INFO_BY_FD:
err = bpf_obj_get_info_by_fd(&attr, uattr);
break;
+ case BPF_RAW_TRACEPOINT_OPEN:
+ err = bpf_raw_tracepoint_open(&attr);
+ break;
default:
err = -EINVAL;
break;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c634e093951f..00e86aa11360 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -723,6 +723,86 @@ const struct bpf_verifier_ops tracepoint_verifier_ops = {
const struct bpf_prog_ops tracepoint_prog_ops = {
};
+/*
+ * bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp
+ * to avoid potential recursive reuse issue when/if tracepoints are added
+ * inside bpf_*_event_output and/or bpf_get_stack_id
+ */
+static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
+BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args,
+ struct bpf_map *, map, u64, flags, void *, data, u64, size)
+{
+ struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+ perf_fetch_caller_regs(regs);
+ return ____bpf_perf_event_output(regs, map, flags, data, size);
+}
+
+static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
+ .func = bpf_perf_event_output_raw_tp,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
+ .arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_PTR_TO_MEM,
+ .arg5_type = ARG_CONST_SIZE_OR_ZERO,
+};
+
+BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
+ struct bpf_map *, map, u64, flags)
+{
+ struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+ perf_fetch_caller_regs(regs);
+ /* similar to bpf_perf_event_output_tp, but pt_regs fetched differently */
+ return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
+ flags, 0, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = {
+ .func = bpf_get_stackid_raw_tp,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_CONST_MAP_PTR,
+ .arg3_type = ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *raw_tp_prog_func_proto(enum bpf_func_id func_id)
+{
+ switch (func_id) {
+ case BPF_FUNC_perf_event_output:
+ return &bpf_perf_event_output_proto_raw_tp;
+ case BPF_FUNC_get_stackid:
+ return &bpf_get_stackid_proto_raw_tp;
+ default:
+ return tracing_func_proto(func_id);
+ }
+}
+
+static bool raw_tp_prog_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ struct bpf_insn_access_aux *info)
+{
+ /* largest tracepoint in the kernel has 12 args */
+ if (off < 0 || off >= sizeof(__u64) * 12)
+ return false;
+ if (type != BPF_READ)
+ return false;
+ if (off % size != 0)
+ return false;
+ return true;
+}
+
+const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
+ .get_func_proto = raw_tp_prog_func_proto,
+ .is_valid_access = raw_tp_prog_is_valid_access,
+};
+
+const struct bpf_prog_ops raw_tracepoint_prog_ops = {
+};
+
static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
@@ -896,3 +976,111 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
return ret;
}
+
+static __always_inline
+void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
+{
+ rcu_read_lock();
+ preempt_disable();
+ (void) BPF_PROG_RUN(prog, args);
+ preempt_enable();
+ rcu_read_unlock();
+}
+
+#define UNPACK(...) __VA_ARGS__
+#define REPEAT_1(FN, DL, X, ...) FN(X)
+#define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
+#define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
+#define REPEAT_4(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_3(FN, DL, __VA_ARGS__)
+#define REPEAT_5(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_4(FN, DL, __VA_ARGS__)
+#define REPEAT_6(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_5(FN, DL, __VA_ARGS__)
+#define REPEAT_7(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_6(FN, DL, __VA_ARGS__)
+#define REPEAT_8(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_7(FN, DL, __VA_ARGS__)
+#define REPEAT_9(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_8(FN, DL, __VA_ARGS__)
+#define REPEAT_10(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_9(FN, DL, __VA_ARGS__)
+#define REPEAT_11(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_10(FN, DL, __VA_ARGS__)
+#define REPEAT_12(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_11(FN, DL, __VA_ARGS__)
+#define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)
+
+#define SARG(X) u64 arg##X
+#define COPY(X) args[X] = arg##X
+
+#define __DL_COM (,)
+#define __DL_SEM (;)
+
+#define __SEQ_0_11 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
+
+#define BPF_TRACE_DEFN_x(x) \
+ void bpf_trace_run##x(struct bpf_prog *prog, \
+ REPEAT(x, SARG, __DL_COM, __SEQ_0_11)) \
+ { \
+ u64 args[x]; \
+ REPEAT(x, COPY, __DL_SEM, __SEQ_0_11); \
+ __bpf_trace_run(prog, args); \
+ } \
+ EXPORT_SYMBOL_GPL(bpf_trace_run##x)
+BPF_TRACE_DEFN_x(1);
+BPF_TRACE_DEFN_x(2);
+BPF_TRACE_DEFN_x(3);
+BPF_TRACE_DEFN_x(4);
+BPF_TRACE_DEFN_x(5);
+BPF_TRACE_DEFN_x(6);
+BPF_TRACE_DEFN_x(7);
+BPF_TRACE_DEFN_x(8);
+BPF_TRACE_DEFN_x(9);
+BPF_TRACE_DEFN_x(10);
+BPF_TRACE_DEFN_x(11);
+BPF_TRACE_DEFN_x(12);
+
+static int __bpf_probe_register(struct tracepoint *tp, struct bpf_prog *prog)
+{
+ unsigned long addr;
+ char buf[128];
+
+ /*
+ * check that program doesn't access arguments beyond what's
+ * available in this tracepoint
+ */
+ if (prog->aux->max_ctx_offset > tp->num_args * sizeof(u64))
+ return -EINVAL;
+
+ snprintf(buf, sizeof(buf), "__bpf_trace_%s", tp->name);
+ addr = kallsyms_lookup_name(buf);
+ if (!addr)
+ return -ENOENT;
+
+ return tracepoint_probe_register(tp, (void *)addr, prog);
+}
+
+int bpf_probe_register(struct tracepoint *tp, struct bpf_prog *prog)
+{
+ int err;
+
+ mutex_lock(&bpf_event_mutex);
+ err = __bpf_probe_register(tp, prog);
+ mutex_unlock(&bpf_event_mutex);
+ return err;
+}
+
+static int __bpf_probe_unregister(struct tracepoint *tp, struct bpf_prog *prog)
+{
+ unsigned long addr;
+ char buf[128];
+
+ snprintf(buf, sizeof(buf), "__bpf_trace_%s", tp->name);
+ addr = kallsyms_lookup_name(buf);
+ if (!addr)
+ return -ENOENT;
+
+ return tracepoint_probe_unregister(tp, (void *)addr, prog);
+}
+
+int bpf_probe_unregister(struct tracepoint *tp, struct bpf_prog *prog)
+{
+ int err;
+
+ mutex_lock(&bpf_event_mutex);
+ err = __bpf_probe_unregister(tp, prog);
+ mutex_unlock(&bpf_event_mutex);
+ return err;
+}
--
2.9.5
* [PATCH v5 bpf-next 01/10] treewide: remove large struct-pass-by-value from tracepoint arguments
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
- fix trace_hfi1_ctxt_info() to pass the large struct by reference instead of by value
- convert 'type array[]' tracepoint arguments into 'type *array',
since the compiler will warn that sizeof('type array[]') == sizeof('type *array')
and the latter should be used instead
The CAST_TO_U64 macro in a later patch will enforce that tracepoint
arguments can only be integers, pointers, or small structures whose size
is 1, 2, 4, or 8 bytes. Larger structures should be passed by reference.
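For illustration, here is a user-space sketch of how a CAST_TO_U64-style macro widens a single integer, pointer, or small struct to u64. UINTTYPE is copied from include/trace/bpf_probe.h in patch 7; the name CAST_ONE_TO_U64, the u8/u16/u32/u64 typedefs, and struct pair are assumptions made for this example. It relies on GNU C extensions (typeof, statement expressions, __builtin_choose_expr), so it assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-ins for the kernel's fixed-width types. */
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/*
 * Same idea as UINTTYPE() in include/trace/bpf_probe.h: pick an unsigned
 * integer type of exactly `size` bytes. Any other size selects void, so
 * declaring a variable of that type fails to compile.
 */
#define UINTTYPE(size) \
	__typeof__(__builtin_choose_expr(size == 1, (u8)1, \
		   __builtin_choose_expr(size == 2, (u16)2, \
		   __builtin_choose_expr(size == 4, (u32)3, \
		   __builtin_choose_expr(size == 8, (u64)4, \
					 (void)5)))))

/*
 * Hypothetical one-argument version of __CAST_TO_U64(): copy the raw bytes
 * of x into a same-sized unsigned integer, then zero-extend it to u64.
 */
#define CAST_ONE_TO_U64(x) ({				\
	__typeof__(x) __src = (x);			\
	UINTTYPE(sizeof(x)) __dst;			\
	memcpy(&__dst, &__src, sizeof(__dst));		\
	(u64)__dst; })

/* A 4-byte struct: small enough to be passed to a tracepoint by value. */
struct pair {
	u16 lo;
	u16 hi;
};
```

Note that a negative int such as -1 becomes 0xffffffff rather than 0xffffffffffffffff, because its bytes are copied into a u32 and then zero-extended. A 16-byte struct would select void and fail to build, which is why this patch converts such arguments to pointers.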
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
drivers/infiniband/hw/hfi1/trace_ctxts.h | 12 ++++++------
include/trace/events/f2fs.h | 2 +-
net/wireless/trace.h | 2 +-
sound/firewire/amdtp-stream-trace.h | 2 +-
5 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 41fafebe3b0d..da4aa1a95b11 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -1153,7 +1153,7 @@ static int get_ctxt_info(struct hfi1_filedata *fd, unsigned long arg, u32 len)
cinfo.sdma_ring_size = fd->cq->nentries;
cinfo.rcvegr_size = uctxt->egrbufs.rcvtid_size;
- trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, fd->subctxt, cinfo);
+ trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, fd->subctxt, &cinfo);
if (copy_to_user((void __user *)arg, &cinfo, len))
return -EFAULT;
diff --git a/drivers/infiniband/hw/hfi1/trace_ctxts.h b/drivers/infiniband/hw/hfi1/trace_ctxts.h
index 4eb4cc798035..e00c8a7d559c 100644
--- a/drivers/infiniband/hw/hfi1/trace_ctxts.h
+++ b/drivers/infiniband/hw/hfi1/trace_ctxts.h
@@ -106,7 +106,7 @@ TRACE_EVENT(hfi1_uctxtdata,
TRACE_EVENT(hfi1_ctxt_info,
TP_PROTO(struct hfi1_devdata *dd, unsigned int ctxt,
unsigned int subctxt,
- struct hfi1_ctxt_info cinfo),
+ struct hfi1_ctxt_info *cinfo),
TP_ARGS(dd, ctxt, subctxt, cinfo),
TP_STRUCT__entry(DD_DEV_ENTRY(dd)
__field(unsigned int, ctxt)
@@ -120,11 +120,11 @@ TRACE_EVENT(hfi1_ctxt_info,
TP_fast_assign(DD_DEV_ASSIGN(dd);
__entry->ctxt = ctxt;
__entry->subctxt = subctxt;
- __entry->egrtids = cinfo.egrtids;
- __entry->rcvhdrq_cnt = cinfo.rcvhdrq_cnt;
- __entry->rcvhdrq_size = cinfo.rcvhdrq_entsize;
- __entry->sdma_ring_size = cinfo.sdma_ring_size;
- __entry->rcvegr_size = cinfo.rcvegr_size;
+ __entry->egrtids = cinfo->egrtids;
+ __entry->rcvhdrq_cnt = cinfo->rcvhdrq_cnt;
+ __entry->rcvhdrq_size = cinfo->rcvhdrq_entsize;
+ __entry->sdma_ring_size = cinfo->sdma_ring_size;
+ __entry->rcvegr_size = cinfo->rcvegr_size;
),
TP_printk("[%s] ctxt %u:%u " CINFO_FMT,
__get_str(dev),
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 06c87f9f720c..795698925d20 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -491,7 +491,7 @@ DEFINE_EVENT(f2fs__truncate_node, f2fs_truncate_node,
TRACE_EVENT(f2fs_truncate_partial_nodes,
- TP_PROTO(struct inode *inode, nid_t nid[], int depth, int err),
+ TP_PROTO(struct inode *inode, nid_t *nid, int depth, int err),
TP_ARGS(inode, nid, depth, err),
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index 5152938b358d..018c81fa72fb 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -3137,7 +3137,7 @@ TRACE_EVENT(rdev_start_radar_detection,
TRACE_EVENT(rdev_set_mcast_rate,
TP_PROTO(struct wiphy *wiphy, struct net_device *netdev,
- int mcast_rate[NUM_NL80211_BANDS]),
+ int *mcast_rate),
TP_ARGS(wiphy, netdev, mcast_rate),
TP_STRUCT__entry(
WIPHY_ENTRY
diff --git a/sound/firewire/amdtp-stream-trace.h b/sound/firewire/amdtp-stream-trace.h
index ea0d486652c8..54cdd4ffa9ce 100644
--- a/sound/firewire/amdtp-stream-trace.h
+++ b/sound/firewire/amdtp-stream-trace.h
@@ -14,7 +14,7 @@
#include <linux/tracepoint.h>
TRACE_EVENT(in_packet,
- TP_PROTO(const struct amdtp_stream *s, u32 cycles, u32 cip_header[2], unsigned int payload_length, unsigned int index),
+ TP_PROTO(const struct amdtp_stream *s, u32 cycles, u32 *cip_header, unsigned int payload_length, unsigned int index),
TP_ARGS(s, cycles, cip_header, payload_length, index),
TP_STRUCT__entry(
__field(unsigned int, second)
--
2.9.5
* [PATCH v5 bpf-next 00/10] bpf, tracing: introduce bpf raw tracepoints
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
From: Alexei Starovoitov <ast@kernel.org>
v4->v5:
- adopted Daniel's fancy REPEAT macro in bpf_trace.c in patch 7
v3->v4:
- adopted Linus's CAST_TO_U64 macro to cast any integer, pointer, or small
struct to u64. That nicely reduced the size of patch 1
v2->v3:
- at Linus's suggestion, introduced generic COUNT_ARGS and CONCATENATE macros
(or rather, moved them from apparmor),
which cleaned up patches 6 and 7
- added patch 4 to refactor trace_iwlwifi_dev_ucode_error() from 17 args to 4.
Now any tracepoint with more than 12 args will cause a build error.
v1->v2:
- simplified the api by combining bpf_raw_tp_open(name) + bpf_attach(prog_fd) into
bpf_raw_tp_open(name, prog_fd), as suggested by Daniel.
That also simplifies bpf_detach, which is now a simple close() of the fd.
- fixed memory leak in error path which was spotted by Daniel.
- fixed bpf_get_stackid(), bpf_perf_event_output() called from raw tracepoints
- added more tests
- fixed allyesconfig build caught by buildbot
v1:
This patch set is a different way to address the pressing need to access
task_struct pointers in sched tracepoints from bpf programs.
The first approach simply added these pointers to sched tracepoints:
https://lkml.org/lkml/2017/12/14/753
which Peter nacked.
A few options were discussed, and eventually the discussion converged on
bpf-specific tracepoint_probe_register() probe functions.
Details here:
https://lkml.org/lkml/2017/12/20/929
Patch 1 is a kernel-wide cleanup that converts pass-struct-by-value
tracepoint arguments into pass-struct-by-reference.
Patches 2 and 3 are minor cleanups to fix the allyesconfig build.
Patch 4 refactors trace_iwlwifi_dev_ucode_error from 17 to 4 args.
Patch 5 introduces the COUNT_ARGS macro.
Patch 6 is minor prep work to expose the number of arguments passed
into tracepoints.
Patch 7 introduces BPF_RAW_TRACEPOINT api.
Auto-cleanup and support for multiple concurrent users are must-have
features of a tracing api. For bpf raw tracepoints it looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint
// and attach bpf program to it
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
Ctrl-C of a tracing daemon or command-line tool will automatically
detach the bpf program, unload it, and unregister the tracepoint probe.
More details in patch 7.
Patch 8 - trivial support in libbpf
Patches 9, 10 - user space tests
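To make the data path of patch 7 concrete, here is a user-space sketch. REPEAT()-generated run functions copy each argument into a u64 array (as bpf_trace_runN() does), and two predicates mirror the checks that keep a program inside that array: raw_tp_prog_is_valid_access() at verification time and the max_ctx_offset test in __bpf_probe_register() at attach time. toy_prog_t stands in for a BPF program, and trace_runN, ctx_access_ok() and attach_ok() are hypothetical names. Unlike the kernel, the sketch returns the program's result so it can be checked. It also assumes max_ctx_offset holds the end (offset + size) of the furthest context access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* A "program" receives the packed arguments, like bpf_raw_tracepoint_args. */
typedef u64 (*toy_prog_t)(const u64 *args);

/* Same REPEAT trick as kernel/trace/bpf_trace.c, trimmed to 3 levels. */
#define UNPACK(...) __VA_ARGS__
#define REPEAT_1(FN, DL, X, ...) FN(X)
#define REPEAT_2(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_1(FN, DL, __VA_ARGS__)
#define REPEAT_3(FN, DL, X, ...) FN(X) UNPACK DL REPEAT_2(FN, DL, __VA_ARGS__)
#define REPEAT(X, FN, DL, ...) REPEAT_##X(FN, DL, __VA_ARGS__)

#define SARG(X) u64 arg##X
#define COPY(X) args[X] = arg##X
#define DL_COM (,)
#define DL_SEM (;)
#define SEQ_0_11 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

/* trace_runN(prog, arg0 .. argN-1): pack the arguments, then run prog. */
#define TRACE_RUN_DEFN(x)						\
	static u64 trace_run##x(toy_prog_t prog,			\
				REPEAT(x, SARG, DL_COM, SEQ_0_11))	\
	{								\
		u64 args[x];						\
		REPEAT(x, COPY, DL_SEM, SEQ_0_11);			\
		return prog(args);					\
	}
TRACE_RUN_DEFN(1)
TRACE_RUN_DEFN(3)

#define MAX_RAW_TP_ARGS 12 /* largest tracepoint after this series */

/* Like raw_tp_prog_is_valid_access(): read-only, aligned, within 12 slots. */
static bool ctx_access_ok(int off, int size, bool is_read)
{
	if (off < 0 || off >= (int)(sizeof(u64) * MAX_RAW_TP_ARGS))
		return false;
	if (!is_read)
		return false;
	return off % size == 0;
}

/* Like the check in __bpf_probe_register(): the program may not read past
 * the arguments that this particular tracepoint actually passes. */
static bool attach_ok(unsigned int max_ctx_offset, unsigned int num_args)
{
	return max_ctx_offset <= num_args * sizeof(u64);
}

static u64 first_arg(const u64 *args) { return args[0]; }
static u64 sum3(const u64 *args) { return args[0] + args[1] + args[2]; }
```

A program attached to a 3-argument tracepoint can pass verification while reading args[5], since that is still inside the 12-slot window; it is the attach-time num_args comparison that rejects it, which is why patch 6 records num_args.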
samples/bpf/test_overhead performance on 1 CPU (events per second):
tracepoint     base   kprobe+bpf   tracepoint+bpf   raw_tracepoint+bpf
task_rename    1.1M   769K         947K             1.0M
urandom_read   789K   697K         750K             755K
Alexei Starovoitov (10):
treewide: remove large struct-pass-by-value from tracepoint arguments
net/mediatek: disambiguate mt76 vs mt7601u trace events
net/mac802154: disambiguate mac80215 vs mac802154 trace events
net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
macro: introduce COUNT_ARGS() macro
tracepoint: compute num_args at build time
bpf: introduce BPF_RAW_TRACEPOINT
libbpf: add bpf_raw_tracepoint_open helper
samples/bpf: raw tracepoint test
selftests/bpf: test for bpf_get_stackid() from raw tracepoints
drivers/infiniband/hw/hfi1/file_ops.c | 2 +-
drivers/infiniband/hw/hfi1/trace_ctxts.h | 12 +-
drivers/net/wireless/intel/iwlwifi/dvm/main.c | 7 +-
.../wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h | 39 ++---
drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c | 1 +
drivers/net/wireless/intel/iwlwifi/mvm/utils.c | 7 +-
drivers/net/wireless/mediatek/mt7601u/trace.h | 6 +-
include/linux/bpf_types.h | 1 +
include/linux/kernel.h | 7 +
include/linux/trace_events.h | 37 ++++
include/linux/tracepoint-defs.h | 1 +
include/linux/tracepoint.h | 28 ++-
include/trace/bpf_probe.h | 87 ++++++++++
include/trace/define_trace.h | 15 +-
include/trace/events/f2fs.h | 2 +-
include/uapi/linux/bpf.h | 11 ++
kernel/bpf/syscall.c | 87 ++++++++++
kernel/trace/bpf_trace.c | 188 +++++++++++++++++++++
kernel/tracepoint.c | 27 +--
net/mac802154/trace.h | 8 +-
net/wireless/trace.h | 2 +-
samples/bpf/Makefile | 1 +
samples/bpf/bpf_load.c | 14 ++
samples/bpf/test_overhead_raw_tp_kern.c | 17 ++
samples/bpf/test_overhead_user.c | 12 ++
security/apparmor/include/path.h | 7 +-
sound/firewire/amdtp-stream-trace.h | 2 +-
tools/include/uapi/linux/bpf.h | 11 ++
tools/lib/bpf/bpf.c | 11 ++
tools/lib/bpf/bpf.h | 1 +
tools/testing/selftests/bpf/test_progs.c | 91 +++++++---
31 files changed, 638 insertions(+), 104 deletions(-)
create mode 100644 include/trace/bpf_probe.h
create mode 100644 samples/bpf/test_overhead_raw_tp_kern.c
--
2.9.5
* [PATCH v5 bpf-next 06/10] tracepoint: compute num_args at build time
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Add a fancy macro that computes the number of arguments passed into a tracepoint
at compile time and stores it in 'struct tracepoint'.
The number is needed to check the safety of bpf program accesses in a
subsequent patch.
The for_each_tracepoint_range() api has no users inside the kernel.
Make it more useful by letting the callback's return value stop the
for_each() loop: a non-NULL return ends the walk and is passed back to the caller.
The subsequent patch uses it in that form.
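A minimal user-space sketch of both mechanisms described above, assuming GNU C (for the `, ##X` comma-swallowing extension). COUNT_ARGS is modeled on the COUNT_ARGS() macro from patch 5; struct tracepoint is trimmed to two fields; and DEFINE_TP, the demo_* tracepoints, walk_range() and find_tp() are hypothetical stand-ins for DEFINE_TRACE, for_each_tracepoint_range() and the __find_tp() callback used in patch 7.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts 1..12 arguments at compile time; with 13 arguments the 13th one
 * is returned as-is instead of a count. Needs GNU C's ", ##X". */
#define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _n, X...) _n
#define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

/* Trimmed-down struct tracepoint: only the fields this sketch needs. */
struct tracepoint {
	const char *name;
	unsigned int num_args;
};

/* Hypothetical DEFINE_TRACE stand-in: the argument names are only counted,
 * never evaluated, so they need not be declared anywhere. */
#define DEFINE_TP(tpname, ...) \
	static struct tracepoint tp_##tpname = { #tpname, COUNT_ARGS(__VA_ARGS__) }

DEFINE_TP(demo_a, dev, prog, action);
DEFINE_TP(demo_b, task);

static struct tracepoint *const all_tps[] = { &tp_demo_a, &tp_demo_b };

/* Same shape as the new for_each_tracepoint_range(): a non-NULL value
 * returned by the callback stops the walk and is passed back. */
static void *walk_range(struct tracepoint *const *begin,
			struct tracepoint *const *end,
			void *(*fct)(struct tracepoint *tp, void *priv),
			void *priv)
{
	struct tracepoint *const *iter;
	void *ret;

	if (!begin)
		return NULL;
	for (iter = begin; iter < end; iter++) {
		ret = fct(*iter, priv);
		if (ret)
			return ret;
	}
	return NULL;
}

/* Lookup-by-name callback, like __find_tp() in kernel/bpf/syscall.c. */
static void *find_tp(struct tracepoint *tp, void *priv)
{
	return strcmp(tp->name, priv) == 0 ? tp : NULL;
}
```

In the series itself, the stored num_args is what __bpf_probe_register() compares against the program's max_ctx_offset.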
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/tracepoint-defs.h | 1 +
include/linux/tracepoint.h | 28 +++++++++++++++++++---------
include/trace/define_trace.h | 14 +++++++-------
kernel/tracepoint.c | 27 ++++++++++++++++-----------
4 files changed, 43 insertions(+), 27 deletions(-)
diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 64ed7064f1fa..39a283c61c51 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -33,6 +33,7 @@ struct tracepoint {
int (*regfunc)(void);
void (*unregfunc)(void);
struct tracepoint_func __rcu *funcs;
+ u32 num_args;
};
#endif
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..2194e7c31484 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -40,9 +40,19 @@ tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data,
int prio);
extern int
tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data);
-extern void
-for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv),
- void *priv);
+
+#ifdef CONFIG_TRACEPOINTS
+void *
+for_each_kernel_tracepoint(void *(*fct)(struct tracepoint *tp, void *priv),
+ void *priv);
+#else
+static inline void *
+for_each_kernel_tracepoint(void *(*fct)(struct tracepoint *tp, void *priv),
+ void *priv)
+{
+ return NULL;
+}
+#endif
#ifdef CONFIG_MODULES
struct tp_module {
@@ -230,18 +240,18 @@ extern void syscall_unregfunc(void);
* structures, so we create an array of pointers that will be used for iteration
* on the tracepoints.
*/
-#define DEFINE_TRACE_FN(name, reg, unreg) \
+#define DEFINE_TRACE_FN(name, reg, unreg, num_args) \
static const char __tpstrtab_##name[] \
__attribute__((section("__tracepoints_strings"))) = #name; \
struct tracepoint __tracepoint_##name \
__attribute__((section("__tracepoints"))) = \
- { __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
+ { __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL, num_args };\
static struct tracepoint * const __tracepoint_ptr_##name __used \
__attribute__((section("__tracepoints_ptrs"))) = \
&__tracepoint_##name;
-#define DEFINE_TRACE(name) \
- DEFINE_TRACE_FN(name, NULL, NULL);
+#define DEFINE_TRACE(name, num_args) \
+ DEFINE_TRACE_FN(name, NULL, NULL, num_args);
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
EXPORT_SYMBOL_GPL(__tracepoint_##name)
@@ -275,8 +285,8 @@ extern void syscall_unregfunc(void);
return false; \
}
-#define DEFINE_TRACE_FN(name, reg, unreg)
-#define DEFINE_TRACE(name)
+#define DEFINE_TRACE_FN(name, reg, unreg, num_args)
+#define DEFINE_TRACE(name, num_args)
#define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
#define EXPORT_TRACEPOINT_SYMBOL(name)
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index d9e3d4aa3f6e..96b22ace9ae7 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -25,7 +25,7 @@
#undef TRACE_EVENT
#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
- DEFINE_TRACE(name)
+ DEFINE_TRACE(name, COUNT_ARGS(args))
#undef TRACE_EVENT_CONDITION
#define TRACE_EVENT_CONDITION(name, proto, args, cond, tstruct, assign, print) \
@@ -39,24 +39,24 @@
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
- DEFINE_TRACE_FN(name, reg, unreg)
+ DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
#undef TRACE_EVENT_FN_COND
#define TRACE_EVENT_FN_COND(name, proto, args, cond, tstruct, \
assign, print, reg, unreg) \
- DEFINE_TRACE_FN(name, reg, unreg)
+ DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, name, proto, args) \
- DEFINE_TRACE(name)
+ DEFINE_TRACE(name, COUNT_ARGS(args))
#undef DEFINE_EVENT_FN
#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \
- DEFINE_TRACE_FN(name, reg, unreg)
+ DEFINE_TRACE_FN(name, reg, unreg, COUNT_ARGS(args))
#undef DEFINE_EVENT_PRINT
#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
- DEFINE_TRACE(name)
+ DEFINE_TRACE(name, COUNT_ARGS(args))
#undef DEFINE_EVENT_CONDITION
#define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
@@ -64,7 +64,7 @@
#undef DECLARE_TRACE
#define DECLARE_TRACE(name, proto, args) \
- DEFINE_TRACE(name)
+ DEFINE_TRACE(name, COUNT_ARGS(args))
#undef TRACE_INCLUDE
#undef __TRACE_INCLUDE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 671b13457387..3f2dc5738c2b 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -502,17 +502,22 @@ static __init int init_tracepoints(void)
__initcall(init_tracepoints);
#endif /* CONFIG_MODULES */
-static void for_each_tracepoint_range(struct tracepoint * const *begin,
- struct tracepoint * const *end,
- void (*fct)(struct tracepoint *tp, void *priv),
- void *priv)
+static void *for_each_tracepoint_range(struct tracepoint * const *begin,
+ struct tracepoint * const *end,
+ void *(*fct)(struct tracepoint *tp, void *priv),
+ void *priv)
{
struct tracepoint * const *iter;
+ void *ret;
if (!begin)
- return;
- for (iter = begin; iter < end; iter++)
- fct(*iter, priv);
+ return NULL;
+ for (iter = begin; iter < end; iter++) {
+ ret = fct(*iter, priv);
+ if (ret)
+ return ret;
+ }
+ return NULL;
}
/**
@@ -520,11 +525,11 @@ static void for_each_tracepoint_range(struct tracepoint * const *begin,
* @fct: callback
* @priv: private data
*/
-void for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv),
- void *priv)
+void *for_each_kernel_tracepoint(void *(*fct)(struct tracepoint *tp, void *priv),
+ void *priv)
{
- for_each_tracepoint_range(__start___tracepoints_ptrs,
- __stop___tracepoints_ptrs, fct, priv);
+ return for_each_tracepoint_range(__start___tracepoints_ptrs,
+ __stop___tracepoints_ptrs, fct, priv);
}
EXPORT_SYMBOL_GPL(for_each_kernel_tracepoint);
--
2.9.5
* [PATCH v5 bpf-next 08/10] libbpf: add bpf_raw_tracepoint_open helper
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Add a bpf_raw_tracepoint_open(const char *name, int prog_fd) api to libbpf.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
tools/include/uapi/linux/bpf.h | 11 +++++++++++
tools/lib/bpf/bpf.c | 11 +++++++++++
tools/lib/bpf/bpf.h | 1 +
3 files changed, 23 insertions(+)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d245c41213ac..58060bec999d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_cmd {
BPF_MAP_GET_FD_BY_ID,
BPF_OBJ_GET_INFO_BY_FD,
BPF_PROG_QUERY,
+ BPF_RAW_TRACEPOINT_OPEN,
};
enum bpf_map_type {
@@ -134,6 +135,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_CGROUP_DEVICE,
BPF_PROG_TYPE_SK_MSG,
+ BPF_PROG_TYPE_RAW_TRACEPOINT,
};
enum bpf_attach_type {
@@ -344,6 +346,11 @@ union bpf_attr {
__aligned_u64 prog_ids;
__u32 prog_cnt;
} query;
+
+ struct {
+ __u64 name;
+ __u32 prog_fd;
+ } raw_tracepoint;
} __attribute__((aligned(8)));
/* BPF helper function descriptions:
@@ -1151,4 +1158,8 @@ struct bpf_cgroup_dev_ctx {
__u32 minor;
};
+struct bpf_raw_tracepoint_args {
+ __u64 args[0];
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 592a58a2b681..e0500055f1a6 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -428,6 +428,17 @@ int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
return err;
}
+int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+{
+ union bpf_attr attr;
+
+ bzero(&attr, sizeof(attr));
+ attr.raw_tracepoint.name = ptr_to_u64(name);
+ attr.raw_tracepoint.prog_fd = prog_fd;
+
+ return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
+}
+
int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
{
struct sockaddr_nl sa;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 8d18fb73d7fb..ee59342c6f42 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -79,4 +79,5 @@ int bpf_map_get_fd_by_id(__u32 id);
int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
__u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt);
+int bpf_raw_tracepoint_open(const char *name, int prog_fd);
#endif
--
2.9.5
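A minimal sketch of the pattern bpf_raw_tracepoint_open() relies on. The tracepoint name is handed to the kernel as a 64-bit integer in the attr (via ptr_to_u64()), so the attr layout is the same for 32-bit and 64-bit user space. The struct and helper names below (demo_raw_tp_attr, demo_ptr_to_u64, demo_u64_to_str, demo_fill_attr) are hypothetical stand-ins, not libbpf or UAPI definitions, and no syscall is made.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the raw_tracepoint member of union bpf_attr:
 * the user pointer travels as a fixed-width 64-bit integer. */
struct demo_raw_tp_attr {
	uint64_t name;    /* user pointer to a NUL-terminated tracepoint name */
	uint32_t prog_fd; /* fd of an already loaded program */
};

/* Same idea as libbpf's ptr_to_u64(): widen the pointer via uintptr_t. */
static inline uint64_t demo_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* What the receiving side does conceptually: narrow the u64 back to a pointer. */
static inline const char *demo_u64_to_str(uint64_t v)
{
	return (const char *)(uintptr_t)v;
}

static void demo_fill_attr(struct demo_raw_tp_attr *attr, const char *name,
			   int prog_fd)
{
	assert(attr != NULL && name != NULL);
	/* Zero everything first, as bpf_raw_tracepoint_open() does with bzero(),
	 * so unused bytes are zero when the attr reaches the bpf() syscall. */
	memset(attr, 0, sizeof(*attr));
	attr->name = demo_ptr_to_u64(name);
	attr->prog_fd = (uint32_t)prog_fd;
}
```

In libbpf the filled union bpf_attr is then passed to sys_bpf(BPF_RAW_TRACEPOINT_OPEN, ...); the sketch deliberately stops before that step.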
^ permalink raw reply related
* [PATCH v5 bpf-next 04/10] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Fix the iwlwifi_dev_ucode_error tracepoint to pass a pointer to the error
table instead of all 17 arguments by value.
dvm/main.c and mvm/utils.c each have a 'struct iwl_error_event_table'
defined with very similar yet subtly different fields and offsets.
The tracepoint is still shared and uses the definition of
'struct iwl_error_event_table' from dvm/commands.h when copying fields.
In the long term, this tracepoint should probably be split into two.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
drivers/net/wireless/intel/iwlwifi/dvm/main.c | 7 +---
.../wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h | 39 ++++++++++------------
drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c | 1 +
drivers/net/wireless/intel/iwlwifi/mvm/utils.c | 7 +---
4 files changed, 21 insertions(+), 33 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/main.c b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
index d11d72615de2..e68254e12764 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/main.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/main.c
@@ -1651,12 +1651,7 @@ static void iwl_dump_nic_error_log(struct iwl_priv *priv)
priv->status, table.valid);
}
- trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
- table.data1, table.data2, table.line,
- table.blink2, table.ilink1, table.ilink2,
- table.bcon_time, table.gp1, table.gp2,
- table.gp3, table.ucode_ver, table.hw_ver,
- 0, table.brd_ver);
+ trace_iwlwifi_dev_ucode_error(trans->dev, &table, 0, table.brd_ver);
IWL_ERR(priv, "0x%08X | %-28s\n", table.error_id,
desc_lookup(table.error_id));
IWL_ERR(priv, "0x%08X | uPc\n", table.pc);
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
index 9518a82f44c2..27e3e4e96aa2 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace-iwlwifi.h
@@ -126,14 +126,11 @@ TRACE_EVENT(iwlwifi_dev_tx,
__entry->framelen, __entry->skbaddr)
);
+struct iwl_error_event_table;
TRACE_EVENT(iwlwifi_dev_ucode_error,
- TP_PROTO(const struct device *dev, u32 desc, u32 tsf_low,
- u32 data1, u32 data2, u32 line, u32 blink2, u32 ilink1,
- u32 ilink2, u32 bcon_time, u32 gp1, u32 gp2, u32 rev_type,
- u32 major, u32 minor, u32 hw_ver, u32 brd_ver),
- TP_ARGS(dev, desc, tsf_low, data1, data2, line,
- blink2, ilink1, ilink2, bcon_time, gp1, gp2,
- rev_type, major, minor, hw_ver, brd_ver),
+ TP_PROTO(const struct device *dev, const struct iwl_error_event_table *table,
+ u32 hw_ver, u32 brd_ver),
+ TP_ARGS(dev, table, hw_ver, brd_ver),
TP_STRUCT__entry(
DEV_ENTRY
__field(u32, desc)
@@ -155,20 +152,20 @@ TRACE_EVENT(iwlwifi_dev_ucode_error,
),
TP_fast_assign(
DEV_ASSIGN;
- __entry->desc = desc;
- __entry->tsf_low = tsf_low;
- __entry->data1 = data1;
- __entry->data2 = data2;
- __entry->line = line;
- __entry->blink2 = blink2;
- __entry->ilink1 = ilink1;
- __entry->ilink2 = ilink2;
- __entry->bcon_time = bcon_time;
- __entry->gp1 = gp1;
- __entry->gp2 = gp2;
- __entry->rev_type = rev_type;
- __entry->major = major;
- __entry->minor = minor;
+ __entry->desc = table->error_id;
+ __entry->tsf_low = table->tsf_low;
+ __entry->data1 = table->data1;
+ __entry->data2 = table->data2;
+ __entry->line = table->line;
+ __entry->blink2 = table->blink2;
+ __entry->ilink1 = table->ilink1;
+ __entry->ilink2 = table->ilink2;
+ __entry->bcon_time = table->bcon_time;
+ __entry->gp1 = table->gp1;
+ __entry->gp2 = table->gp2;
+ __entry->rev_type = table->gp3;
+ __entry->major = table->ucode_ver;
+ __entry->minor = table->hw_ver;
__entry->hw_ver = hw_ver;
__entry->brd_ver = brd_ver;
),
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
index 50510fb6ab8c..6aa719865a58 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
@@ -30,6 +30,7 @@
#ifndef __CHECKER__
#include "iwl-trans.h"
+#include "dvm/commands.h"
#define CREATE_TRACE_POINTS
#include "iwl-devtrace.h"
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
index d65e1db7c097..5442ead876eb 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
@@ -549,12 +549,7 @@ static void iwl_mvm_dump_lmac_error_log(struct iwl_mvm *mvm, u32 base)
IWL_ERR(mvm, "Loaded firmware version: %s\n", mvm->fw->fw_version);
- trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
- table.data1, table.data2, table.data3,
- table.blink2, table.ilink1,
- table.ilink2, table.bcon_time, table.gp1,
- table.gp2, table.fw_rev_type, table.major,
- table.minor, table.hw_ver, table.brd_ver);
+ trace_iwlwifi_dev_ucode_error(trans->dev, &table, table.hw_ver, table.brd_ver);
IWL_ERR(mvm, "0x%08X | %-28s\n", table.error_id,
desc_lookup(table.error_id));
IWL_ERR(mvm, "0x%08X | trm_hw_status0\n", table.trm_hw_status0);
--
2.9.5
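The refactor in this patch is the usual "pass a pointer to the record instead of every field" change. A trimmed-down sketch, assuming hypothetical demo_error_table and demo_ucode_error_entry types that keep only a few of the real fields; it performs the same field mapping as the patch's TP_fast_assign (gp3 to rev_type, ucode_ver to major, hw_ver to minor), but once in the callee instead of at every call site.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Hypothetical, trimmed-down error table; the real
 * struct iwl_error_event_table has many more fields. */
struct demo_error_table {
	u32 error_id;
	u32 tsf_low;
	u32 gp3;
	u32 ucode_ver;
	u32 hw_ver;
};

/* Hypothetical trace record, loosely modelled on TP_STRUCT__entry. */
struct demo_ucode_error_entry {
	u32 desc;
	u32 tsf_low;
	u32 rev_type;
	u32 major;
	u32 minor;
	u32 hw_ver;
	u32 brd_ver;
};

/* Callers pass one table pointer plus the two values supplied separately;
 * the table-to-entry mapping lives in this single function. */
static void demo_record_ucode_error(struct demo_ucode_error_entry *e,
				    const struct demo_error_table *table,
				    u32 hw_ver, u32 brd_ver)
{
	assert(e != NULL && table != NULL);
	memset(e, 0, sizeof(*e));
	e->desc = table->error_id;
	e->tsf_low = table->tsf_low;
	e->rev_type = table->gp3;     /* same renames as the patch */
	e->major = table->ucode_ver;
	e->minor = table->hw_ver;
	e->hw_ver = hw_ver;
	e->brd_ver = brd_ver;
}
```

Because the callee reads the table through one struct definition, every caller must pass a table with that layout; this is the caveat the commit message raises about the dvm and mvm variants.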
^ permalink raw reply related
* [PATCH v5 bpf-next 03/10] net/mac802154: disambiguate mac80211 vs mac802154 trace events
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Two trace events are defined with the same name, and both are unused.
They conflict in an allyesconfig build, so rename one of them.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
net/mac802154/trace.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/mac802154/trace.h b/net/mac802154/trace.h
index 2c8a43d3607f..df855c33daf2 100644
--- a/net/mac802154/trace.h
+++ b/net/mac802154/trace.h
@@ -33,7 +33,7 @@
/* Tracing for driver callbacks */
-DECLARE_EVENT_CLASS(local_only_evt,
+DECLARE_EVENT_CLASS(local_only_evt4,
TP_PROTO(struct ieee802154_local *local),
TP_ARGS(local),
TP_STRUCT__entry(
@@ -45,7 +45,7 @@ DECLARE_EVENT_CLASS(local_only_evt,
TP_printk(LOCAL_PR_FMT, LOCAL_PR_ARG)
);
-DEFINE_EVENT(local_only_evt, 802154_drv_return_void,
+DEFINE_EVENT(local_only_evt4, 802154_drv_return_void,
TP_PROTO(struct ieee802154_local *local),
TP_ARGS(local)
);
@@ -65,12 +65,12 @@ TRACE_EVENT(802154_drv_return_int,
__entry->ret)
);
-DEFINE_EVENT(local_only_evt, 802154_drv_start,
+DEFINE_EVENT(local_only_evt4, 802154_drv_start,
TP_PROTO(struct ieee802154_local *local),
TP_ARGS(local)
);
-DEFINE_EVENT(local_only_evt, 802154_drv_stop,
+DEFINE_EVENT(local_only_evt4, 802154_drv_stop,
TP_PROTO(struct ieee802154_local *local),
TP_ARGS(local)
);
--
2.9.5
^ permalink raw reply related
* [PATCH v5 bpf-next 10/10] selftests/bpf: test for bpf_get_stackid() from raw tracepoints
From: Alexei Starovoitov @ 2018-03-24 2:30 UTC (permalink / raw)
To: davem; +Cc: daniel, torvalds, peterz, rostedt, netdev, kernel-team, linux-api
In-Reply-To: <20180324023038.938665-1-ast@fb.com>
From: Alexei Starovoitov <ast@kernel.org>
Similar to the traditional tracepoint test, add a bpf_get_stackid() test
for raw tracepoints, and reduce the verbosity of the existing stackmap test.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
tools/testing/selftests/bpf/test_progs.c | 91 ++++++++++++++++++++++++--------
1 file changed, 70 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e9df48b306df..faadbe233966 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -877,7 +877,7 @@ static void test_stacktrace_map()
err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
- goto out;
+ return;
/* Get the ID for the sched/sched_switch tracepoint */
snprintf(buf, sizeof(buf),
@@ -888,8 +888,7 @@ static void test_stacktrace_map()
bytes = read(efd, buf, sizeof(buf));
close(efd);
- if (CHECK(bytes <= 0 || bytes >= sizeof(buf),
- "read", "bytes %d errno %d\n", bytes, errno))
+ if (bytes <= 0 || bytes >= sizeof(buf))
goto close_prog;
/* Open the perf event and attach bpf progrram */
@@ -906,29 +905,24 @@ static void test_stacktrace_map()
goto close_prog;
err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
- if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
- err, errno))
- goto close_pmu;
+ if (err)
+ goto disable_pmu;
err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
- if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
- err, errno))
+ if (err)
goto disable_pmu;
/* find map fds */
control_map_fd = bpf_find_map(__func__, obj, "control_map");
- if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
- "err %d errno %d\n", err, errno))
+ if (control_map_fd < 0)
goto disable_pmu;
stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
- if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
- "err %d errno %d\n", err, errno))
+ if (stackid_hmap_fd < 0)
goto disable_pmu;
stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
- if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
- err, errno))
+ if (stackmap_fd < 0)
goto disable_pmu;
/* give some time for bpf program run */
@@ -945,24 +939,78 @@ static void test_stacktrace_map()
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
"err %d errno %d\n", err, errno))
- goto disable_pmu;
+ goto disable_pmu_noerr;
err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
"err %d errno %d\n", err, errno))
- ; /* fall through */
+ goto disable_pmu_noerr;
+ goto disable_pmu_noerr;
disable_pmu:
+ error_cnt++;
+disable_pmu_noerr:
ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
-
-close_pmu:
close(pmu_fd);
-
close_prog:
bpf_object__close(obj);
+}
-out:
- return;
+static void test_stacktrace_map_raw_tp()
+{
+ int control_map_fd, stackid_hmap_fd, stackmap_fd;
+ const char *file = "./test_stacktrace_map.o";
+ int efd, err, prog_fd;
+ __u32 key, val, duration = 0;
+ struct bpf_object *obj;
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd);
+ if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+ return;
+
+ efd = bpf_raw_tracepoint_open("sched_switch", prog_fd);
+ if (CHECK(efd < 0, "raw_tp_open", "err %d errno %d\n", efd, errno))
+ goto close_prog;
+
+ /* find map fds */
+ control_map_fd = bpf_find_map(__func__, obj, "control_map");
+ if (control_map_fd < 0)
+ goto close_prog;
+
+ stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
+ if (stackid_hmap_fd < 0)
+ goto close_prog;
+
+ stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
+ if (stackmap_fd < 0)
+ goto close_prog;
+
+ /* give some time for bpf program run */
+ sleep(1);
+
+ /* disable stack trace collection */
+ key = 0;
+ val = 1;
+ bpf_map_update_elem(control_map_fd, &key, &val, 0);
+
+ /* for every element in stackid_hmap, we can find a corresponding one
+ * in stackmap, and vice versa.
+ */
+ err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+ if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
+ "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+ if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
+ "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ goto close_prog_noerr;
+close_prog:
+ error_cnt++;
+close_prog_noerr:
+ bpf_object__close(obj);
}
static int extract_build_id(char *build_id, size_t size)
@@ -1138,6 +1186,7 @@ int main(void)
test_tp_attach_query();
test_stacktrace_map();
test_stacktrace_build_id();
+ test_stacktrace_map_raw_tp();
printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
--
2.9.5
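The reworked error handling in test_stacktrace_map() and test_stacktrace_map_raw_tp() uses two stacked labels: quiet checks jump to a label that increments error_cnt, while checks made through CHECK(), which already counts the failure, jump past the increment. A self-contained sketch of that flow; demo_check(), demo_error_cnt and demo_cleanup_cnt are hypothetical stand-ins for the selftest's CHECK() macro and counters.

```c
#include <assert.h>
#include <stdbool.h>

static int demo_error_cnt;   /* stand-in for error_cnt */
static int demo_cleanup_cnt; /* counts how often cleanup ran */

/* Hypothetical stand-in for CHECK(): it records the failure itself
 * and returns true when the check failed. */
static bool demo_check(bool failed)
{
	if (failed)
		demo_error_cnt++;
	return failed;
}

static void demo_run_test(bool setup_fails, bool compare_fails)
{
	/* Quiet check: nothing counted yet, so go through the counting label. */
	if (setup_fails)
		goto cleanup;

	/* Verbose check: demo_check() already counted it, skip the increment. */
	if (demo_check(compare_fails))
		goto cleanup_noerr;

	/* Success path also skips the increment. */
	goto cleanup_noerr;
cleanup:
	demo_error_cnt++;
cleanup_noerr:
	demo_cleanup_cnt++; /* shared cleanup, e.g. closing fds */
}
```

Each failure is counted exactly once, and cleanup runs on every path.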
^ permalink raw reply related
* Re: [PATCH v7 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
From: okaya @ 2018-03-24 2:34 UTC (permalink / raw)
To: Jeff Kirsher
Cc: sulrich, Netdev, Timur Tabi, Alexander Duyck, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <1521849496.15055.16.camel@intel.com>
On 2018-03-23 19:58, Jeff Kirsher wrote:
> On Fri, 2018-03-23 at 14:53 -0700, Alexander Duyck wrote:
>> On Fri, Mar 23, 2018 at 11:52 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> > The code includes wmb() followed by writel() in multiple places. writel()
>> > already has a barrier on some architectures like arm64.
>> >
>> > This ends up with the CPU observing two barriers back to back before
>> > executing the register write.
>> >
>> > Since the code already has an explicit barrier call, change writel() to
>> > writel_relaxed().
>> >
>> > I did a regex search for wmb() followed by writel() in each driver
>> > directory. I scrubbed the ones I care about in this series.
>> >
>> > I considered "ease of change", "popular usage" and "performance
>> > critical path" as the determining criteria for my filtering.
>> >
>> > We have used the relaxed API heavily on ARM for a long time, but it
>> > did not exist on other architectures. For this reason, relaxed
>> > architectures have been paying a double penalty in order to use the
>> > common drivers.
>> >
>> > Now that the relaxed API is present on all architectures, we can go
>> > and scrub all drivers to see what needs to change and what can remain.
>> >
>> > We start with the most used ones and hope to increase the coverage
>> > over time. It will take a while to cover all drivers.
>> >
>> > Feel free to apply patches individually.
>>
>> I looked over the set and they seem good.
>>
>> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>
> Grrr, patch 1 does not apply cleanly to my next-queue tree (dev-queue
> branch). I will deal with this series in a day or two, after I have
> dealt with my driver pull requests.
Sorry, you will have to replace the ones you took from me.
>
>> >
>> > Changes since v6:
>> > clean up patches 2..6, then apply Alex's changes to patches 1 and 7
>> > The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>> > NAPI for the queue, and we will synchronize this with a full writel
>> > anyway when we re-enable the interrupts.
>> >
>> > Sinan Kaya (7):
>> > i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
>> > ixgbe: eliminate duplicate barriers on weakly-ordered archs
>> > igbvf: eliminate duplicate barriers on weakly-ordered archs
>> > igb: eliminate duplicate barriers on weakly-ordered archs
>> > fm10k: Eliminate duplicate barriers on weakly-ordered archs
>> > ixgbevf: keep writel() closer to wmb()
>> > ixgbevf: eliminate duplicate barriers on weakly-ordered archs
>> >
>> > drivers/net/ethernet/intel/fm10k/fm10k_main.c | 4 ++--
>> > drivers/net/ethernet/intel/i40e/i40e_txrx.c | 14 ++++++++++----
>> > drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 4 ++--
>> > drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
>> > drivers/net/ethernet/intel/igbvf/netdev.c | 4 ++--
>> > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 ++++----
>> > drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 5 -----
>> > drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 ++++++++---
>> > 8 files changed, 30 insertions(+), 24 deletions(-)
>> >
>> > --
>> > 2.7.4
>> >
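To illustrate the "double penalty" described in the cover letter, here is a user-space mock. It assumes, as the cover letter states for arm64, that writel() is a write barrier followed by writel_relaxed(). demo_wmb(), demo_writel(), demo_writel_relaxed() and the doorbell helpers are hypothetical stand-ins: atomic_thread_fence() only orders CPU memory accesses and is not a substitute for the kernel's I/O barriers, and the counter exists only to make the number of barriers visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static unsigned int demo_barrier_cnt; /* how many barriers were issued */

/* Mock wmb(): a release fence plus a counter bump. */
static void demo_wmb(void)
{
	atomic_thread_fence(memory_order_release);
	demo_barrier_cnt++;
}

/* Mock writel_relaxed(): the register store only, no ordering. */
static void demo_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Mock writel() on a weakly ordered arch: barrier, then the store. */
static void demo_writel(uint32_t val, volatile uint32_t *reg)
{
	demo_wmb();
	demo_writel_relaxed(val, reg);
}

/* Pattern the series removes: explicit wmb() followed by writel(). */
static void demo_doorbell_before(uint32_t tail, volatile uint32_t *reg)
{
	demo_wmb();             /* order descriptor writes before the doorbell */
	demo_writel(tail, reg); /* issues a second, redundant barrier */
}

/* Pattern the series switches to: wmb() followed by writel_relaxed(). */
static void demo_doorbell_after(uint32_t tail, volatile uint32_t *reg)
{
	demo_wmb();
	demo_writel_relaxed(tail, reg);
}
```

Both helpers leave the same value in the register; the "after" version simply issues one barrier instead of two.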
^ permalink raw reply