Linux Trace Kernel
 help / color / mirror / Atom feed
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: David Hildenbrand (Arm) @ 2026-06-15 15:18 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Gregory Price
  Cc: Balbir Singh, lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
	linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
	dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
	yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
	mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
	chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
	rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
	chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
	terry.bowman, Matthew Wilcox
In-Reply-To: <9f1815b0-896b-44ab-9e6d-9316d8f11033@kernel.org>

On 6/15/26 16:38, Vlastimil Babka (SUSE) wrote:
> On 6/12/26 17:29, Gregory Price wrote:
>> On Wed, Jun 10, 2026 at 04:12:52PM -0400, Gregory Price wrote:
>>> ... snip ...
>>>
>>> I will still probably send the next RFC version tomorrow or friday,
>>> as I want to get some eyes on the __GFP_PRIVATE-less pattern.
>>>
>>> Also, I made a new `anondax` driver which enables userland testing
>>> of this functionality without any specialty hardware.
>>>
>>
>> (apologies for the length of this email: this will all be covered in
>> the coming cover letter, but I just wanted to share a bit of a preview)
>>
>> ===
>>
>> Just another small update - I am planning to post the RFC today once i
>> get some mild cleanup done.  It will be based on the dax atomic hotplug
>>
>> https://lore.kernel.org/linux-mm/20260605211911.2160954-1-gourry@gourry.net/
>>
>> But a couple specific details regarding the memalloc pieces that i've
>> learned the past couple of days playing with it.
>>
>> 1) memalloc_folio is required to ensure non-folio allocations don't land
>>    on the private node, even if it happens within a memalloc_private
>>    context.  Since memalloc_folio may be useful in contexts outside of
>>    private nodes, I kept this as a separate flag.
>>
>>    If we think there will *never* be additional users of memalloc_folio,
>>    then we could fold _folio into _private to save the flag for now and
>>    add it back when we actually need it.
>>
>> 2) memalloc_private is needed to unlock private nodes, but in the
>>    original NOFALLBACK-only design, you also needed __GFP_THISNODE.
>>
>>    This is *highly* restrictive.  I found when playing with mbind that
>>    MPOL_BIND + __GFP_THISNODE generates a WARN (valid WARN, it normally
>>    implies a bug). 
>>
>>    That leads me to #3
> 
> I think the memalloc approach is dangerous due to unexpected nesting. There
> might be nested page allocations in page allocation itself (due to some
> debugging option). But also interrupts do not change what "current" points
> to. Suddenly those could start requesting folios and/or private nodes and be
> surprised, I'm afraid.

Yeah, we'd need some way to distinguish the main allocation from these other
(nested) allocations.


> 
> The memalloc scopes only work well when they restrict the context wrt
> reclaim, and allocations in IRQ have to be already restricted heavily
> (atomic) so further memalloc restrictions don't do anything in practice. But
> to make them change other aspects of the allocations like this won't work.

I was assuming that memalloc_pin_save() would already violate that, but really
it only restricts where movable allocations land, and that doesn't matter for
other kernel allocations.

Do you see any other way to make something like an allocation context work, and
avoid introducing more GFP flags?

-- 
Cheers,

David

^ permalink raw reply

* [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Martin Kaiser @ 2026-06-15 14:54 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu
  Cc: linux-trace-kernel, linux-kernel, Martin Kaiser

For a char * element in an event, the FILTER_PTR_STRING filter type is
used. When the event occurs, a pointer is stored in the ringbuffer.

If an eprobe references such a char * element of a "base event" and
decodes the pointer as string, the pointer cannot be dereferenced.

$ echo 'e syscalls.sys_enter_openat $filename:string' > \
		/sys/kernel/tracing/dynamic_events
$ trace-cmd start -e eprobes
$ trace-cmd show
    ... : sys_enter_openat: (syscalls.sys_enter_openat) arg1=(fault)

The problem is in get_event_field

	val = (unsigned long)(*(char *)addr);

addr points to the position in the ringbuffer where the pointer was
stored. We must read the complete pointer, not just the lowest byte.

Fix the assignment, make the example above work.

Signed-off-by: Martin Kaiser <martin@kaiser.cx>
---
 kernel/trace/trace_eprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index b66d6196338d..50518b071414 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -315,7 +315,7 @@ get_event_field(struct fetch_insn *code, void *rec)
 			val = (unsigned long)addr;
 			break;
 		case FILTER_PTR_STRING:
-			val = (unsigned long)(*(char *)addr);
+			val = *(unsigned long *)addr;
 			break;
 		default:
 			WARN_ON_ONCE(1);
-- 
2.43.7


^ permalink raw reply related

* Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Vlastimil Babka (SUSE) @ 2026-06-15 14:38 UTC (permalink / raw)
  To: Gregory Price, David Hildenbrand (Arm)
  Cc: Balbir Singh, lsf-pc, linux-kernel, linux-cxl, cgroups, linux-mm,
	linux-trace-kernel, damon, kernel-team, gregkh, rafael, dakr,
	dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, longman, akpm,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	osalvador, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, ying.huang, apopple, axelrasmussen, yuanchu, weixugc,
	yury.norov, linux, mhiramat, mathieu.desnoyers, tj, hannes,
	mkoutny, jackmanb, sj, baolin.wang, npache, ryan.roberts,
	dev.jain, baohua, lance.yang, muchun.song, xu.xin16,
	chengming.zhou, jannh, linmiaohe, nao.horiguchi, pfalcato,
	rientjes, shakeel.butt, riel, harry.yoo, cl, roman.gushchin,
	chrisl, kasong, shikemeng, nphamcs, bhe, zhengqi.arch,
	terry.bowman, Matthew Wilcox
In-Reply-To: <aiwl4kCG814dpX7L@gourry-fedora-PF4VCD3F>

On 6/12/26 17:29, Gregory Price wrote:
> On Wed, Jun 10, 2026 at 04:12:52PM -0400, Gregory Price wrote:
>> On Wed, Jun 10, 2026 at 08:59:59PM +0200, David Hildenbrand (Arm) wrote:
>> > > 
>> > > I understand this question in two ways:
>> > > 
>> > >   1) Can we disallow PAGE allocation and limit this to FOLIO allocation
>> > 
>> > Yes. Can we only allow folios to be allocated from private memory nodes. So let
>> > me reply to that one below.
>> > 
>> ... snip ...
>> > 
>> > At LSF/MM we talked about how GFP flags are bad and how deriving stuff from the
>> > context might be better. I think there was also talk about how the memalloc_*
>> > interface might be a better way forward. Maybe we would start giving the
>> > allocator more context ("we are allocating a folio").
>> > 
>> > The following is incomplete (esp. hugetlb stuff I assume), just as some idea:
>> >
>> 
>> I will still probably send the next RFC version tomorrow or friday,
>> as I want to get some eyes on the __GFP_PRIVATE-less pattern.
>> 
>> Also, I made a new `anondax` driver which enables userland testing
>> of this functionality without any specialty hardware.
>> 
> 
> (apologies for the length of this email: this will all be covered in
> the coming cover letter, but I just wanted to share a bit of a preview)
> 
> ===
> 
> Just another small update - I am planning to post the RFC today once i
> get some mild cleanup done.  It will be based on the dax atomic hotplug
> 
> https://lore.kernel.org/linux-mm/20260605211911.2160954-1-gourry@gourry.net/
> 
> But a couple specific details regarding the memalloc pieces that i've
> learned the past couple of days playing with it.
> 
> 1) memalloc_folio is required to ensure non-folio allocations don't land
>    on the private node, even if it happens within a memalloc_private
>    context.  Since memalloc_folio may be useful in contexts outside of
>    private nodes, I kept this as a separate flag.
> 
>    If we think there will *never* be additional users of memalloc_folio,
>    then we could fold _folio into _private to save the flag for now and
>    add it back when we actually need it.
> 
> 2) memalloc_private is needed to unlock private nodes, but in the
>    original NOFALLBACK-only design, you also needed __GFP_THISNODE.
> 
>    This is *highly* restrictive.  I found when playing with mbind that
>    MPOL_BIND + __GFP_THISNODE generates a WARN (valid WARN, it normally
>    implies a bug). 
> 
>    That leads me to #3

I think the memalloc approach is dangerous due to unexpected nesting. There
might be nested page allocations in page allocation itself (due to some
debugging option). But also interrupts do not change what "current" points
to. Suddenly those could start requesting folios and/or private nodes and be
surprised, I'm afraid.

The memalloc scopes only work well when they restrict the context wrt
reclaim, and allocations in IRQ have to be already restricted heavily
(atomic) so further memalloc restrictions don't do anything in practice. But
to make them change other aspects of the allocations like this won't work.

> 3) If a private node is opted into something like Demotion (the node is
>    a demotion target) or mbind(), such that normal kernel operation can
>    place memory there - it's *pseudo-private*, and should actually land
>    in it's own FALLBACK list (reachable without __GFP_THISNODE, but not
>    reachable as a normal fallback allocation target).
> 
> I'm still playing with this, but I think we can even omit the
> __GFP_THISNODE requirement (my initial feeling that __GFP_THISNODE
> didn't buy us anything in particular seems to have panned out).
> 
> At the end of the day, this makes the whole memalloc_private_save()
> pattern a heck of a lot cleaner than trying fiddle with GFP.
> 
> I think you will all enjoy how clean the code ends up, and how easily
> testable it is.
> 
> As a testbed I've implement an anondax (we can discuss naming) that
> adds some sample NODE_PRIVATE_OPT_* flags so you can do the following.
> 
> I'm including this in the next RFC - but we can hack the entire thing
> off (including the OPT flags) if we prefer to just get the base set in
> without a new driver as a start.
> 
> echo 1 > dax0.0/reclaim   # kswapd and reclaim run normally on this node
> echo 1 > dax0.0/demotion  # it is a demotion target
> echo 1 > dax0.0/mbind     # mbind() can target this node for anon-vma's
> echo 1 > dax0.0/madvise   # allow madvise() to operate on its folios
> echo 1 > dax0.0/numa_balance  # allow numa balancing for this node
> echo 1 > dax0.0/ltpin     # allow GUP longterm pin to operate normally
> echo * > dax0.0/adistance # set the adistance for hotplug time
> echo * > dax0.0/hotplug   # same as kmem/hotplug
> 
> This also means *existing hardware* can leverage private nodes if
> they're capable of generating a dax device.
> 
> I've even gotten it such that you can put a private node above dram in
> the adistance heirarchy - which means demotion flows downward from
> device to CPU, but allocations don't default or fallback there.
> 
> This seems *immediately* useful for a variety of use cases.
> 
> ~Gregory
> _______________________________________________
> Lsf-pc mailing list
> Lsf-pc@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/lsf-pc


^ permalink raw reply

* [PATCH v5 2/2] serial: qcom-geni: Add tracepoints for Qualcomm GENI serial driver
From: Praveen Talari @ 2026-06-15 14:16 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Greg Kroah-Hartman, Jiri Slaby, konrad.dybcio
  Cc: Praveen Talari, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, mukesh.savaliya, aniket.randive, chandana.chiluveru
In-Reply-To: <20260615-add-tracepoints-for-qcom-geni-serial-v5-0-2efa4c97e0e2@oss.qualcomm.com>

Add tracing to the Qualcomm GENI serial driver to improve runtime
observability.

Trace hooks are added at key points including termios and clock
configuration, manual control get/set, interrupt handling, and data
TX/RX paths.

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
v2->v3:
- Updated commit text(removed example as it was available on cover
  letter).
---
 drivers/tty/serial/qcom_geni_serial.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index d81b539cff7f..4b62e58d4918 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -7,6 +7,9 @@
 /* Disable MMIO tracing to prevent excessive logging of unwanted MMIO traces */
 #define __DISABLE_TRACE_MMIO__
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/qcom_geni_serial.h>
+
 #include <linux/clk.h>
 #include <linux/console.h>
 #include <linux/io.h>
@@ -226,7 +229,7 @@ static void qcom_geni_serial_config_port(struct uart_port *uport, int cfg_flags)
 static unsigned int qcom_geni_serial_get_mctrl(struct uart_port *uport)
 {
 	unsigned int mctrl = TIOCM_DSR | TIOCM_CAR;
-	u32 geni_ios;
+	u32 geni_ios = 0;
 
 	if (uart_console(uport)) {
 		mctrl |= TIOCM_CTS;
@@ -236,6 +239,8 @@ static unsigned int qcom_geni_serial_get_mctrl(struct uart_port *uport)
 			mctrl |= TIOCM_CTS;
 	}
 
+	trace_geni_serial_get_mctrl(uport->dev, mctrl, geni_ios);
+
 	return mctrl;
 }
 
@@ -254,6 +259,8 @@ static void qcom_geni_serial_set_mctrl(struct uart_port *uport,
 	if (port->manual_flow && !(mctrl & TIOCM_RTS) && !uport->suspended)
 		uart_manual_rfr = UART_MANUAL_RFR_EN | UART_RFR_NOT_READY;
 	writel(uart_manual_rfr, uport->membase + SE_UART_MANUAL_RFR);
+
+	trace_geni_serial_set_mctrl(uport->dev, mctrl, uart_manual_rfr);
 }
 
 static const char *qcom_geni_serial_get_type(struct uart_port *uport)
@@ -684,6 +691,8 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
 	xmit_size = kfifo_out_linear_ptr(&tport->xmit_fifo, &tail,
 			UART_XMIT_SIZE);
 
+	trace_geni_serial_tx_data(uport->dev, tail, xmit_size);
+
 	qcom_geni_set_rs485_mode(uport, SER_RS485_RTS_ON_SEND);
 
 	qcom_geni_serial_setup_tx(uport, xmit_size);
@@ -910,8 +919,10 @@ static void qcom_geni_serial_handle_rx_dma(struct uart_port *uport, bool drop)
 		return;
 	}
 
-	if (!drop)
+	if (!drop) {
+		trace_geni_serial_rx_data(uport->dev, port->rx_buf, rx_in);
 		handle_rx_uart(uport, rx_in);
+	}
 
 	ret = geni_se_rx_dma_prep(&port->se, port->rx_buf,
 				  DMA_RX_BUF_SIZE,
@@ -1082,6 +1093,10 @@ static irqreturn_t qcom_geni_serial_isr(int isr, void *dev)
 	geni_status = readl(uport->membase + SE_GENI_STATUS);
 	dma = readl(uport->membase + SE_GENI_DMA_MODE_EN);
 	m_irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
+
+	trace_geni_serial_irq(uport->dev, m_irq_status, s_irq_status,
+			      dma_tx_status, dma_rx_status);
+
 	writel(m_irq_status, uport->membase + SE_GENI_M_IRQ_CLEAR);
 	writel(s_irq_status, uport->membase + SE_GENI_S_IRQ_CLEAR);
 	writel(dma_tx_status, uport->membase + SE_DMA_TX_IRQ_CLR);
@@ -1294,8 +1309,8 @@ static int geni_serial_set_rate(struct uart_port *uport, unsigned int baud)
 		return -EINVAL;
 	}
 
-	dev_dbg(port->se.dev, "desired_rate = %u, clk_rate = %lu, clk_div = %u, clk_idx = %u\n",
-		baud * sampling_rate, clk_rate, clk_div, clk_idx);
+	trace_geni_serial_clk_cfg(uport->dev, baud * sampling_rate, clk_rate,
+				  clk_div, clk_idx);
 
 	uport->uartclk = clk_rate;
 	port->clk_rate = clk_rate;
@@ -1455,6 +1470,10 @@ static void qcom_geni_serial_set_termios(struct uart_port *uport,
 	writel(bits_per_char, uport->membase + SE_UART_TX_WORD_LEN);
 	writel(bits_per_char, uport->membase + SE_UART_RX_WORD_LEN);
 	writel(stop_bit_len, uport->membase + SE_UART_TX_STOP_BIT_LEN);
+
+	trace_geni_serial_set_termios(uport->dev, baud, bits_per_char,
+				      tx_trans_cfg, tx_parity_cfg, rx_trans_cfg,
+				      rx_parity_cfg, stop_bit_len);
 }
 
 #ifdef CONFIG_SERIAL_QCOM_GENI_CONSOLE

-- 
2.34.1


^ permalink raw reply related

* [PATCH v5 1/2] serial: qcom-geni: trace: Drop redundant len field from geni_serial_data
From: Praveen Talari @ 2026-06-15 14:16 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Greg Kroah-Hartman, Jiri Slaby, konrad.dybcio
  Cc: Praveen Talari, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, mukesh.savaliya, aniket.randive, chandana.chiluveru
In-Reply-To: <20260615-add-tracepoints-for-qcom-geni-serial-v5-0-2efa4c97e0e2@oss.qualcomm.com>

The dynamic array stored in the ring buffer already carries its own
length in the array metadata. There is no need to also store it as a
separate scalar field in the entry struct.

Drop __field(unsigned int, len) and the corresponding __entry->len
assignment, and use __get_dynamic_array_len(data) in the TP_printk for
both the len=%u format argument and the __print_hex() size argument.
This saves 4 bytes per event on the ring buffer.

Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
 include/trace/events/qcom_geni_serial.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/qcom_geni_serial.h b/include/trace/events/qcom_geni_serial.h
index 417ec01f9fc8..e1aa551d525e 100644
--- a/include/trace/events/qcom_geni_serial.h
+++ b/include/trace/events/qcom_geni_serial.h
@@ -97,18 +97,17 @@ DECLARE_EVENT_CLASS(geni_serial_data,
 		    TP_ARGS(dev, buf, len),
 
 		    TP_STRUCT__entry(__string(name, dev_name(dev))
-				     __field(unsigned int, len)
 				     __dynamic_array(u8, data, len)
 		    ),
 
 		    TP_fast_assign(__assign_str(name);
-				   __entry->len = len;
 				   memcpy(__get_dynamic_array(data), buf, len);
 		    ),
 
 		    TP_printk("%s: len=%u data=%s",
-			      __get_str(name), __entry->len,
-			      __print_hex(__get_dynamic_array(data), __entry->len))
+			      __get_str(name), __get_dynamic_array_len(data),
+			      __print_hex(__get_dynamic_array(data),
+					  __get_dynamic_array_len(data)))
 );
 
 DEFINE_EVENT(geni_serial_data, geni_serial_tx_data,

-- 
2.34.1


^ permalink raw reply related

* [PATCH v5 0/2] Add tracepoints support for Qualcomm GENI Serial drivers
From: Praveen Talari @ 2026-06-15 14:16 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Greg Kroah-Hartman, Jiri Slaby, konrad.dybcio
  Cc: Praveen Talari, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, mukesh.savaliya, aniket.randive, chandana.chiluveru

Add tracepoints to the Qualcomm GENI (Generic Interface) serial driver.
These trace events enable runtime debugging and performance analysis of
UART operations.

The trace events cover UART termios configuration, clock setup, manual
control state, interrupt status, and actual transmitted/received data in
hexadecimal format.

Usage examples:

Enable all serial traces:
  echo 1 > /sys/kernel/debug/tracing/events/qcom_geni_serial/enable
  cat /sys/kernel/debug/tracing/trace_pipe

Example trace output:

2517.938432: geni_serial_clk_cfg: a94000.serial: desired_rate=1843200
     clk_rate=7372800 clk_div=4 clk_idx=0
2517.938753: geni_serial_irq: a94000.serial: m_irq=0x88800000
     s_irq=0x08000111 dma_tx=0x00000000 dma_rx=0x00000000
2517.938803: geni_serial_set_termios: a94000.serial: baud=115200 bpc=8
     tx_trans=0x00000002 tx_par=0x00000000 rx_trans=0x00000000
rx_par=0x00000000 stop=0
2517.938807: geni_serial_set_mctrl: a94000.serial: mctrl=0x8006
     uart_manual_rfr=0x00000000
2517.938818: geni_serial_get_mctrl: a94000.serial: mctrl=0x0160
     geni_ios=0x00000001
2517.939165: geni_serial_irq: a94000.serial: m_irq=0x00400000
     s_irq=0x00000000 dma_tx=0x00000000 dma_rx=0x00000000
2517.939592: geni_serial_tx_data: a94000.serial: tx_len=8 data=61 62 63
     64 65 66 67 68
2517.940610: geni_serial_irq: a94000.serial: m_irq=0x00000001
     s_irq=0x00000000 dma_tx=0x00000003 dma_rx=0x00000000
2517.942174: geni_serial_irq: a94000.serial: m_irq=0x08000000
     s_irq=0x08000100 dma_tx=0x00000000 dma_rx=0x00000003
2517.942323: geni_serial_rx_data: a94000.serial: rx_len=8 data=61 62 63
     64 65 66 67 68
2517.942680: geni_serial_set_mctrl: a94000.serial: mctrl=0x8000
     uart_manual_rfr=0x80000002

Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
---
Changes in v5:
- Remove serial trace header patch since it was accepted and merged.
- Added a new patch to fix for remove unwanted variable len.
- Link to v4: https://lore.kernel.org/all/20260526-add-tracepoints-for-qcom-geni-serial-v4-0-e94fbaec0232@oss.qualcomm.com

Changes in v4:
- Rebased patch(02/02) on latest linux-next.
- Link to v3: https://lore.kernel.org/all/20260518-add-tracepoints-for-qcom-geni-serial-v3-0-b4addb151376@oss.qualcomm.com

Changes in v3:
- Removed \n from geni_serial_tx_data and geni_serial_rx_data events.
- Resolved aligment issues in geni_serial_data, geni_serial_tx_data and
  geni_serial_rx_data events.
- Link to v2: https://lore.kernel.org/r/20260512-add-tracepoints-for-qcom-geni-serial-v2-0-a5726421b3af@oss.qualcomm.com

Changes in v2:
- removed multiple trace events for TX/RX events, instead used
  DECLARE_EVENT_CLASS and DEFINE_EVENT.
- Link to v1: https://lore.kernel.org/r/20260506-add-tracepoints-for-qcom-geni-serial-v1-0-544b22612e08@oss.qualcomm.com

To: Steven Rostedt <rostedt@goodmis.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Jiri Slaby <jirislaby@kernel.org>
To: konrad.dybcio@oss.qualcomm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-serial@vger.kernel.org
Cc: mukesh.savaliya@oss.qualcomm.com
Cc: aniket.randive@oss.qualcomm.com
Cc: chandana.chiluveru@oss.qualcomm.com

---
Praveen Talari (2):
      serial: qcom-geni: trace: Drop redundant len field from geni_serial_data
      serial: qcom-geni: Add tracepoints for Qualcomm GENI serial driver

 drivers/tty/serial/qcom_geni_serial.c   | 27 +++++++++++++++++++++++----
 include/trace/events/qcom_geni_serial.h |  7 +++----
 2 files changed, 26 insertions(+), 8 deletions(-)
---
base-commit: c425609d6ac4012c8bbf01ec2e10e801b1923a7b
change-id: 20260427-add-tracepoints-for-qcom-geni-serial-948777218b7b

Best regards,
--  
Praveen Talari <praveen.talari@oss.qualcomm.com>


^ permalink raw reply

* Re: [PATCH v4 1/2] serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
From: Greg Kroah-Hartman @ 2026-06-15 13:31 UTC (permalink / raw)
  To: Praveen Talari
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jiri Slaby,
	konrad.dybcio, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, mukesh.savaliya, aniket.randive, chandana.chiluveru
In-Reply-To: <945821ea-5065-4e20-a1f9-32f7c9adb66a@oss.qualcomm.com>

On Mon, Jun 15, 2026 at 06:09:17PM +0530, Praveen Talari wrote:
> Hi
> 
> On 15-06-2026 16:12, Greg Kroah-Hartman wrote:
> > On Mon, Jun 15, 2026 at 03:26:25PM +0530, Praveen Talari wrote:
> > > HI Steven,
> > > 
> > > On 29-05-2026 19:44, Steven Rostedt wrote:
> > > > On Tue, 26 May 2026 23:07:39 +0530
> > > > Praveen Talari <praveen.talari@oss.qualcomm.com> wrote:
> > > > 
> > > > > +DECLARE_EVENT_CLASS(geni_serial_data,
> > > > > +		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> > > > > +		    TP_ARGS(dev, buf, len),
> > > > > +
> > > > > +		    TP_STRUCT__entry(__string(name, dev_name(dev))
> > > > > +				     __field(unsigned int, len)
> > > > > +				     __dynamic_array(u8, data, len)
> > > > > +		    ),
> > > > > +
> > > > > +		    TP_fast_assign(__assign_str(name);
> > > > > +				   __entry->len = len;
> > > > > +				   memcpy(__get_dynamic_array(data), buf, len);
> > > > > +		    ),
> > > > > +
> > > > > +		    TP_printk("%s: len=%u data=%s",
> > > > > +			      __get_str(name), __entry->len,
> > > > > +			      __print_hex(__get_dynamic_array(data), __entry->len))
> > > > > +);
> > > > No need to save the length of the dynamic array in __entry->len because
> > > > it's already saved in the metadata of the dynamic array that is stored
> > > > on the buffer. Instead you can have:
> > > > 
> > > > DECLARE_EVENT_CLASS(geni_serial_data,
> > > > 		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> > > > 		    TP_ARGS(dev, buf, len),
> > > > 
> > > > 		    TP_STRUCT__entry(__string(name, dev_name(dev))
> > > > 				     __dynamic_array(u8, data, len)
> > > > 		    ),
> > > > 
> > > > 		    TP_fast_assign(__assign_str(name);
> > > > 				   memcpy(__get_dynamic_array(data), buf, len);
> > > > 		    ),
> > > > 
> > > > 		    TP_printk("%s: len=%u data=%s",
> > > > 			      __get_str(name), __entry->len,
> > > > 			      __print_hex(__get_dynamic_array(data),
> > > > 					__get_dynamic_array_len(data)))
> > > > );
> > > > 
> > > > That will save you 4 bytes per event on the ring buffer. And a few
> > > > cycles not having to store the redundant information.
> > > This patch has already been accepted and is available in linux-next.
> > Great, can you send a fixup for it?  Or want me to revert this instead?
> 
> can i add fix patch in same series by removing original patch(0/1)?

We can never rewrite a public git tree.  So either revert and redo it as
a new patch, or send a fix for this one, your choice.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v4] mm/lruvec: trace LRU add drains and drain-all requests
From: Vlastimil Babka (SUSE) @ 2026-06-15 12:48 UTC (permalink / raw)
  To: JP Kobryn, linux-mm, willy, shakeel.butt, usama.arif, akpm,
	mhocko, rostedt, mhiramat, mathieu.desnoyers, kasong, qi.zheng,
	baohua, axelrasmussen, yuanchu, weixugc, chrisl, shikemeng,
	nphamcs, baoquan.he, youngjun.park
  Cc: linux-kernel, linux-trace-kernel
In-Reply-To: <20260610234808.212397-1-jp.kobryn@linux.dev>

On 6/11/26 01:48, JP Kobryn wrote:
> LRU add batches can be drained before they reach capacity. This can be a
> source of LRU lock contention, but it is not currently possible to
> attribute these drains to callers with existing tracepoints.
> 
> Add mm_lru_add_drain to report the CPU and lru_add batch count when an
> lru_add batch is drained. This allows tracing to distinguish full drains
> from partial drains and attribute them to the calling stack.
> 
> Add mm_lru_add_drain_all to capture callers of __lru_add_drain_all and
> whether they set the force flag for all CPUs. The tracepoint resembles
> the signature of the enclosing function, but is needed because of
> potential inlining.
> 
> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
> Reviewed-by: Barry Song <baohua@kernel.org>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> ---
> v4:
>   - renamed nr_folio_add to nr_folios in lru_add_drain()
>   - renamed nr to nr_folios in tracepoint for consistency
> 
> v3: https://lore.kernel.org/linux-mm/20260610195220.12403-1-jp.kobryn@linux.dev/
>   - restored and renamed tracepoint in __lru_add_drain_all
> 
> v2: https://lore.kernel.org/linux-mm/20260609041156.31127-1-jp.kobryn@linux.dev/
>   - removed mm_lru_drain_all tracepoint
> 
> v1: https://lore.kernel.org/linux-mm/20260609041156.31127-1-jp.kobryn@linux.dev/
> 
>  include/trace/events/pagemap.h | 37 ++++++++++++++++++++++++++++++++++
>  mm/swap.c                      |  7 ++++++-
>  2 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/pagemap.h b/include/trace/events/pagemap.h
> index 171524d3526d..df6ac4d13dcf 100644
> --- a/include/trace/events/pagemap.h
> +++ b/include/trace/events/pagemap.h
> @@ -77,6 +77,43 @@ TRACE_EVENT(mm_lru_activate,
>  	TP_printk("folio=%p pfn=0x%lx", __entry->folio, __entry->pfn)
>  );
>  
> +TRACE_EVENT(mm_lru_add_drain,
> +
> +	TP_PROTO(int cpu, unsigned int nr_folios),
> +
> +	TP_ARGS(cpu, nr_folios),
> +
> +	TP_STRUCT__entry(
> +		__field(int,		cpu	)
> +		__field(unsigned int,	nr_folios	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu	= cpu;
> +		__entry->nr_folios	= nr_folios;
> +	),
> +
> +	TP_printk("cpu=%d nr_folios=%u", __entry->cpu, __entry->nr_folios)
> +);
> +
> +TRACE_EVENT(mm_lru_add_drain_all,
> +
> +	TP_PROTO(bool force_all_cpus),
> +
> +	TP_ARGS(force_all_cpus),
> +
> +	TP_STRUCT__entry(
> +		__field(bool,	force_all_cpus	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->force_all_cpus	= force_all_cpus;
> +	),
> +
> +	TP_printk("force_all_cpus=%s",
> +		__entry->force_all_cpus ? "true" : "false")
> +);
> +
>  #endif /* _TRACE_PAGEMAP_H */
>  
>  /* This part must be outside protection */
> diff --git a/mm/swap.c b/mm/swap.c
> index 588f50d8f1a8..b506fa912a93 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -694,9 +694,12 @@ void lru_add_drain_cpu(int cpu)
>  {
>  	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
>  	struct folio_batch *fbatch = &fbatches->lru_add;
> +	unsigned int nr_folios = folio_batch_count(fbatch);
>  
> -	if (folio_batch_count(fbatch))
> +	if (nr_folios) {
>  		folio_batch_move_lru(fbatch, lru_add);
> +		trace_mm_lru_add_drain(cpu, nr_folios);
> +	}
>  
>  	fbatch = &fbatches->lru_move_tail;
>  	/* Disabling interrupts below acts as a compiler barrier. */
> @@ -869,6 +872,8 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
>  	if (WARN_ON(!mm_percpu_wq))
>  		return;
>  
> +	trace_mm_lru_add_drain_all(force_all_cpus);
> +
>  	/*
>  	 * Guarantee folio_batch counter stores visible by this CPU
>  	 * are visible to other CPUs before loading the current drain


^ permalink raw reply

* Re: [PATCH v4 1/2] serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
From: Praveen Talari @ 2026-06-15 12:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jiri Slaby,
	konrad.dybcio, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, mukesh.savaliya, aniket.randive, chandana.chiluveru
In-Reply-To: <2026061536-skirmish-nuptials-8fd6@gregkh>

Hi

On 15-06-2026 16:12, Greg Kroah-Hartman wrote:
> On Mon, Jun 15, 2026 at 03:26:25PM +0530, Praveen Talari wrote:
>> HI Steven,
>>
>> On 29-05-2026 19:44, Steven Rostedt wrote:
>>> On Tue, 26 May 2026 23:07:39 +0530
>>> Praveen Talari <praveen.talari@oss.qualcomm.com> wrote:
>>>
>>>> +DECLARE_EVENT_CLASS(geni_serial_data,
>>>> +		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
>>>> +		    TP_ARGS(dev, buf, len),
>>>> +
>>>> +		    TP_STRUCT__entry(__string(name, dev_name(dev))
>>>> +				     __field(unsigned int, len)
>>>> +				     __dynamic_array(u8, data, len)
>>>> +		    ),
>>>> +
>>>> +		    TP_fast_assign(__assign_str(name);
>>>> +				   __entry->len = len;
>>>> +				   memcpy(__get_dynamic_array(data), buf, len);
>>>> +		    ),
>>>> +
>>>> +		    TP_printk("%s: len=%u data=%s",
>>>> +			      __get_str(name), __entry->len,
>>>> +			      __print_hex(__get_dynamic_array(data), __entry->len))
>>>> +);
>>> No need to save the length of the dynamic array in __entry->len because
>>> it's already saved in the metadata of the dynamic array that is stored
>>> on the buffer. Instead you can have:
>>>
>>> DECLARE_EVENT_CLASS(geni_serial_data,
>>> 		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
>>> 		    TP_ARGS(dev, buf, len),
>>>
>>> 		    TP_STRUCT__entry(__string(name, dev_name(dev))
>>> 				     __dynamic_array(u8, data, len)
>>> 		    ),
>>>
>>> 		    TP_fast_assign(__assign_str(name);
>>> 				   memcpy(__get_dynamic_array(data), buf, len);
>>> 		    ),
>>>
>>> 		    TP_printk("%s: len=%u data=%s",
>>> 			      __get_str(name), __entry->len,
>>> 			      __print_hex(__get_dynamic_array(data),
>>> 					__get_dynamic_array_len(data)))
>>> );
>>>
>>> That will save you 4 bytes per event on the ring buffer. And a few
>>> cycles not having to store the redundant information.
>> This patch has already been accepted and is available in linux-next.
> Great, can you send a fixup for it?  Or want me to revert this instead?

can i add fix patch in same series by removing original patch(0/1)?


Thanks,

Praveen Talari

>
> thanks,
>
> greg k-h

^ permalink raw reply

* Re: [PATCH v4 1/2] serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
From: Greg Kroah-Hartman @ 2026-06-15 10:42 UTC (permalink / raw)
  To: Praveen Talari
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jiri Slaby,
	konrad.dybcio, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, mukesh.savaliya, aniket.randive, chandana.chiluveru
In-Reply-To: <688f0529-44ea-4cdf-bb0f-6c42cb3fa07e@oss.qualcomm.com>

On Mon, Jun 15, 2026 at 03:26:25PM +0530, Praveen Talari wrote:
> HI Steven,
> 
> On 29-05-2026 19:44, Steven Rostedt wrote:
> > On Tue, 26 May 2026 23:07:39 +0530
> > Praveen Talari <praveen.talari@oss.qualcomm.com> wrote:
> > 
> > > +DECLARE_EVENT_CLASS(geni_serial_data,
> > > +		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> > > +		    TP_ARGS(dev, buf, len),
> > > +
> > > +		    TP_STRUCT__entry(__string(name, dev_name(dev))
> > > +				     __field(unsigned int, len)
> > > +				     __dynamic_array(u8, data, len)
> > > +		    ),
> > > +
> > > +		    TP_fast_assign(__assign_str(name);
> > > +				   __entry->len = len;
> > > +				   memcpy(__get_dynamic_array(data), buf, len);
> > > +		    ),
> > > +
> > > +		    TP_printk("%s: len=%u data=%s",
> > > +			      __get_str(name), __entry->len,
> > > +			      __print_hex(__get_dynamic_array(data), __entry->len))
> > > +);
> > No need to save the length of the dynamic array in __entry->len because
> > it's already saved in the metadata of the dynamic array that is stored
> > on the buffer. Instead you can have:
> > 
> > DECLARE_EVENT_CLASS(geni_serial_data,
> > 		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> > 		    TP_ARGS(dev, buf, len),
> > 
> > 		    TP_STRUCT__entry(__string(name, dev_name(dev))
> > 				     __dynamic_array(u8, data, len)
> > 		    ),
> > 
> > 		    TP_fast_assign(__assign_str(name);
> > 				   memcpy(__get_dynamic_array(data), buf, len);
> > 		    ),
> > 
> > 		    TP_printk("%s: len=%u data=%s",
> > 			      __get_str(name), __entry->len,
> > 			      __print_hex(__get_dynamic_array(data),
> > 					__get_dynamic_array_len(data)))
> > );
> > 
> > That will save you 4 bytes per event on the ring buffer. And a few
> > cycles not having to store the redundant information.
> 
> This patch has already been accepted and is available in linux-next.

Great, can you send a fixup for it?  Or want me to revert this instead?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
From: Gabriele Monaco @ 2026-06-15 10:16 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <13a25b73c9fdebd26c2d4f922a83408dbcfc214d.1780847473.git.wen.yang@linux.dev>

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> Wrap the two definitions with #ifndef guards so that HA-based
> monitors
> can substitute their own implementations before including this
> header:
> 
>   /* in monitor.c, before #include <rv/ha_monitor.h> */
>   #define da_monitor_reset_hook  my_monitor_reset_env
>   #define EVENT_NONE_LBL         "idle"
> 
> No behaviour change for monitors that do not override either macro.
> 
> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> ---
>  include/rv/ha_monitor.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
> index e5860900a337..610da54c111f 100644
> --- a/include/rv/ha_monitor.h
> +++ b/include/rv/ha_monitor.h
> @@ -36,7 +36,10 @@ static bool ha_monitor_handle_constraint(struct
> da_monitor *da_mon,
>  					 da_id_type id);
>  #define da_monitor_event_hook ha_monitor_handle_constraint
>  #define da_monitor_init_hook ha_monitor_init_env
> +/* Allow monitors to override da_monitor_reset_hook before including
> this header. */

Just a nit: users can do that but shouldn't do it mindlessly, so let's
add a line like:

  "Make sure you still call ha_monitor_reset_env() or reset timers
otherwise."

Other than that looks good

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

Thanks,
Gabriele

> +#ifndef da_monitor_reset_hook
>  #define da_monitor_reset_hook ha_monitor_reset_env
> +#endif
>  #define da_monitor_sync_hook() synchronize_rcu()
>  
>  #if !defined(HA_SKIP_AUTO_CLEANUP) && RV_MON_TYPE == RV_MON_PER_TASK
> @@ -75,7 +78,9 @@ _Static_assert(offsetof(struct ha_monitor, da_mon)
> == 0,
>  #define ENV_INVALID_VALUE U64_MAX
>  /* Error with no event occurs only on timeouts */
>  #define EVENT_NONE EVENT_MAX
> +#ifndef EVENT_NONE_LBL
>  #define EVENT_NONE_LBL "none"
> +#endif
>  #define ENV_BUFFER_SIZE 64
>  
>  #ifdef CONFIG_RV_REACTORS


^ permalink raw reply

* Re: [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
From: Gabriele Monaco @ 2026-06-15 10:12 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <812b7b8e8979b4ab00ac7727e3fea578799f2a8b.1780847473.git.wen.yang@linux.dev>

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> The function is documented as "prepare the invariant and return the
> time
> since reset", but on the first call (env_store == U64_MAX) it exits
> early without calling ha_set_invariant_ns():
> 
>   if (ha_monitor_env_invalid(ha_mon, env))  /* env_store == U64_MAX
> */
>       return 0;   /* ha_set_invariant_ns skipped, env_store stays
> U64_MAX */
>   ...
>   ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
> 
> This leaves env_store == U64_MAX, so ha_check_invariant_ns() always
> passes on the first activation regardless of elapsed time:
> 
>   return READ_ONCE(ha_mon->env_store[env]) >= time_ns;  /* U64_MAX >=
> any */
> 
> Fix: establish the guard before converting to the invariant:
> 
>   if (ha_monitor_env_invalid(ha_mon, env))
>       ha_reset_clk_ns(ha_mon, env, time_ns); /* guard: env_store =
> time_ns */
>   passed = ha_get_env(ha_mon, env, time_ns);
>   ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
>                                  /* invariant: env_store = time_ns +
> expire */
> 
> Apply the same fix to ha_invariant_passed_jiffy().
> 
> Signed-off-by: Wen Yang <wen.yang@linux.dev>

This is neat and probably works just fine in your case. Anyway I'm
planning on a more seamless solution not involving invalid states at
all. Initialisation would include a reset, which should work better on
monitors doing da_handle_start(), so not running the first event.
That's, by the way, pretty much what you wanted by doing reset() on the
(virtual) init transition.

We first needs Nam's simplification though [1]. We can synchronise
during this development cycle, you probably won't really need to bother
for now.

Thanks,
Gabriele

[1] -
https://lore.kernel.org/lkml/08188c28f274da63a3f8549add3086a92aef45e5.1780908661.git.namcao@linutronix.de

> ---
>  include/rv/ha_monitor.h | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
> index 28d3c74cabfc..e5860900a337 100644
> --- a/include/rv/ha_monitor.h
> +++ b/include/rv/ha_monitor.h
> @@ -365,16 +365,22 @@ static inline bool ha_check_invariant_ns(struct
> ha_monitor *ha_mon,
>  }
>  /*
>   * ha_invariant_passed_ns - prepare the invariant and return the
> time since reset
> + *
> + * If the env has not been initialised yet (first entry into a state
> with an
> + * invariant), anchor the guard clock at the current time so that
> the full
> + * budget is available from this point.  This preserves the
> documented
> + * guard→invariant ordering: ha_set_invariant_ns() is always
> preceded by a
> + * valid guard representation in env_store.
>   */
>  static inline u64 ha_invariant_passed_ns(struct ha_monitor *ha_mon,
> enum envs env,
>  				   u64 expire, u64 time_ns)
>  {
> -	u64 passed = 0;
> +	u64 passed;
>  
>  	if (env < 0 || env >= ENV_MAX_STORED)
>  		return 0;
>  	if (ha_monitor_env_invalid(ha_mon, env))
> -		return 0;
> +		ha_reset_clk_ns(ha_mon, env, time_ns);
>  	passed = ha_get_env(ha_mon, env, time_ns);
>  	ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
>  	return passed;
> @@ -404,16 +410,19 @@ static inline bool
> ha_check_invariant_jiffy(struct ha_monitor *ha_mon,
>  }
>  /*
>   * ha_invariant_passed_jiffy - prepare the invariant and return the
> time since reset
> + *
> + * Same first-use semantics as ha_invariant_passed_ns(): anchor the
> guard clock
> + * now if the env has not been initialised.
>   */
>  static inline u64 ha_invariant_passed_jiffy(struct ha_monitor
> *ha_mon, enum envs env,
>  				      u64 expire, u64 time_ns)
>  {
> -	u64 passed = 0;
> +	u64 passed;
>  
>  	if (env < 0 || env >= ENV_MAX_STORED)
>  		return 0;
>  	if (ha_monitor_env_invalid(ha_mon, env))
> -		return 0;
> +		ha_reset_clk_jiffy(ha_mon, env);
>  	passed = ha_get_env(ha_mon, env, time_ns);
>  	ha_set_invariant_jiffy(ha_mon, env, expire - passed);
>  	return passed;


^ permalink raw reply

* Re: [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY
From: Gabriele Monaco @ 2026-06-15  9:56 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <496394879a590b4d7bafdb2f13618d2e30be982f.1780847473.git.wen.yang@linux.dev>

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:

> +#ifndef DA_MON_ALLOCATION_STRATEGY
> +# define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_AUTO
> +#endif

I'm not sure the space goes there according to kernel coding style, we
don't use it in this file and clang-format removes it. I'd keep
consistency.

...

>  
> +/*
> + * DA_MON_POOL_SIZE must be defined before this header is included

I don't think we need to be this verbose (also, ha_monitor may not even
be included if that's a da_monitor). I would stop at the line above.

> (directly or
> + * transitively via ha_monitor.h) when DA_ALLOC_POOL is selected. 
> In practice
> + * this means defining it after the monitor's model header (which
> supplies the
> + * capacity constant) and before the ha_monitor.h include.
> + */

> +#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL &&
> !defined(DA_MON_POOL_SIZE)
> +# error "DA_ALLOC_POOL requires DA_MON_POOL_SIZE to be defined
> before including this header"

Same here I'd keep consistency and remove the space before error.

> +#endif

...
  
> +/*
> + * Per-object pool state.
> + *
> + * Zero-initialised by default (storage == NULL ⟹ kmalloc mode).  A

Mmh, ⟹  doesn't seem to print that well on my terminal, let's perhaps
use plain old ASCII => . Also I don't find what you put in parentheses
to be adding much value, we could even omit it.

I remember discussing about this so I may have missed your answer, but
why don't we handle this pool as a simple kmem_cache/mempool instead of
implementing a similar logic from scratch?

> monitor
> + * opts into pool mode by defining DA_MON_ALLOCATION_STRATEGY
> DA_ALLOC_POOL
> + * and DA_MON_POOL_SIZE before including this header;
> da_monitor_init() then
> + * pre-allocates the pool internally.
> + *
> + * Because every field is wrapped in this struct and the struct
> itself is a
> + * per-TU static, each monitor that includes this header gets a
> completely
> + * independent pool.  A kmalloc monitor (e.g. nomiss) and a pool
> monitor
> + * (e.g. tlob) therefore coexist without any interference.
> + *
> + * da_pool_return_cb runs from softirq (non-PREEMPT_RT) or rcuc
> kthread
> + * (PREEMPT_RT); spin_lock_irqsave handles both.
> + */
> +struct da_per_obj_pool {
> +	struct da_monitor_storage  *storage;  /* non-NULL ⟹ pool
> mode */
> +	struct da_monitor_storage **free;     /* kmalloc'd pointer
> stack */
> +	unsigned int                free_top;
> +	unsigned int                capacity; /* total number of
> slots */
> +	spinlock_t                  lock;
> +};
> +
> +static struct da_per_obj_pool da_pool = {
> +	.lock = __SPIN_LOCK_UNLOCKED(da_pool.lock),
> +};

...

>  /*
>   * da_destroy_storage - destroy the per-object storage
>   *
> - * The caller is responsible to synchronise writers, either with
> locks or
> - * implicitly. For instance, if da_destroy_storage is called at
> sched_exit and
> - * da_create_storage can never occur after that, it's safe to call
> this without
> - * locks.
> - * This function includes an RCU read-side critical section to
> synchronise
> - * against da_monitor_destroy().
> + * Pool mode: removes from hash and returns the slot via call_rcu().
> + * Kmalloc mode: removes from hash and frees via kfree_rcu().
> + *
> + * Includes an RCU read-side critical section to synchronise against
> + * da_monitor_destroy().
>   */
>  static inline void da_destroy_storage(da_id_type id)
>  {
> @@ -558,7 +670,11 @@ static inline void da_destroy_storage(da_id_type
> id)
>  		return;
>  	da_monitor_reset_hook(&mon_storage->rv.da_mon);
>  	hash_del_rcu(&mon_storage->node);
> +#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
> +	call_rcu(&mon_storage->rcu, da_pool_return_cb);
> +#else

ifdeffery in functions is discouraged as quite unreadable. Since
DA_MON_ALLOCATION_STRATEGY is guaranteed to be defined, simple C ifs are
going to be mostly equivalent (the compiler will cut instead of the
preprocessor, but still).

>  	kfree_rcu(mon_storage, rcu);
> +#endif
>  }

...

> +
> +/*
> + * da_monitor_init - initialise the per-object monitor
> + *
> + * Selects the allocation path at compile time based on
> DA_MON_ALLOCATION_STRATEGY:
> + *   DA_ALLOC_POOL   - pre-allocates DA_MON_POOL_SIZE storage slots.
> + *   DA_ALLOC_AUTO / DA_ALLOC_MANUAL - initialises the hash table
> only.
> + */
>  static inline int da_monitor_init(void)
>  {
>  	hash_init(da_monitor_ht);
> +#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
> +	return __da_monitor_init_pool(DA_MON_POOL_SIZE);
> +#else

Same here, use if()

>  	return 0;
> +#endif
>  }
>  
> -static inline void da_monitor_destroy(void)
> +static inline void da_monitor_destroy_pool(void)
> +{
> +	struct da_monitor_storage *ms;
> +	struct hlist_node *tmp;
> +	int bkt;
> +
> +	/*
> +	 * Ensure all in-flight tracepoint handlers that may hold a
> raw pointer
> +	 * to a pool slot (e.g. tlob_stop_task after its RCU guard
> exits) have
> +	 * completed before we begin tearing down the pool.  Mirrors
> the same
> +	 * call in da_monitor_destroy_kmalloc().
> +	 */
> +	tracepoint_synchronize_unregister();
> +

This is common between the pool and kmalloc flavours, you can leave it
in da_monitor_destroy() before branching.

> +	/*
> +	 * Drain any entries that were not stopped before destroy
> (e.g.
> +	 * uprobe-started sessions whose stop probe never fired). 
> Call
> +	 * da_extra_cleanup() before hash_del_rcu() so the hook may
> safely
> +	 * call ha_cancel_timer_sync() while the monitor is still
> reachable.
> +	 */
> +	hash_for_each_safe(da_monitor_ht, bkt, tmp, ms, node) {
> +		da_extra_cleanup(&ms->rv.da_mon);
> +		hash_del_rcu(&ms->node);
> +		call_rcu(&ms->rcu, da_pool_return_cb);
> +	}

Cannot you make also this common? da_extra_cleanup() should be called in
all flavours and the only difference I see here is the rcu callback.

Also do you really need call_rcu() ? Since we should already have waited
for a grace period, you can probably call the function directly. If not,
I'd still try and make both flavours consistent (sync + free OR call_rcu
+ barrier, not both).

> +
> +	/*
> +	 * rcu_barrier() drains every pending call_rcu() callback,
> including
> +	 * both da_pool_return_cb() and any monitor-specific free
> callbacks
> +	 * (e.g. tlob_free_rcu) enqueued by da_extra_cleanup().
> +	 */
> +	rcu_barrier();
> +	kfree(da_pool.storage);
> +	da_pool.storage = NULL;
> +	kfree(da_pool.free);
> +	da_pool.free = NULL;
> +	da_pool.free_top = 0;
> +	da_pool.capacity = 0;

Only this part is really specific to this allocation flavour, if you
want it in a separate function, go ahead, but the rest should probably
share as much code as possible (especially the cleanup/synchronisation
mess we just worked out).

Thanks,
Gabriele


^ permalink raw reply

* Re: [PATCH v4 1/2] serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
From: Praveen Talari @ 2026-06-15  9:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Greg Kroah-Hartman,
	Jiri Slaby, konrad.dybcio, linux-kernel, linux-trace-kernel,
	linux-arm-msm, linux-serial, mukesh.savaliya, aniket.randive,
	chandana.chiluveru
In-Reply-To: <20260529101422.18dda2ae@fedora>

HI Steven,

On 29-05-2026 19:44, Steven Rostedt wrote:
> On Tue, 26 May 2026 23:07:39 +0530
> Praveen Talari <praveen.talari@oss.qualcomm.com> wrote:
>
>> +DECLARE_EVENT_CLASS(geni_serial_data,
>> +		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
>> +		    TP_ARGS(dev, buf, len),
>> +
>> +		    TP_STRUCT__entry(__string(name, dev_name(dev))
>> +				     __field(unsigned int, len)
>> +				     __dynamic_array(u8, data, len)
>> +		    ),
>> +
>> +		    TP_fast_assign(__assign_str(name);
>> +				   __entry->len = len;
>> +				   memcpy(__get_dynamic_array(data), buf, len);
>> +		    ),
>> +
>> +		    TP_printk("%s: len=%u data=%s",
>> +			      __get_str(name), __entry->len,
>> +			      __print_hex(__get_dynamic_array(data), __entry->len))
>> +);
> No need to save the length of the dynamic array in __entry->len because
> it's already saved in the metadata of the dynamic array that is stored
> on the buffer. Instead you can have:
>
> DECLARE_EVENT_CLASS(geni_serial_data,
> 		    TP_PROTO(struct device *dev, const u8 *buf, unsigned int len),
> 		    TP_ARGS(dev, buf, len),
>
> 		    TP_STRUCT__entry(__string(name, dev_name(dev))
> 				     __dynamic_array(u8, data, len)
> 		    ),
>
> 		    TP_fast_assign(__assign_str(name);
> 				   memcpy(__get_dynamic_array(data), buf, len);
> 		    ),
>
> 		    TP_printk("%s: len=%u data=%s",
> 			      __get_str(name), __entry->len,
> 			      __print_hex(__get_dynamic_array(data),
> 					__get_dynamic_array_len(data)))
> );
>
> That will save you 4 bytes per event on the ring buffer. And a few
> cycles not having to store the redundant information.

This patch has already been accepted and is available in linux-next.


Thanks,
Praveen Talari

>
> -- Steve

^ permalink raw reply

* Re: [RFC PATCH 0/3] mm/compaction: honour compact_unevictable_allowed in mlock race and alloc_contig path
From: Wandun @ 2026-06-15  8:28 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-trace-kernel, linux-rt-devel,
	vbabka, bigeasy, rostedt, hughd
  Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, mhiramat,
	mathieu.desnoyers, david, ljs, liam, rppt, clrkwllms,
	Alexander.Krabler
In-Reply-To: <20260604023812.3700316-1-chenwandun1@gmail.com>



On 6/4/26 10:38, Wandun Chen wrote:
> From: Wandun Chen <chenwandun@lixiang.com>
> 
> vm.compact_unevictable_allowed=0 is meant to keep compaction from
> touching unevictable folios. In practice there are still two paths
> where it does not take effect. This series fixes them and adds a
> tracepoint to make such issues easier to diagnose in the future.

Gentle ping, any feedback on this series?

Sashiko found a potential compilation error in patch2, I'll address
that along with any review comments in v2.

Best regards,
Wandun

> 
> Wandun Chen (3):
>   mm/compaction: skip isolate mlocked folios when
>     compact_unevictable_allowed=0
>   mm/compaction: add per-folio isolation tracepoint
>   mm/compaction: respect compact_unevictable_allowed in alloc_contig
>     path
> 
>  include/linux/compaction.h        |  6 ++++++
>  include/trace/events/compaction.h | 26 ++++++++++++++++++++++++++
>  mm/compaction.c                   | 14 +++++++++++---
>  mm/internal.h                     |  1 +
>  mm/page_alloc.c                   |  2 ++
>  5 files changed, 46 insertions(+), 3 deletions(-)
> 


^ permalink raw reply

* [PATCH v4 7/7] tracing/probes: Add a new testcase for BTF typecasts
From: Masami Hiramatsu (Google) @ 2026-06-15  1:15 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

With the introduction of container_of-style BTF typecasting and
per-CPU variable access support in trace probes, we need a way to
verify their functionality and prevent regressions.

Add a new ftrace kselftest and update the trace event sample module
to test and validate these features.

Specifically, update the trace-events-sample module to set up a
periodic timer whose callback accesses a per-CPU counter. Introduce
a new sample trace event, foo_timer_fn, to trace this callback
and log the current counter value.

Then, add a new test case, btf_probe_event.tc, which defines a
dynamic probe on the timer callback. The probe uses BTF typecasting
to recover the parent structure from the timer argument and
this_cpu_read() to fetch the per-CPU counter. The test verifies
the integrity of the implementation by ensuring the values
recorded by the dynamic probe match those from the static tracepoint.

Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v4:
  - Fix uprobe $current test.
 Changes in v3:
  - Add syntax test case.
  - Update testcase to use this_cpu_read()
 Changes in v2:
  - Use timer_shutdown_sync() instead of timer_delete_sync() for teardown.
---
 samples/trace_events/trace-events-sample.c         |   40 +++++++++++++++-
 samples/trace_events/trace-events-sample.h         |   34 ++++++++++++-
 .../ftrace/test.d/dynevent/btf_probe_event.tc      |   51 ++++++++++++++++++++
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |    9 ++++
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc   |    9 ++++
 .../ftrace/test.d/kprobe/uprobe_syntax_errors.tc   |    5 ++
 6 files changed, 143 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc

diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index 0b7a6efdb247..ca5d98c360cb 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -94,6 +94,20 @@ static int simple_thread_fn(void *arg)
 static DEFINE_MUTEX(thread_mutex);
 static int simple_thread_cnt;
 
+static struct foo_timer_data *foo_timer_data;
+
+static void sample_timer_cb(struct timer_list *t)
+{
+	struct foo_timer_data *data = container_of(t, struct foo_timer_data, timer);
+
+	get_cpu();
+	trace_foo_timer_fn(data);
+	(*this_cpu_ptr(data->counter))++;
+	put_cpu();
+
+	mod_timer(t, jiffies + HZ);
+}
+
 int foo_bar_reg(void)
 {
 	mutex_lock(&thread_mutex);
@@ -132,9 +146,27 @@ void foo_bar_unreg(void)
 
 static int __init trace_event_init(void)
 {
+	foo_timer_data = kzalloc_obj(*foo_timer_data, GFP_KERNEL);
+	if (!foo_timer_data)
+		return -ENOMEM;
+
+	foo_timer_data->name = "sample_timer_counter";
+	foo_timer_data->counter = alloc_percpu(int);
+	if (!foo_timer_data->counter) {
+		kfree(foo_timer_data);
+		return -ENOMEM;
+	}
+
+	timer_setup(&foo_timer_data->timer, sample_timer_cb, 0);
+	mod_timer(&foo_timer_data->timer, jiffies + HZ);
+
 	simple_tsk = kthread_run(simple_thread, NULL, "event-sample");
-	if (IS_ERR(simple_tsk))
-		return -1;
+	if (IS_ERR(simple_tsk)) {
+		timer_shutdown_sync(&foo_timer_data->timer);
+		free_percpu(foo_timer_data->counter);
+		kfree(foo_timer_data);
+		return PTR_ERR(simple_tsk);
+	}
 
 	return 0;
 }
@@ -147,6 +179,10 @@ static void __exit trace_event_exit(void)
 		kthread_stop(simple_tsk_fn);
 	simple_tsk_fn = NULL;
 	mutex_unlock(&thread_mutex);
+
+	timer_shutdown_sync(&foo_timer_data->timer);
+	free_percpu(foo_timer_data->counter);
+	kfree(foo_timer_data);
 }
 
 module_init(trace_event_init);
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 1a05fc153353..816848a456a2 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -247,12 +247,14 @@
  */
 
 /*
- * It is OK to have helper functions in the file, but they need to be protected
- * from being defined more than once. Remember, this file gets included more
- * than once.
+ * It is OK to have helper functions and data structures in the file, but they
+ * need to be protected from being defined more than once. Remember, this file
+ * gets included more than once.
  */
 #ifndef __TRACE_EVENT_SAMPLE_HELPER_FUNCTIONS
 #define __TRACE_EVENT_SAMPLE_HELPER_FUNCTIONS
+#include <linux/timer.h>
+
 static inline int __length_of(const int *list)
 {
 	int i;
@@ -270,6 +272,13 @@ enum {
 	TRACE_SAMPLE_BAR = 4,
 	TRACE_SAMPLE_ZOO = 8,
 };
+
+struct foo_timer_data {
+	const char		*name;
+	struct timer_list	timer;
+	int __percpu		*counter;
+};
+
 #endif
 
 /*
@@ -595,6 +604,25 @@ TRACE_EVENT(foo_rel_loc,
 		  __get_rel_bitmask(bitmask),
 		  __get_rel_cpumask(cpumask))
 );
+
+TRACE_EVENT(foo_timer_fn,
+
+	TP_PROTO(struct foo_timer_data *data),
+
+	TP_ARGS(data),
+
+	TP_STRUCT__entry(
+		__string(	name,			data->name	)
+		__field(	int,			count		)
+	),
+
+	TP_fast_assign(
+		__assign_str(name);
+		__entry->count	= *this_cpu_ptr(data->counter);
+	),
+
+	TP_printk("name=%s count=%d", __get_str(name), __entry->count)
+);
 #endif
 
 /***** NOTICE! The #if protection ends here. *****/
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
new file mode 100644
index 000000000000..96791e120b7d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
@@ -0,0 +1,51 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: BTF event with typecast and percpu access
+# requires: dynamic_events "this_cpu_read(<fetcharg>)":README "[(structname[,field])]<argname>[->field[->field|.field...]]":README
+
+# Check if the sample module is loaded
+if ! lsmod | grep -q trace_events_sample; then
+  modprobe trace-events-sample || exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+
+# The sample_timer_cb(struct timer_list *t) is called.
+# We want to check (STRUCT,FIELD)VAR typecast and this_cpu_read() access.
+# (foo_timer_data,timer)t converts t to struct foo_timer_data * using container_of.
+# data->counter is a per-cpu pointer to int.
+# this_cpu_read(data->counter) should give the value of the counter.
+
+echo 'f:mysample/myevent sample_timer_cb name=(foo_timer_data,timer)t->name:string count=this_cpu_read((foo_timer_data,timer)t->counter)' >> dynamic_events
+
+echo 1 > events/mysample/myevent/enable
+echo 1 > events/sample-trace/foo_timer_fn/enable
+
+sleep 2
+
+echo 0 > events/mysample/myevent/enable
+echo 0 > events/sample-trace/foo_timer_fn/enable
+
+# Compare the values.
+MATCH=0
+while read line; do
+  if echo $line | grep -q "foo_timer_fn:"; then
+    NAME=`echo $line | sed 's/.*name=\([^ ]*\) .*/\1/'`
+    COUNT=`echo $line | sed 's/.*count=\([^ ]*\).*/\1/'`
+    if grep -q "myevent:.*name=\"${NAME}\" count=$COUNT" trace; then
+       MATCH=$((MATCH+1))
+    fi
+  fi
+done < trace
+
+if [ $MATCH -eq 0 ]; then
+  echo "No matching events found"
+  exit_fail
+fi
+
+# Clean up
+echo 0 > events/mysample/myevent/enable
+echo 0 > events/sample-trace/foo_timer_fn/enable
+echo > dynamic_events
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index fee479295e2f..b781ec07c0d0 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -112,6 +112,15 @@ check_error 'f vfs_read%return $retval->^foo'	# NO_PTR_STRCT
 check_error 'f vfs_read file->^foo'		# NO_BTF_FIELD
 check_error 'f vfs_read file^-.foo'		# BAD_HYPHEN
 check_error 'f vfs_read ^file:string'		# BAD_TYPE4STR
+if grep -qF "[(structname" README ; then
+check_error 'f vfs_read arg1=(task_struct)file^'		# TYPECAST_REQ_FIELD
+check_error 'f vfs_read arg1=(f)((f)((f)((f)^((f)file->f)->f)->f)->f)->f'	# TOO_MANY_NESTED
+check_error 'f vfs_read arg1=(task_struct,^in_execve)file->comm'	# TYPECAST_NOT_ALIGNED
+check_error 'f vfs_read arg1=(task_struct,se^->group_node)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'f vfs_read arg1=(task_struct,^->pid)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'f vfs_read arg1=(task_struct,^.pid)file->comm'	# BTF_BAD_TID
+check_error 'f vfs_read arg1=(task_struct,^.)file->comm'	# BTF_BAD_TID
+fi
 fi
 
 else
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
index 8f1c58f0c239..78f015c8c010 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
@@ -115,6 +115,15 @@ check_error 'p vfs_read+20 ^$arg*'		# NOFENTRY_ARGS
 check_error 'p vfs_read ^hoge'			# NO_BTFARG
 check_error 'p kfree ^$arg10'			# NO_BTFARG (exceed the number of parameters)
 check_error 'r kfree ^$retval'			# NO_RETVAL
+if grep -qF "[(structname" README ; then
+check_error 'p vfs_read arg1=(task_struct)file^'		# TYPECAST_REQ_FIELD
+check_error 'p vfs_read arg1=(f)((f)((f)((f)^((f)file->f)->f)->f)->f)->f'	# TOO_MANY_NESTED
+check_error 'p vfs_read arg1=(task_struct,^in_execve)file->comm'	# TYPECAST_NOT_ALIGNED
+check_error 'p vfs_read arg1=(task_struct,se^->group_node)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'p vfs_read arg1=(task_struct,^->pid)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'p vfs_read arg1=(task_struct,^.pid)file->comm'	# BTF_BAD_TID
+check_error 'p vfs_read arg1=(task_struct,^.)file->comm'	# BTF_BAD_TID
+fi
 else
 check_error 'p vfs_read ^$arg*'			# NOSUP_BTFARG
 fi
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
index c817158b99db..e12dc967ec76 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
@@ -28,4 +28,9 @@ if grep -q ".*symstr.*" README; then
 check_error 'p /bin/sh:10 $stack0:^symstr'	# BAD_TYPE
 fi
 
+# $current is not supported by uprobe
+if grep -q "\$current.*" README; then
+check_error 'p /bin/sh:10 ^$current:u8'	# BAD_VAR
+fi
+
 exit 0


^ permalink raw reply related

* [PATCH v4 6/7] tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
From: Masami Hiramatsu (Google) @ 2026-06-15  1:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

When tracing the kernel local variables, sometimes we need to get the
CPU local variables. To access it, current simple dereference is not
enough.

Thus, introduce a special this_cpu_read() dereference to access per-cpu
variable for the current CPU (accessing other CPU variable may race with
updates on other CPUs). Also this_cpu_ptr() is for accessing per-cpu
pointer.

Those are working as same as the kernel percpu macro.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Remove NULL check for percpu var because it is just an offset, could be 0.
  - Simplify process_fetch_insn_bottom() code.
  - If the last operation is this_cpu_read(), read only memory of the specific
    size (of type).
 Changes in v2:
  - Drop +CPU/+PCPU and introduce this_cpu_read() and this_cpu_ptr().
  - Support these method with BTF typecast.
  - Just check the base address is NOT NULL instead of is_kernel_percpu_address().
---
 Documentation/trace/eprobetrace.rst |    2 +
 Documentation/trace/fprobetrace.rst |    2 +
 Documentation/trace/kprobetrace.rst |    2 +
 kernel/trace/trace.c                |    1 
 kernel/trace/trace_probe.c          |  138 ++++++++++++++++++++++++-----------
 kernel/trace/trace_probe.h          |    3 +
 kernel/trace/trace_probe_tmpl.h     |   30 ++++++--
 7 files changed, 129 insertions(+), 49 deletions(-)

diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index 680e0af43d5d..279396951b34 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -39,6 +39,8 @@ Synopsis of eprobe_events
   @SYM[+|-offs]	: Fetch memory at SYM +|- offs (SYM should be a data symbol)
   $comm		: Fetch current task comm.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+  this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+  this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 3392cab016b3..3439bc9bd351 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -52,6 +52,8 @@ Synopsis of fprobe-events
   $comm         : Fetch current task comm.
   $current      : Fetch the address of the current task_struct.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
+  this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+  this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
   \IMM          : Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 81e4fe38791d..9ae330eb0a52 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -55,6 +55,8 @@ Synopsis of kprobe_events
   $comm		: Fetch current task comm.
   $current      : Fetch the address of the current task_struct.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+  this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+  this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7a5676524f1a..d4121acc2938 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4332,6 +4332,7 @@ static const char readme_msg[] =
 	"\t           $stack<index>, $stack, $retval, $comm, $current\n"
 #endif
 	"\t           +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
+	"\t           this_cpu_read(<fetcharg>), this_cpu_ptr(<fetcharg>)\n"
 	"\t     kernel return probes support: $retval, $arg<N>, $comm\n"
 	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, symbol,\n"
 	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index f5155340a325..a7aaba3d5375 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -349,6 +349,77 @@ static int parse_trace_event(char *arg, struct fetch_insn *code,
 	return -EINVAL;
 }
 
+/* this_cpu_* parser */
+#define THIS_CPU_PTR_PREFIX "this_cpu_ptr("
+#define THIS_CPU_READ_PREFIX "this_cpu_read("
+#define THIS_CPU_PTR_LEN (sizeof(THIS_CPU_PTR_PREFIX) - 1)
+#define THIS_CPU_READ_LEN (sizeof(THIS_CPU_READ_PREFIX) - 1)
+
+static int
+parse_probe_arg(char *arg, const struct fetch_type *type,
+		struct fetch_insn **pcode, struct fetch_insn *end,
+		struct traceprobe_parse_context *ctx);
+
+/* handle dereference nested call */
+static inline int handle_dereference(char *arg, struct fetch_insn **pcode,
+	struct fetch_insn *end, struct traceprobe_parse_context *ctx,
+	int deref, long offset)
+{
+	const struct fetch_type *type = find_fetch_type(NULL, ctx->flags);
+	struct fetch_insn *code = *pcode;
+	int cur_offs = ctx->offset;
+	char *tmp;
+	int ret;
+
+	tmp = strrchr(arg, ')');
+	if (!tmp) {
+		trace_probe_log_err(ctx->offset + strlen(arg),
+					DEREF_OPEN_BRACE);
+		return -EINVAL;
+	}
+
+	*tmp = '\0';
+	ret = parse_probe_arg(arg, type, &code, end, ctx);
+	if (ret)
+		return ret;
+	ctx->offset = cur_offs;
+	if (code->op == FETCH_OP_COMM || code->op == FETCH_OP_DATA) {
+		trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
+		return -EINVAL;
+	}
+	code++;
+	if (code == end) {
+		trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+		return -EINVAL;
+	}
+	*pcode = code;
+
+	code->op = deref;
+	code->offset = offset;
+	/* Reset the last type if used */
+	ctx->last_type = NULL;
+	return 0;
+}
+
+static int parse_this_cpu(char *arg, struct fetch_insn **pcode,
+			  struct fetch_insn *end,
+			  struct traceprobe_parse_context *ctx)
+{
+	int deref;
+
+	if (str_has_prefix(arg, THIS_CPU_PTR_PREFIX)) {
+		arg += THIS_CPU_PTR_LEN;
+		ctx->offset += THIS_CPU_PTR_LEN;
+		deref = FETCH_OP_CPU_PTR;
+	} else if (str_has_prefix(arg, THIS_CPU_READ_PREFIX)) {
+		arg += THIS_CPU_READ_LEN;
+		ctx->offset += THIS_CPU_READ_LEN;
+		deref = FETCH_OP_DEREF_CPU;
+	} else
+		return -EINVAL;
+	return handle_dereference(arg, pcode, end, ctx, deref, 0);
+}
+
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 
 static u32 btf_type_int(const struct btf_type *t)
@@ -929,11 +1000,6 @@ static char *find_matched_close_paren(char *s)
 	return NULL;
 }
 
-static int
-parse_probe_arg(char *arg, const struct fetch_type *type,
-		struct fetch_insn **pcode, struct fetch_insn *end,
-		struct traceprobe_parse_context *ctx);
-
 static int handle_typecast(char *arg, struct fetch_insn **pcode,
 			   struct fetch_insn *end,
 			   struct traceprobe_parse_context *ctx)
@@ -959,7 +1025,8 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 	*tmp++ = '\0';
 
 	/* Handle the nested structure like (STRUCT)(VAR->FIELD)->... */
-	if (*tmp == '(') {
+	if (*tmp == '(' || str_has_prefix(tmp, THIS_CPU_PTR_PREFIX) ||
+	    str_has_prefix(tmp, THIS_CPU_READ_PREFIX)) {
 		char *close = find_matched_close_paren(tmp);
 
 		ctx->offset += tmp - arg;
@@ -979,12 +1046,18 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 			trace_probe_log_err(ctx->offset, TOO_MANY_NESTED);
 			return -E2BIG;
 		}
-		*close = '\0';
 
-		ctx->offset += 1;	/* for the '(' */
-		/* We need to parse the nested one */
-		ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
-				pcode, end, ctx);
+		if (*tmp == '(') {
+			/* Extract the inner argument */
+			*close = '\0';
+			ctx->offset += 1;/* for the '(' */
+			/* Parse the nested one */
+			ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
+					pcode, end, ctx);
+		} else {
+			/* this_cpu_* will be parsed in parse_this_cpu() */
+			ret = parse_this_cpu(tmp, pcode, end, ctx);
+		}
 		if (ret < 0)
 			return ret;
 		ctx->nested_level--;
@@ -1454,36 +1527,9 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 		}
 		ctx->offset += (tmp + 1 - arg) + (arg[0] != '-' ? 1 : 0);
 		arg = tmp + 1;
-		tmp = strrchr(arg, ')');
-		if (!tmp) {
-			trace_probe_log_err(ctx->offset + strlen(arg),
-					    DEREF_OPEN_BRACE);
-			return -EINVAL;
-		} else {
-			const struct fetch_type *t2 = find_fetch_type(NULL, ctx->flags);
-			int cur_offs = ctx->offset;
-
-			*tmp = '\0';
-			ret = parse_probe_arg(arg, t2, &code, end, ctx);
-			if (ret)
-				break;
-			ctx->offset = cur_offs;
-			if (code->op == FETCH_OP_COMM ||
-			    code->op == FETCH_OP_DATA) {
-				trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
-				return -EINVAL;
-			}
-			if (++code == end) {
-				trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
-				return -EINVAL;
-			}
-			*pcode = code;
-
-			code->op = deref;
-			code->offset = offset;
-			/* Reset the last type if used */
-			ctx->last_type = NULL;
-		}
+		ret = handle_dereference(arg, pcode, end, ctx, deref, offset);
+		if (ret < 0)
+			return ret;
 		break;
 	case '\\':	/* Immediate value */
 		if (arg[1] == '"') {	/* Immediate string */
@@ -1504,15 +1550,18 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 		ret = handle_typecast(arg, pcode, end, ctx);
 		break;
 	default:
-		if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
+		if (str_has_prefix(arg, THIS_CPU_PTR_PREFIX) ||
+		    str_has_prefix(arg, THIS_CPU_READ_PREFIX)) {
+			ret = parse_this_cpu(arg, pcode, end, ctx);
+		} else if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
 			if (!tparg_is_function_entry(ctx->flags) &&
 			    !tparg_is_function_return(ctx->flags)) {
 				trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
 				return -EINVAL;
 			}
 			ret = parse_btf_arg(arg, pcode, end, ctx);
-			break;
 		}
+		break;
 	}
 	if (!ret && code->op == FETCH_OP_NOP) {
 		/* Parsed, but do not find fetch method */
@@ -1687,6 +1736,9 @@ static int finalize_fetch_insn(struct fetch_insn *code,
 	} else if (code->op == FETCH_OP_UDEREF) {
 		code->op = FETCH_OP_ST_UMEM;
 		code->size = parg->type->size;
+	} else if (code->op == FETCH_OP_DEREF_CPU) {
+		code->op = FETCH_OP_ST_CPUMEM;
+		code->size = parg->type->size;
 	} else {
 		code++;
 		if (code->op != FETCH_OP_NOP) {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 62645e847bd1..523612023608 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -100,10 +100,13 @@ enum fetch_op {
 	// Stage 2 (dereference) op
 	FETCH_OP_DEREF,		/* Dereference: .offset */
 	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
+	FETCH_OP_DEREF_CPU,	/* Per-CPU Dereference for this CPU */
+	FETCH_OP_CPU_PTR,	/* Per-CPU pointer for this CPU */
 	// Stage 3 (store) ops
 	FETCH_OP_ST_RAW,	/* Raw: .size */
 	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
 	FETCH_OP_ST_UMEM,	/* Mem: .offset, .size */
+	FETCH_OP_ST_CPUMEM,	/* Per-CPU Mem: .size */
 	FETCH_OP_ST_STRING,	/* String: .offset, .size */
 	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
 	FETCH_OP_ST_SYMSTR,	/* Kernel Symbol String: .offset, .size */
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index f630930288d2..83111c167b74 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -129,25 +129,39 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 	struct fetch_insn *s3 = NULL;
 	int total = 0, ret = 0, i = 0;
 	u32 loc = 0;
-	unsigned long lval = val;
+	unsigned long lval, llval = val;
 
 stage2:
 	/* 2nd stage: dereference memory if needed */
 	do {
-		if (code->op == FETCH_OP_DEREF) {
-			lval = val;
+		lval = val;
+		switch (code->op) {
+		case FETCH_OP_DEREF:
 			ret = probe_mem_read(&val, (void *)val + code->offset,
 					     sizeof(val));
-		} else if (code->op == FETCH_OP_UDEREF) {
-			lval = val;
+			break;
+		case FETCH_OP_UDEREF:
 			ret = probe_mem_read_user(&val,
 				 (void *)val + code->offset, sizeof(val));
-		} else
 			break;
+		case FETCH_OP_DEREF_CPU:
+			val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+			ret = probe_mem_read(&val, (void *)val, sizeof(val));
+			break;
+		case FETCH_OP_CPU_PTR:
+			val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+			ret = 0;
+			break;
+		default:
+			lval = llval;
+			goto out;
+		}
 		if (ret)
 			return ret;
+		llval = lval;
 		code++;
 	} while (1);
+out:
 
 	s3 = code;
 stage3:
@@ -181,6 +195,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 	case FETCH_OP_ST_UMEM:
 		probe_mem_read_user(dest, (void *)val + code->offset, code->size);
 		break;
+	case FETCH_OP_ST_CPUMEM:
+		val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+		probe_mem_read(dest, (void *)val, code->size);
+		break;
 	case FETCH_OP_ST_STRING:
 		loc = *(u32 *)dest;
 		ret = fetch_store_string(val + code->offset, dest, base);


^ permalink raw reply related

* [PATCH v4 5/7] tracing/probes: Add $current variable support
From: Masami Hiramatsu (Google) @ 2026-06-15  1:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Since we can use the BTF to cast value to a structure pointer type,
it is useful to introduce "$current" special variable support to
fetcharg.

User can define a fetcharg to access current task_struct properties
using BTF info. e.g.

  $current->cpus_ptr

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v4:
  - Add $current in README when CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y case.
  - Fix to prohibit using $current in eprobes and address based kprobes.
 Changes in v3:
  - Remove $current support from eprobes (because eprobes is only for event)
  - Prohibit uprobes to use $current.
 Changes in v2:
   - Support to parse $current in parse_btf_arg().
   - If no typecast on $current, it automatically casted to task_struct.
   - Check error case if $current follows something except for "-".
---
 Documentation/trace/fprobetrace.rst |    1 +
 Documentation/trace/kprobetrace.rst |    1 +
 kernel/trace/trace.c                |    4 ++--
 kernel/trace/trace_probe.c          |   34 +++++++++++++++++++++++++++++++++-
 kernel/trace/trace_probe.h          |    1 +
 kernel/trace/trace_probe_tmpl.h     |    3 +++
 6 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 290a9e6f7491..3392cab016b3 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -50,6 +50,7 @@ Synopsis of fprobe-events
   $argN         : Fetch the Nth function argument. (N >= 1) (\*2)
   $retval       : Fetch return value.(\*3)
   $comm         : Fetch current task comm.
+  $current      : Fetch the address of the current task_struct.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
   \IMM          : Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index a62707e6a9f2..81e4fe38791d 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -53,6 +53,7 @@ Synopsis of kprobe_events
   $argN		: Fetch the Nth function argument. (N >= 1) (\*1)
   $retval	: Fetch return value.(\*2)
   $comm		: Fetch current task comm.
+  $current      : Fetch the address of the current task_struct.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0e36af853199..7a5676524f1a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4323,13 +4323,13 @@ static const char readme_msg[] =
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
-	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
+	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>, $current\n"
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 	"\t           [(structname[,field])]<argname>[->field[->field|.field...]],\n"
 	"\t           [(structname[,field])](fetcharg)->field[->field|.field...],\n"
 #endif
 #else
-	"\t           $stack<index>, $stack, $retval, $comm,\n"
+	"\t           $stack<index>, $stack, $retval, $comm, $current\n"
 #endif
 	"\t           +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
 	"\t     kernel return probes support: $retval, $arg<N>, $comm\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 3ead93de2d93..f5155340a325 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -730,6 +730,20 @@ static int parse_btf_arg(char *varname,
 		goto found;
 	}
 
+	if (strcmp(varname, "$current") == 0) {
+		code->op = FETCH_OP_CURRENT;
+		/* If no typecast is specified for $current, use task_struct by default */
+		if (!ctx->struct_btf) {
+			tid = bpf_find_btf_id("task_struct", BTF_KIND_STRUCT, &ctx->struct_btf);
+			if (tid < 0) {
+				trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
+				return -ENOENT;
+			}
+			ctx->last_struct = btf_type_skip_modifiers(ctx->struct_btf, tid, &tid);
+		}
+		goto found;
+	}
+
 	if (ctx->flags & TPARG_FL_RETURN && !strcmp(varname, "$retval")) {
 		code->op = FETCH_OP_RETVAL;
 		/* Check whether the function return type is not void */
@@ -756,8 +770,8 @@ static int parse_btf_arg(char *varname,
 			return -ENOENT;
 		}
 	}
-	params = ctx->params;
 
+	params = ctx->params;
 	for (i = 0; i < ctx->nr_params; i++) {
 		const char *name = btf_name_by_offset(ctx->btf, params[i].name_off);
 
@@ -1247,6 +1261,24 @@ static int parse_probe_vars(char *orig_arg, const struct fetch_type *t,
 		return 0;
 	}
 
+	/* $current returns the address of the current task_struct. */
+	if (str_has_prefix(arg, "current")) {
+		/* $current is only supported by kernel probe and need function name. */
+		if (!(ctx->flags & TPARG_FL_KERNEL) || !ctx->funcname) {
+			err = TP_ERR_BAD_VAR;
+			goto inval;
+		}
+		arg += strlen("current");
+		if (*arg == '-' && IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS))
+			return parse_btf_arg(orig_arg, pcode, end, ctx);
+
+		if (*arg != '\0')
+			goto inval;
+
+		code->op = FETCH_OP_CURRENT;
+		return 0;
+	}
+
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
 	len = str_has_prefix(arg, "arg");
 	if (len) {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 44f113faae61..62645e847bd1 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -96,6 +96,7 @@ enum fetch_op {
 	FETCH_OP_FOFFS,		/* File offset: .immediate */
 	FETCH_OP_DATA,		/* Allocated data: .data */
 	FETCH_OP_EDATA,		/* Entry data: .offset */
+	FETCH_OP_CURRENT,	/* Current task_struct address */
 	// Stage 2 (dereference) op
 	FETCH_OP_DEREF,		/* Dereference: .offset */
 	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index f39b37fcdb3b..f630930288d2 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -112,6 +112,9 @@ process_common_fetch_insn(struct fetch_insn *code, unsigned long *val)
 	case FETCH_OP_DATA:
 		*val = (unsigned long)code->data;
 		break;
+	case FETCH_OP_CURRENT:
+		*val = (unsigned long)current;
+		break;
 	default:
 		return -EILSEQ;
 	}


^ permalink raw reply related

* [PATCH v4 4/7] tracing/probes: Support field specifier option for typecast
From: Masami Hiramatsu (Google) @ 2026-06-15  1:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add a field specifier option for the typecast. This works like
container_of() macro.

    (STRUCT[,FIELD[.FIELD2...]])VAR

This is equivalent to :

    container_of(VAR, struct STRUCT, FIELD[.FIELD2...])

For example:

 echo "f tick_nohz_handler next_tick=(tick_sched,sched_timer)timer->next_tick" >> dynamic_events

This will trace tick_nohz_handler() with its tick_sched::next_tick which
is converted from @timer by contianer_of(tick, struct tick_sched, sched_timer).
So, if you enabkle both fprobes:tick_nohz_handler__entry and
timer:hrtimer_expire_entry events, we will see something like:


          <idle>-0       [002] d.h1.  3778.087272: hrtimer_expire_entry: hrtimer=00000000d63db328 f
unction=tick_nohz_handler now=3777450051040
          <idle>-0       [002] d.h1.  3778.087281: tick_nohz_handler__entry: (tick_nohz_handler+0x4
/0x140) next_tick=3777450000000


Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Fix error caret position.
 Changes in v2:
  - Use byteoffset for typecast field offset instead of bitoffset. This fixes negative modulo calculation.
  - Check whether a field is specified after typecast.
  - Reject if typecast field option  has arrow operator.
---
 Documentation/trace/eprobetrace.rst |    5 +
 Documentation/trace/fprobetrace.rst |    8 +-
 Documentation/trace/kprobetrace.rst |    8 +-
 kernel/trace/trace.c                |    4 -
 kernel/trace/trace_probe.c          |  179 ++++++++++++++++++++++++-----------
 kernel/trace/trace_probe.h          |    5 +
 6 files changed, 142 insertions(+), 67 deletions(-)

diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index cd0b4aa7f896..680e0af43d5d 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -49,7 +49,10 @@ Synopsis of eprobe_events
   (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
                   a pointer to STRUCT and then derference the pointer defined by
                   ->MEMBER. Note that when this is used, the FIELD name does not
-                  need to be prefixed with a '$'.
+                  need to be prefixed with a '$'. ASGN can be specified optionally.
+		  If ASGN is specified, FIELD will be cast to the same offset
+		  position as the ASGN member, rather than to the beginning of
+		  the STRUCT.
   (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
 		  also be used with another FETCHARG instead of FIELD.
 
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 6b8bb27bb62d..290a9e6f7491 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -57,10 +57,12 @@ Synopsis of fprobe-events
                   (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                   (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
                   and bitfield are supported.
-  (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+  (STRUCT[,ASGN])FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
                   a pointer to STRUCT and then derference the pointer defined by
-                  ->MEMBER.
-  (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+                  ->MEMBER. ASGN can be specified optionally. If ASGN is specified,
+		  FIELD will be cast to the same offset position as the ASGN member,
+		  rather than to the beginning of the STRUCT.
+  (STRUCT[,ASGN])(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
                  also be used with another FETCHARG instead of FIELD.
 
   (\*1) This is available only when BTF is enabled.
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index c4382765d5b2..a62707e6a9f2 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -61,11 +61,13 @@ Synopsis of kprobe_events
 		  (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
                   "string", "ustring", "symbol", "symstr" and bitfield are
                   supported.
-  (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+  (STRUCT[,ASGN])FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
                   a pointer to STRUCT and then derference the pointer defined by
                   ->MEMBER. Note that this is available only when the probe is
-		   on function entry.
-  (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+		   on function entry. ASGN can be specified optionally. If ASGN
+		   is specified, FIELD will be cast to the same offset position
+		   as the ASGN member, rather than to the beginning of the STRUCT.
+  (STRUCT[,ASGN])(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
                  also be used with another FETCHARG instead of FIELD.
 
   (\*1) only for the probe on function entry (offs == 0). Note, this argument access
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4f70318918c2..0e36af853199 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4325,8 +4325,8 @@ static const char readme_msg[] =
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
 	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
-	"\t           [(structname)]<argname>[->field[->field|.field...]],\n"
-	"\t           [(structname)](fetcharg)->field[->field|.field...],\n"
+	"\t           [(structname[,field])]<argname>[->field[->field|.field...]],\n"
+	"\t           [(structname[,field])](fetcharg)->field[->field|.field...],\n"
 #endif
 #else
 	"\t           $stack<index>, $stack, $retval, $comm,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index adcb5a19e72d..3ead93de2d93 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -574,6 +574,65 @@ static int split_next_field(char *varname, char **next_field,
 	return ret;
 }
 
+/* Inner loop for solving dot operator ('.'). Return bit-offset of the given field */
+static int get_bitoffset_of_field(char **pfieldname, const struct btf_type **ptype,
+				  struct traceprobe_parse_context *ctx)
+{
+	const struct btf_type *type = *ptype;
+	const struct btf_member *field;
+	struct btf *btf = ctx_btf(ctx);
+	char *fieldname = *pfieldname;
+	int bitoffs = 0;
+	u32 anon_offs;
+	char *next;
+	int is_ptr;
+	s32 tid;
+
+	do {
+		next = NULL;
+		is_ptr = split_next_field(fieldname, &next, ctx);
+		if (is_ptr < 0)
+			return is_ptr;
+
+		anon_offs = 0;
+		field = btf_find_struct_member(btf, type, fieldname,
+						&anon_offs);
+		if (IS_ERR(field)) {
+			trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+			return PTR_ERR(field);
+		}
+		if (!field) {
+			trace_probe_log_err(ctx->offset, NO_BTF_FIELD);
+			return -ENOENT;
+		}
+		/* Add anonymous structure/union offset */
+		bitoffs += anon_offs;
+
+		/* Accumulate the bit-offsets of the dot-connected fields */
+		if (btf_type_kflag(type)) {
+			bitoffs += BTF_MEMBER_BIT_OFFSET(field->offset);
+			ctx->last_bitsize = BTF_MEMBER_BITFIELD_SIZE(field->offset);
+		} else {
+			bitoffs += field->offset;
+			ctx->last_bitsize = 0;
+		}
+
+		type = btf_type_skip_modifiers(btf, field->type, &tid);
+		if (!type) {
+			trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+			return -EINVAL;
+		}
+
+		if (next)
+			ctx->offset += next - fieldname;
+		fieldname = next;
+	} while (!is_ptr && fieldname);
+
+	*pfieldname = fieldname;
+	*ptype = type;
+
+	return bitoffs;
+}
 /*
  * Parse the field of data structure. The @type must be a pointer type
  * pointing the target data structure type.
@@ -583,16 +642,14 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 			   struct traceprobe_parse_context *ctx)
 {
 	struct fetch_insn *code = *pcode;
-	const struct btf_member *field;
-	u32 bitoffs, anon_offs;
-	bool is_struct = ctx->struct_btf != NULL;
 	struct btf *btf = ctx_btf(ctx);
-	char *next;
-	int is_ptr;
+	bool is_first_field = true;
+	int bitoffs;
 	s32 tid;
 
 	do {
-		if (!is_struct) {
+		/* For the first field of typecast, @type will be the target structure type. */
+		if (!(is_first_field && ctx->struct_btf)) {
 			/* Outer loop for solving arrow operator ('->') */
 			if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
 				trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
@@ -606,60 +663,25 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 				return -EINVAL;
 			}
 		}
-		/* Only the first type can skip being a pointer */
-		is_struct = false;
-
-		bitoffs = 0;
-		do {
-			/* Inner loop for solving dot operator ('.') */
-			next = NULL;
-			is_ptr = split_next_field(fieldname, &next, ctx);
-			if (is_ptr < 0)
-				return is_ptr;
-
-			anon_offs = 0;
-			field = btf_find_struct_member(btf, type, fieldname,
-						       &anon_offs);
-			if (IS_ERR(field)) {
-				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
-				return PTR_ERR(field);
-			}
-			if (!field) {
-				trace_probe_log_err(ctx->offset, NO_BTF_FIELD);
-				return -ENOENT;
-			}
-			/* Add anonymous structure/union offset */
-			bitoffs += anon_offs;
-
-			/* Accumulate the bit-offsets of the dot-connected fields */
-			if (btf_type_kflag(type)) {
-				bitoffs += BTF_MEMBER_BIT_OFFSET(field->offset);
-				ctx->last_bitsize = BTF_MEMBER_BITFIELD_SIZE(field->offset);
-			} else {
-				bitoffs += field->offset;
-				ctx->last_bitsize = 0;
-			}
-
-			type = btf_type_skip_modifiers(btf, field->type, &tid);
-			if (!type) {
-				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
-				return -EINVAL;
-			}
-
-			ctx->offset += next - fieldname;
-			fieldname = next;
-		} while (!is_ptr && fieldname);
 
+		bitoffs = get_bitoffset_of_field(&fieldname, &type, ctx);
+		if (bitoffs < 0)
+			return bitoffs;
 		if (++code == end) {
 			trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
 			return -EINVAL;
 		}
 		code->op = FETCH_OP_DEREF;	/* TODO: user deref support */
 		code->offset = bitoffs / 8;
+		if (is_first_field && ctx->struct_btf) {
+			/* The first field can be typecasted with field option. */
+			code->offset -= ctx->prefix_byteoffs;
+		}
 		*pcode = code;
 
 		ctx->last_bitoffs = bitoffs % 8;
 		ctx->last_type = type;
+		is_first_field = false;
 	} while (fieldname);
 
 	return 0;
@@ -690,6 +712,11 @@ static int parse_btf_arg(char *varname,
 				    NOSUP_DAT_ARG);
 		return -EOPNOTSUPP;
 	}
+	if (!field && ctx->struct_btf) {
+		/* Typecast without field option is not supported */
+		trace_probe_log_err(ctx->offset + strlen(varname), TYPECAST_REQ_FIELD);
+		return -EOPNOTSUPP;
+	}
 
 	if (ctx->flags & TPARG_FL_TEVENT) {
 		ret = parse_trace_event(varname, code, ctx);
@@ -700,8 +727,7 @@ static int parse_btf_arg(char *varname,
 		/* TEVENT is only here via a typecast */
 		if (WARN_ON_ONCE(ctx->struct_btf == NULL))
 			return -EINVAL;
-		type = ctx->last_struct;
-		goto found_type;
+		goto found;
 	}
 
 	if (ctx->flags & TPARG_FL_RETURN && !strcmp(varname, "$retval")) {
@@ -763,7 +789,6 @@ static int parse_btf_arg(char *varname,
 		type = ctx->last_struct;
 	else
 		type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
-found_type:
 	if (!type) {
 		trace_probe_log_err(ctx->offset, BAD_BTF_TID);
 		return -EINVAL;
@@ -832,6 +857,46 @@ static int query_btf_struct(const char *sname, struct traceprobe_parse_context *
 	return 0;
 }
 
+static int parse_btf_casttype(char *casttype, struct traceprobe_parse_context *ctx)
+{
+	char *field;
+	int ret;
+
+	/* Field option - evaluated later. */
+	field = strchr(casttype, ',');
+	if (field)
+		*field++ = '\0';
+
+	ret = query_btf_struct(casttype, ctx);
+	if (ret < 0) {
+		trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
+		return -EINVAL;
+	}
+
+	if (field) {
+		struct btf_type *type = (struct btf_type *)ctx->last_struct;
+
+		ctx->offset += field - casttype;
+		ret = get_bitoffset_of_field(&field, &ctx->last_struct, ctx);
+		if (ret < 0)
+			return ret;
+		if (ret % 8) {
+			trace_probe_log_err(ctx->offset, TYPECAST_NOT_ALIGNED);
+			return -EINVAL;
+		}
+		if (field != NULL) {
+			/* this means @field skips an arrow operator ("->"). */
+			trace_probe_log_err(ctx->offset - 2, TYPECAST_BAD_ARROW);
+			return -EINVAL;
+		}
+		ctx->prefix_byteoffs = ret / 8;
+		/* Restore the original struct type (overwritten by get_bitoffset_of_field) */
+		ctx->last_struct = type;
+	}
+
+	return ret;
+}
+
 /* Find the matching closing parenthesis for a given opening parenthesis. */
 static char *find_matched_close_paren(char *s)
 {
@@ -915,11 +980,10 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 		nested = true;
 	}
 
-	ret = query_btf_struct(arg + 1, ctx);
-	if (ret < 0) {
-		trace_probe_log_err(orig_offset + 1, NO_PTR_STRCT);
-		return -EINVAL;
-	}
+	ctx->offset = orig_offset + 1;	/* for the '(' */
+	ret = parse_btf_casttype(arg + 1, ctx);
+	if (ret < 0)
+		return ret;
 
 	ctx->offset = orig_offset + tmp - arg;
 	/* If it is nested, tmp points to the field name. */
@@ -927,6 +991,7 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 		ret = parse_btf_field(tmp, ctx->last_struct, pcode, end, ctx);
 	else
 		ret = parse_btf_arg(tmp, pcode, end, ctx);
+	ctx->prefix_byteoffs = 0;
 	return ret;
 }
 
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 982d32a5df8b..44f113faae61 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -436,6 +436,7 @@ struct traceprobe_parse_context {
 	unsigned int flags;
 	int offset;
 	int nested_level;
+	int prefix_byteoffs;	/* The byte offset of the prefix field of typecast */
 };
 
 #define TRACEPROBE_MAX_NESTED_LEVEL 3
@@ -576,7 +577,9 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(EVENT_TOO_BIG,	"Event too big (too many fields?)"),  \
 	C(TYPECAST_NOT_EVENT,	"Typecasts are only for eprobe fields"), \
 	C(TYPECAST_REQ_FIELD,	"Typecast requires a field access"),	\
-	C(TOO_MANY_NESTED,	"Too many nested typecasts/dereferences"),
+	C(TOO_MANY_NESTED,	"Too many nested typecasts/dereferences"), \
+	C(TYPECAST_NOT_ALIGNED,	"Typecast field option is not byte-aligned"), \
+	C(TYPECAST_BAD_ARROW,	"Typecast field option does not support -> operator"),
 
 #undef C
 #define C(a, b)		TP_ERR_##a


^ permalink raw reply related

* [PATCH v4 3/7] tracing/probes: Support nested typecast
From: Masami Hiramatsu (Google) @ 2026-06-15  1:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

When we hit an open parenthesis right after typecast closing
parenthesis, it means we have nested typecast. This allows us to
typecast a generic data member in a structure to a pointer to
another structure.

For example, to cast a DATA_MEMBER of VAR structure to STRUCT pointer
and get MEMBER value.

  (STRUCT)(VAR->DATA_MEMBER)->MEMBER

Also, we can nest typecast.

  (STRUCT1)((STRUCT2)$ARG->FIELD2)->FIELD1

Currently the max nest level is limited to 3.

This also allows user to use typecasting for registers or stacks on
kprobe events. e.g.

  (STRUCT)(%ax)->MEMBER

  (STRUCT)($stack0)->MEMBER


Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v4:
  - Use orig_offset for reporting NO_PTR_STRCT error.
 Changes in v2:
  - Fix to skip "->" after closing parenthetsis.
---
 Documentation/trace/eprobetrace.rst |    2 +
 Documentation/trace/fprobetrace.rst |    2 +
 Documentation/trace/kprobetrace.rst |    2 +
 kernel/trace/trace.c                |    1 
 kernel/trace/trace_probe.c          |   78 +++++++++++++++++++++++++++++++----
 kernel/trace/trace_probe.h          |    7 +++
 6 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index fe3602540569..cd0b4aa7f896 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -50,6 +50,8 @@ Synopsis of eprobe_events
                   a pointer to STRUCT and then derference the pointer defined by
                   ->MEMBER. Note that when this is used, the FIELD name does not
                   need to be prefixed with a '$'.
+  (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+		  also be used with another FETCHARG instead of FIELD.
 
 Types
 -----
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 7435ded2d66d..6b8bb27bb62d 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -60,6 +60,8 @@ Synopsis of fprobe-events
   (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
                   a pointer to STRUCT and then derference the pointer defined by
                   ->MEMBER.
+  (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+                 also be used with another FETCHARG instead of FIELD.
 
   (\*1) This is available only when BTF is enabled.
   (\*2) only for the probe on function entry (offs == 0). Note, this argument access
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index f73614997d52..c4382765d5b2 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -65,6 +65,8 @@ Synopsis of kprobe_events
                   a pointer to STRUCT and then derference the pointer defined by
                   ->MEMBER. Note that this is available only when the probe is
 		   on function entry.
+  (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+                 also be used with another FETCHARG instead of FIELD.
 
   (\*1) only for the probe on function entry (offs == 0). Note, this argument access
         is best effort, because depending on the argument type, it may be passed on
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aa93e7b01146..4f70318918c2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4326,6 +4326,7 @@ static const char readme_msg[] =
 	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 	"\t           [(structname)]<argname>[->field[->field|.field...]],\n"
+	"\t           [(structname)](fetcharg)->field[->field|.field...],\n"
 #endif
 #else
 	"\t           $stack<index>, $stack, $retval, $comm,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9158f1f22a62..adcb5a19e72d 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -832,10 +832,35 @@ static int query_btf_struct(const char *sname, struct traceprobe_parse_context *
 	return 0;
 }
 
+/* Find the matching closing parenthesis for a given opening parenthesis. */
+static char *find_matched_close_paren(char *s)
+{
+	char *p = s;
+	int count = 0;
+
+	while (*p) {
+		if (*p == '(')
+			count++;
+		else if (*p == ')') {
+			if (--count == 0)
+				return p;
+		}
+		p++;
+	}
+	return NULL;
+}
+
+static int
+parse_probe_arg(char *arg, const struct fetch_type *type,
+		struct fetch_insn **pcode, struct fetch_insn *end,
+		struct traceprobe_parse_context *ctx);
+
 static int handle_typecast(char *arg, struct fetch_insn **pcode,
 			   struct fetch_insn *end,
 			   struct traceprobe_parse_context *ctx)
 {
+	int orig_offset = ctx->offset;
+	bool nested = false;
 	char *tmp;
 	int ret;
 
@@ -852,19 +877,56 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 				    DEREF_OPEN_BRACE);
 		return -EINVAL;
 	}
-	*tmp = '\0';
-	ret = query_btf_struct(arg + 1, ctx);
-	*tmp = ')';
+	*tmp++ = '\0';
+
+	/* Handle the nested structure like (STRUCT)(VAR->FIELD)->... */
+	if (*tmp == '(') {
+		char *close = find_matched_close_paren(tmp);
+
+		ctx->offset += tmp - arg;
+		if (!close) {
+			trace_probe_log_err(ctx->offset, DEREF_OPEN_BRACE);
+			return -EINVAL;
+		}
+		/* We expect a field access for typecast */
+		if (close[1] != '-' || close[2] != '>') {
+			trace_probe_log_err(ctx->offset + close - tmp + 1,
+					    TYPECAST_REQ_FIELD);
+			return -EINVAL;
+		}
 
+		ctx->nested_level++;
+		if (ctx->nested_level > TRACEPROBE_MAX_NESTED_LEVEL) {
+			trace_probe_log_err(ctx->offset, TOO_MANY_NESTED);
+			return -E2BIG;
+		}
+		*close = '\0';
+
+		ctx->offset += 1;	/* for the '(' */
+		/* We need to parse the nested one */
+		ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
+				pcode, end, ctx);
+		if (ret < 0)
+			return ret;
+		ctx->nested_level--;
+		clear_struct_btf(ctx);
+
+		tmp = close + 3;/* Skip "->" after closing parenthesis */
+		nested = true;
+	}
+
+	ret = query_btf_struct(arg + 1, ctx);
 	if (ret < 0) {
-		trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
+		trace_probe_log_err(orig_offset + 1, NO_PTR_STRCT);
 		return -EINVAL;
 	}
 
-	tmp++;
-
-	ctx->offset += tmp - arg;
-	ret = parse_btf_arg(tmp, pcode, end, ctx);
+	ctx->offset = orig_offset + tmp - arg;
+	/* If it is nested, tmp points to the field name. */
+	if (nested)
+		ret = parse_btf_field(tmp, ctx->last_struct, pcode, end, ctx);
+	else
+		ret = parse_btf_arg(tmp, pcode, end, ctx);
 	return ret;
 }
 
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 883938a74aee..982d32a5df8b 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -435,8 +435,11 @@ struct traceprobe_parse_context {
 	struct trace_probe *tp;
 	unsigned int flags;
 	int offset;
+	int nested_level;
 };
 
+#define TRACEPROBE_MAX_NESTED_LEVEL 3
+
 extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
 				      const char *argv,
 				      struct traceprobe_parse_context *ctx);
@@ -571,7 +574,9 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(TOO_MANY_ARGS,	"Too many arguments are specified"),	\
 	C(TOO_MANY_EARGS,	"Too many entry arguments specified"),	\
 	C(EVENT_TOO_BIG,	"Event too big (too many fields?)"),  \
-	C(TYPECAST_NOT_EVENT,	"Typecasts are only for eprobe fields"),
+	C(TYPECAST_NOT_EVENT,	"Typecasts are only for eprobe fields"), \
+	C(TYPECAST_REQ_FIELD,	"Typecast requires a field access"),	\
+	C(TOO_MANY_NESTED,	"Too many nested typecasts/dereferences"),
 
 #undef C
 #define C(a, b)		TP_ERR_##a


^ permalink raw reply related

* [PATCH v4 2/7] tracing/probes: Support typecast for various probe events
From: Masami Hiramatsu (Google) @ 2026-06-15  1:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Support BTF typecast feature on other probe events, but only if it is
kernel function entry or return, and must use function parameter name
or $retval. This means you can do:

  (STRUCT)PARAM->MEMBER

but you can not do (this should be enabled by nesting support)

  (STRUCT)%reg->MEMBER

To support other probe events, we just need to use last_struct type
when we find a function parameter in parse_btf_arg().

This also update <tracefs>/README file to show struct typecast.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Clarify the limitation.
 Changes in v2:
  - Fix to re-enable typecast on eprobe.
---
 Documentation/trace/fprobetrace.rst |    3 +++
 Documentation/trace/kprobetrace.rst |    4 ++++
 kernel/trace/trace.c                |    2 +-
 kernel/trace/trace_probe.c          |   14 +++++++++-----
 kernel/trace/trace_probe.h          |    5 +++++
 5 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index b4c2ca3d02c1..7435ded2d66d 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -57,6 +57,9 @@ Synopsis of fprobe-events
                   (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                   (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
                   and bitfield are supported.
+  (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+                  a pointer to STRUCT and then derference the pointer defined by
+                  ->MEMBER.
 
   (\*1) This is available only when BTF is enabled.
   (\*2) only for the probe on function entry (offs == 0). Note, this argument access
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 3b6791c17e9b..f73614997d52 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -61,6 +61,10 @@ Synopsis of kprobe_events
 		  (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
                   "string", "ustring", "symbol", "symstr" and bitfield are
                   supported.
+  (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+                  a pointer to STRUCT and then derference the pointer defined by
+                  ->MEMBER. Note that this is available only when the probe is
+		   on function entry.
 
   (\*1) only for the probe on function entry (offs == 0). Note, this argument access
         is best effort, because depending on the argument type, it may be passed on
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d..aa93e7b01146 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4325,7 +4325,7 @@ static const char readme_msg[] =
 #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
 	"\t           $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
-	"\t           <argname>[->field[->field|.field...]],\n"
+	"\t           [(structname)]<argname>[->field[->field|.field...]],\n"
 #endif
 #else
 	"\t           $stack<index>, $stack, $retval, $comm,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index fd1caa1f9723..9158f1f22a62 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -759,7 +759,10 @@ static int parse_btf_arg(char *varname,
 	return -ENOENT;
 
 found:
-	type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
+	if (ctx->struct_btf)
+		type = ctx->last_struct;
+	else
+		type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
 found_type:
 	if (!type) {
 		trace_probe_log_err(ctx->offset, BAD_BTF_TID);
@@ -836,10 +839,11 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 	char *tmp;
 	int ret;
 
-	/* Currently this only works for eprobes */
-	if (!(ctx->flags & TPARG_FL_TEVENT)) {
-		trace_probe_log_err(ctx->offset, TYPECAST_NOT_EVENT);
-		return -EINVAL;
+	if (!(tparg_is_event_probe(ctx->flags) ||
+	      tparg_is_function_entry(ctx->flags) ||
+	      tparg_is_function_return(ctx->flags))) {
+		trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+		return -EOPNOTSUPP;
 	}
 
 	tmp = strchr(arg, ')');
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 15758cc11fc6..883938a74aee 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -414,6 +414,11 @@ static inline bool tparg_is_function_return(unsigned int flags)
 	return (flags & TPARG_FL_LOC_MASK) == (TPARG_FL_KERNEL | TPARG_FL_RETURN);
 }
 
+static inline bool tparg_is_event_probe(unsigned int flags)
+{
+	return !!(flags & TPARG_FL_TEVENT);
+}
+
 struct traceprobe_parse_context {
 	struct trace_event_call *event;
 	/* BTF related parameters */


^ permalink raw reply related

* [PATCH v4 1/7] tracing/events: Fix to check the simple_tsk_fn creation
From: Masami Hiramatsu (Google) @ 2026-06-15  1:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178148603548.185520.3389196102475741865.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Sashiko pointed that this sample code does not correctly handle the
failure of thread creation because kthread_run() can return -errno.

Check the simple_tsk_fn is correctly initialized (created) or not.

Link: https://sashiko.dev/#/patchset/178092865666.163648.10457567771536160909.stgit%40devnote2

Fixes: 9cfe06f8cd5c ("tracing/events: add trace-events-sample")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v4:
   - Fix to remove decrementing counter in error path, since foo_bar_reg() always returns 0.
   - Add a newline to error message.
 Changes in v3:
   - Recover the usage counter.
---
 samples/trace_events/trace-events-sample.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index ecc7db237f2e..0b7a6efdb247 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -107,6 +107,10 @@ int foo_bar_reg(void)
 	 * for consistency sake, we still take the thread_mutex.
 	 */
 	simple_tsk_fn = kthread_run(simple_thread_fn, NULL, "event-sample-fn");
+	if (IS_ERR_OR_NULL(simple_tsk_fn)) {
+		pr_err("Failed to create simple_thread_fn\n");
+		simple_tsk_fn = NULL;
+	}
  out:
 	mutex_unlock(&thread_mutex);
 	return 0;


^ permalink raw reply related

* [PATCH v4 0/7] tracing/probes: Add more typecast features
From: Masami Hiramatsu (Google) @ 2026-06-15  1:13 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest

Hi,

Here is the 4th version of series to introduce more typecast features
to probe events. The previous version is here:

 https://lore.kernel.org/all/178144880282.159464.16882854283219530040.stgit@devnote2/

In this version, I fixed some issues found by Sashiko reviews.

Steve introduced BTF typecast feature for eprobe[1].
This series extends it and add more options:

1. Expanding BTF typecast to kprobe and fprobe.
   (currently only function entry/exit)

2. Introduce container_of like typecast. This adds a "assigned
   member" option to the typecast.

   (STRUCT,MEMBER)VAR->ANOTHER_MEMBER

   This casts VAR to STRUCT type but the VAR is as the address
   of STRUCT.MEMBER. In C, it is:

   container_of(VAR, STRUCT, MEMBER)->ANOTHER_MEMBER

3. Support nested typecast, e.g.

   (STRUCT)((STRUCT2)VAR->MEMBER2)->MEMBER

   the nest level must be smaller than 3.

4. Add $current variable to point "current" task_struct.
   This is useful with typecast, e.g.

   (task_struct)$current->pid

5. per-cpu dereference support.

   Intrdouce this_cpu_read(VAR) and this_cpu_ptr(VAR) to
   access per-cpu data on the current CPU (accessing other CPU
   data is not stable, because it can be changed.)

   You can access the member of per-cpu data structure using
   typecast like:

   (STRUCT)this_cpu_ptr(VAR)->MEMBER


And added a test script to test part of them.

[1] https://lore.kernel.org/all/20260601130746.2139d926@gandalf.local.home/


---

Masami Hiramatsu (Google) (7):
      tracing/events: Fix to check the simple_tsk_fn creation
      tracing/probes: Support typecast for various probe events
      tracing/probes: Support nested typecast
      tracing/probes: Support field specifier option for typecast
      tracing/probes: Add $current variable support
      tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
      tracing/probes: Add a new testcase for BTF typecasts


 Documentation/trace/eprobetrace.rst                |    9 
 Documentation/trace/fprobetrace.rst                |   10 
 Documentation/trace/kprobetrace.rst                |   11 +
 kernel/trace/trace.c                               |    8 
 kernel/trace/trace_probe.c                         |  413 +++++++++++++++-----
 kernel/trace/trace_probe.h                         |   19 +
 kernel/trace/trace_probe_tmpl.h                    |   33 +-
 samples/trace_events/trace-events-sample.c         |   44 ++
 samples/trace_events/trace-events-sample.h         |   34 ++
 .../ftrace/test.d/dynevent/btf_probe_event.tc      |   51 ++
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |    9 
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc   |    9 
 .../ftrace/test.d/kprobe/uprobe_syntax_errors.tc   |    5 
 13 files changed, 540 insertions(+), 115 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* [PATCH v3 7/7] tracing/probes: Add a new testcase for BTF typecasts
From: Masami Hiramatsu (Google) @ 2026-06-14 14:54 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178144880282.159464.16882854283219530040.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

With the introduction of container_of-style BTF typecasting and
per-CPU variable access support in trace probes, we need a way to
verify their functionality and prevent regressions.

Add a new ftrace kselftest and update the trace event sample module
to test and validate these features.

Specifically, update the trace-events-sample module to set up a
periodic timer whose callback accesses a per-CPU counter. Introduce
a new sample trace event, foo_timer_fn, to trace this callback
and log the current counter value.

Then, add a new test case, btf_probe_event.tc, which defines a
dynamic probe on the timer callback. The probe uses BTF typecasting
to recover the parent structure from the timer argument and
this_cpu_read() to fetch the per-CPU counter. The test verifies
the integrity of the implementation by ensuring the values
recorded by the dynamic probe match those from the static tracepoint.

Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Add syntax test case.
  - Update testcase to use this_cpu_read()
 Changes in v2:
  - Use timer_shutdown_sync() instead of timer_delete_sync() for teardown.
---
 samples/trace_events/trace-events-sample.c         |   40 +++++++++++++++-
 samples/trace_events/trace-events-sample.h         |   34 ++++++++++++-
 .../ftrace/test.d/dynevent/btf_probe_event.tc      |   51 ++++++++++++++++++++
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |    9 ++++
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc   |    9 ++++
 .../ftrace/test.d/kprobe/uprobe_syntax_errors.tc   |    5 ++
 6 files changed, 143 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc

diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index 82344a78e471..651f3e2138ab 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -94,6 +94,20 @@ static int simple_thread_fn(void *arg)
 static DEFINE_MUTEX(thread_mutex);
 static int simple_thread_cnt;
 
+static struct foo_timer_data *foo_timer_data;
+
+static void sample_timer_cb(struct timer_list *t)
+{
+	struct foo_timer_data *data = container_of(t, struct foo_timer_data, timer);
+
+	get_cpu();
+	trace_foo_timer_fn(data);
+	(*this_cpu_ptr(data->counter))++;
+	put_cpu();
+
+	mod_timer(t, jiffies + HZ);
+}
+
 int foo_bar_reg(void)
 {
 	mutex_lock(&thread_mutex);
@@ -133,9 +147,27 @@ void foo_bar_unreg(void)
 
 static int __init trace_event_init(void)
 {
+	foo_timer_data = kzalloc_obj(*foo_timer_data, GFP_KERNEL);
+	if (!foo_timer_data)
+		return -ENOMEM;
+
+	foo_timer_data->name = "sample_timer_counter";
+	foo_timer_data->counter = alloc_percpu(int);
+	if (!foo_timer_data->counter) {
+		kfree(foo_timer_data);
+		return -ENOMEM;
+	}
+
+	timer_setup(&foo_timer_data->timer, sample_timer_cb, 0);
+	mod_timer(&foo_timer_data->timer, jiffies + HZ);
+
 	simple_tsk = kthread_run(simple_thread, NULL, "event-sample");
-	if (IS_ERR(simple_tsk))
-		return -1;
+	if (IS_ERR(simple_tsk)) {
+		timer_shutdown_sync(&foo_timer_data->timer);
+		free_percpu(foo_timer_data->counter);
+		kfree(foo_timer_data);
+		return PTR_ERR(simple_tsk);
+	}
 
 	return 0;
 }
@@ -148,6 +180,10 @@ static void __exit trace_event_exit(void)
 		kthread_stop(simple_tsk_fn);
 	simple_tsk_fn = NULL;
 	mutex_unlock(&thread_mutex);
+
+	timer_shutdown_sync(&foo_timer_data->timer);
+	free_percpu(foo_timer_data->counter);
+	kfree(foo_timer_data);
 }
 
 module_init(trace_event_init);
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 1a05fc153353..816848a456a2 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -247,12 +247,14 @@
  */
 
 /*
- * It is OK to have helper functions in the file, but they need to be protected
- * from being defined more than once. Remember, this file gets included more
- * than once.
+ * It is OK to have helper functions and data structures in the file, but they
+ * need to be protected from being defined more than once. Remember, this file
+ * gets included more than once.
  */
 #ifndef __TRACE_EVENT_SAMPLE_HELPER_FUNCTIONS
 #define __TRACE_EVENT_SAMPLE_HELPER_FUNCTIONS
+#include <linux/timer.h>
+
 static inline int __length_of(const int *list)
 {
 	int i;
@@ -270,6 +272,13 @@ enum {
 	TRACE_SAMPLE_BAR = 4,
 	TRACE_SAMPLE_ZOO = 8,
 };
+
+struct foo_timer_data {
+	const char		*name;
+	struct timer_list	timer;
+	int __percpu		*counter;
+};
+
 #endif
 
 /*
@@ -595,6 +604,25 @@ TRACE_EVENT(foo_rel_loc,
 		  __get_rel_bitmask(bitmask),
 		  __get_rel_cpumask(cpumask))
 );
+
+TRACE_EVENT(foo_timer_fn,
+
+	TP_PROTO(struct foo_timer_data *data),
+
+	TP_ARGS(data),
+
+	TP_STRUCT__entry(
+		__string(	name,			data->name	)
+		__field(	int,			count		)
+	),
+
+	TP_fast_assign(
+		__assign_str(name);
+		__entry->count	= *this_cpu_ptr(data->counter);
+	),
+
+	TP_printk("name=%s count=%d", __get_str(name), __entry->count)
+);
 #endif
 
 /***** NOTICE! The #if protection ends here. *****/
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
new file mode 100644
index 000000000000..96791e120b7d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
@@ -0,0 +1,51 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: BTF event with typecast and percpu access
+# requires: dynamic_events "this_cpu_read(<fetcharg>)":README "[(structname[,field])]<argname>[->field[->field|.field...]]":README
+
+# Check if the sample module is loaded
+if ! lsmod | grep -q trace_events_sample; then
+  modprobe trace-events-sample || exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+
+# The sample_timer_cb(struct timer_list *t) is called.
+# We want to check (STRUCT,FIELD)VAR typecast and this_cpu_read() access.
+# (foo_timer_data,timer)t converts t to struct foo_timer_data * using container_of.
+# data->counter is a per-cpu pointer to int.
+# this_cpu_read(data->counter) should give the value of the counter.
+
+echo 'f:mysample/myevent sample_timer_cb name=(foo_timer_data,timer)t->name:string count=this_cpu_read((foo_timer_data,timer)t->counter)' >> dynamic_events
+
+echo 1 > events/mysample/myevent/enable
+echo 1 > events/sample-trace/foo_timer_fn/enable
+
+sleep 2
+
+echo 0 > events/mysample/myevent/enable
+echo 0 > events/sample-trace/foo_timer_fn/enable
+
+# Compare the values.
+MATCH=0
+while read line; do
+  if echo $line | grep -q "foo_timer_fn:"; then
+    NAME=`echo $line | sed 's/.*name=\([^ ]*\) .*/\1/'`
+    COUNT=`echo $line | sed 's/.*count=\([^ ]*\).*/\1/'`
+    if grep -q "myevent:.*name=\"${NAME}\" count=$COUNT" trace; then
+       MATCH=$((MATCH+1))
+    fi
+  fi
+done < trace
+
+if [ $MATCH -eq 0 ]; then
+  echo "No matching events found"
+  exit_fail
+fi
+
+# Clean up
+echo 0 > events/mysample/myevent/enable
+echo 0 > events/sample-trace/foo_timer_fn/enable
+echo > dynamic_events
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index fee479295e2f..b781ec07c0d0 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -112,6 +112,15 @@ check_error 'f vfs_read%return $retval->^foo'	# NO_PTR_STRCT
 check_error 'f vfs_read file->^foo'		# NO_BTF_FIELD
 check_error 'f vfs_read file^-.foo'		# BAD_HYPHEN
 check_error 'f vfs_read ^file:string'		# BAD_TYPE4STR
+if grep -qF "[(structname" README ; then
+check_error 'f vfs_read arg1=(task_struct)file^'		# TYPECAST_REQ_FIELD
+check_error 'f vfs_read arg1=(f)((f)((f)((f)^((f)file->f)->f)->f)->f)->f'	# TOO_MANY_NESTED
+check_error 'f vfs_read arg1=(task_struct,^in_execve)file->comm'	# TYPECAST_NOT_ALIGNED
+check_error 'f vfs_read arg1=(task_struct,se^->group_node)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'f vfs_read arg1=(task_struct,^->pid)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'f vfs_read arg1=(task_struct,^.pid)file->comm'	# BTF_BAD_TID
+check_error 'f vfs_read arg1=(task_struct,^.)file->comm'	# BTF_BAD_TID
+fi
 fi
 
 else
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
index 8f1c58f0c239..78f015c8c010 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
@@ -115,6 +115,15 @@ check_error 'p vfs_read+20 ^$arg*'		# NOFENTRY_ARGS
 check_error 'p vfs_read ^hoge'			# NO_BTFARG
 check_error 'p kfree ^$arg10'			# NO_BTFARG (exceed the number of parameters)
 check_error 'r kfree ^$retval'			# NO_RETVAL
+if grep -qF "[(structname" README ; then
+check_error 'p vfs_read arg1=(task_struct)file^'		# TYPECAST_REQ_FIELD
+check_error 'p vfs_read arg1=(f)((f)((f)((f)^((f)file->f)->f)->f)->f)->f'	# TOO_MANY_NESTED
+check_error 'p vfs_read arg1=(task_struct,^in_execve)file->comm'	# TYPECAST_NOT_ALIGNED
+check_error 'p vfs_read arg1=(task_struct,se^->group_node)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'p vfs_read arg1=(task_struct,^->pid)file->comm'	# TYPECAST_BAD_ARROW
+check_error 'p vfs_read arg1=(task_struct,^.pid)file->comm'	# BTF_BAD_TID
+check_error 'p vfs_read arg1=(task_struct,^.)file->comm'	# BTF_BAD_TID
+fi
 else
 check_error 'p vfs_read ^$arg*'			# NOSUP_BTFARG
 fi
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
index c817158b99db..d5d5245bee6c 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
@@ -28,4 +28,9 @@ if grep -q ".*symstr.*" README; then
 check_error 'p /bin/sh:10 $stack0:^symstr'	# BAD_TYPE
 fi
 
+# $current is not supported by uprobe
+if grep -q "\$current.*" README; then
+check_error 'p /bin/sh:10 $current:^u8'	# BAD_VAR
+fi
+
 exit 0


^ permalink raw reply related

* [PATCH v3 6/7] tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
From: Masami Hiramatsu (Google) @ 2026-06-14 14:54 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178144880282.159464.16882854283219530040.stgit@devnote2>

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

When tracing the kernel local variables, sometimes we need to get the
CPU local variables. To access it, current simple dereference is not
enough.

Thus, introduce a special this_cpu_read() dereference to access per-cpu
variable for the current CPU (accessing other CPU variable may race with
updates on other CPUs). Also this_cpu_ptr() is for accessing per-cpu
pointer.

Those are working as same as the kernel percpu macro.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Remove NULL check for percpu var because it is just an offset, could be 0.
  - Simplify process_fetch_insn_bottom() code.
  - If the last operation is this_cpu_read(), read only memory of the specific
    size (of type).
 Changes in v2:
  - Drop +CPU/+PCPU and introduce this_cpu_read() and this_cpu_ptr().
  - Support these method with BTF typecast.
  - Just check the base address is NOT NULL instead of is_kernel_percpu_address().
---
 Documentation/trace/eprobetrace.rst |    2 +
 Documentation/trace/fprobetrace.rst |    2 +
 Documentation/trace/kprobetrace.rst |    2 +
 kernel/trace/trace.c                |    1 
 kernel/trace/trace_probe.c          |  138 ++++++++++++++++++++++++-----------
 kernel/trace/trace_probe.h          |    3 +
 kernel/trace/trace_probe_tmpl.h     |   30 ++++++--
 7 files changed, 129 insertions(+), 49 deletions(-)

diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index 680e0af43d5d..279396951b34 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -39,6 +39,8 @@ Synopsis of eprobe_events
   @SYM[+|-offs]	: Fetch memory at SYM +|- offs (SYM should be a data symbol)
   $comm		: Fetch current task comm.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+  this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+  this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 3392cab016b3..3439bc9bd351 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -52,6 +52,8 @@ Synopsis of fprobe-events
   $comm         : Fetch current task comm.
   $current      : Fetch the address of the current task_struct.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
+  this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+  this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
   \IMM          : Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 81e4fe38791d..9ae330eb0a52 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -55,6 +55,8 @@ Synopsis of kprobe_events
   $comm		: Fetch current task comm.
   $current      : Fetch the address of the current task_struct.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+  this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+  this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e185a006cb08..1d5d6e46dc4d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4332,6 +4332,7 @@ static const char readme_msg[] =
 	"\t           $stack<index>, $stack, $retval, $comm, $current\n"
 #endif
 	"\t           +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
+	"\t           this_cpu_read(<fetcharg>), this_cpu_ptr(<fetcharg>)\n"
 	"\t     kernel return probes support: $retval, $arg<N>, $comm\n"
 	"\t     type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, symbol,\n"
 	"\t           b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 017f30ae9def..61bd65575f64 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -349,6 +349,77 @@ static int parse_trace_event(char *arg, struct fetch_insn *code,
 	return -EINVAL;
 }
 
+/* this_cpu_* parser */
+#define THIS_CPU_PTR_PREFIX "this_cpu_ptr("
+#define THIS_CPU_READ_PREFIX "this_cpu_read("
+#define THIS_CPU_PTR_LEN (sizeof(THIS_CPU_PTR_PREFIX) - 1)
+#define THIS_CPU_READ_LEN (sizeof(THIS_CPU_READ_PREFIX) - 1)
+
+static int
+parse_probe_arg(char *arg, const struct fetch_type *type,
+		struct fetch_insn **pcode, struct fetch_insn *end,
+		struct traceprobe_parse_context *ctx);
+
+/* handle dereference nested call */
+static inline int handle_dereference(char *arg, struct fetch_insn **pcode,
+	struct fetch_insn *end, struct traceprobe_parse_context *ctx,
+	int deref, long offset)
+{
+	const struct fetch_type *type = find_fetch_type(NULL, ctx->flags);
+	struct fetch_insn *code = *pcode;
+	int cur_offs = ctx->offset;
+	char *tmp;
+	int ret;
+
+	tmp = strrchr(arg, ')');
+	if (!tmp) {
+		trace_probe_log_err(ctx->offset + strlen(arg),
+					DEREF_OPEN_BRACE);
+		return -EINVAL;
+	}
+
+	*tmp = '\0';
+	ret = parse_probe_arg(arg, type, &code, end, ctx);
+	if (ret)
+		return ret;
+	ctx->offset = cur_offs;
+	if (code->op == FETCH_OP_COMM || code->op == FETCH_OP_DATA) {
+		trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
+		return -EINVAL;
+	}
+	code++;
+	if (code == end) {
+		trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+		return -EINVAL;
+	}
+	*pcode = code;
+
+	code->op = deref;
+	code->offset = offset;
+	/* Reset the last type if used */
+	ctx->last_type = NULL;
+	return 0;
+}
+
+static int parse_this_cpu(char *arg, struct fetch_insn **pcode,
+			  struct fetch_insn *end,
+			  struct traceprobe_parse_context *ctx)
+{
+	int deref;
+
+	if (str_has_prefix(arg, THIS_CPU_PTR_PREFIX)) {
+		arg += THIS_CPU_PTR_LEN;
+		ctx->offset += THIS_CPU_PTR_LEN;
+		deref = FETCH_OP_CPU_PTR;
+	} else if (str_has_prefix(arg, THIS_CPU_READ_PREFIX)) {
+		arg += THIS_CPU_READ_LEN;
+		ctx->offset += THIS_CPU_READ_LEN;
+		deref = FETCH_OP_DEREF_CPU;
+	} else
+		return -EINVAL;
+	return handle_dereference(arg, pcode, end, ctx, deref, 0);
+}
+
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 
 static u32 btf_type_int(const struct btf_type *t)
@@ -929,11 +1000,6 @@ static char *find_matched_close_paren(char *s)
 	return NULL;
 }
 
-static int
-parse_probe_arg(char *arg, const struct fetch_type *type,
-		struct fetch_insn **pcode, struct fetch_insn *end,
-		struct traceprobe_parse_context *ctx);
-
 static int handle_typecast(char *arg, struct fetch_insn **pcode,
 			   struct fetch_insn *end,
 			   struct traceprobe_parse_context *ctx)
@@ -959,7 +1025,8 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 	*tmp++ = '\0';
 
 	/* Handle the nested structure like (STRUCT)(VAR->FIELD)->... */
-	if (*tmp == '(') {
+	if (*tmp == '(' || str_has_prefix(tmp, THIS_CPU_PTR_PREFIX) ||
+	    str_has_prefix(tmp, THIS_CPU_READ_PREFIX)) {
 		char *close = find_matched_close_paren(tmp);
 
 		ctx->offset += tmp - arg;
@@ -979,12 +1046,18 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
 			trace_probe_log_err(ctx->offset, TOO_MANY_NESTED);
 			return -E2BIG;
 		}
-		*close = '\0';
 
-		ctx->offset += 1;	/* for the '(' */
-		/* We need to parse the nested one */
-		ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
-				pcode, end, ctx);
+		if (*tmp == '(') {
+			/* Extract the inner argument */
+			*close = '\0';
+			ctx->offset += 1;/* for the '(' */
+			/* Parse the nested one */
+			ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
+					pcode, end, ctx);
+		} else {
+			/* this_cpu_* will be parsed in parse_this_cpu() */
+			ret = parse_this_cpu(tmp, pcode, end, ctx);
+		}
 		if (ret < 0)
 			return ret;
 		ctx->nested_level--;
@@ -1454,36 +1527,9 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 		}
 		ctx->offset += (tmp + 1 - arg) + (arg[0] != '-' ? 1 : 0);
 		arg = tmp + 1;
-		tmp = strrchr(arg, ')');
-		if (!tmp) {
-			trace_probe_log_err(ctx->offset + strlen(arg),
-					    DEREF_OPEN_BRACE);
-			return -EINVAL;
-		} else {
-			const struct fetch_type *t2 = find_fetch_type(NULL, ctx->flags);
-			int cur_offs = ctx->offset;
-
-			*tmp = '\0';
-			ret = parse_probe_arg(arg, t2, &code, end, ctx);
-			if (ret)
-				break;
-			ctx->offset = cur_offs;
-			if (code->op == FETCH_OP_COMM ||
-			    code->op == FETCH_OP_DATA) {
-				trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
-				return -EINVAL;
-			}
-			if (++code == end) {
-				trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
-				return -EINVAL;
-			}
-			*pcode = code;
-
-			code->op = deref;
-			code->offset = offset;
-			/* Reset the last type if used */
-			ctx->last_type = NULL;
-		}
+		ret = handle_dereference(arg, pcode, end, ctx, deref, offset);
+		if (ret < 0)
+			return ret;
 		break;
 	case '\\':	/* Immediate value */
 		if (arg[1] == '"') {	/* Immediate string */
@@ -1504,15 +1550,18 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 		ret = handle_typecast(arg, pcode, end, ctx);
 		break;
 	default:
-		if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
+		if (str_has_prefix(arg, THIS_CPU_PTR_PREFIX) ||
+		    str_has_prefix(arg, THIS_CPU_READ_PREFIX)) {
+			ret = parse_this_cpu(arg, pcode, end, ctx);
+		} else if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
 			if (!tparg_is_function_entry(ctx->flags) &&
 			    !tparg_is_function_return(ctx->flags)) {
 				trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
 				return -EINVAL;
 			}
 			ret = parse_btf_arg(arg, pcode, end, ctx);
-			break;
 		}
+		break;
 	}
 	if (!ret && code->op == FETCH_OP_NOP) {
 		/* Parsed, but do not find fetch method */
@@ -1687,6 +1736,9 @@ static int finalize_fetch_insn(struct fetch_insn *code,
 	} else if (code->op == FETCH_OP_UDEREF) {
 		code->op = FETCH_OP_ST_UMEM;
 		code->size = parg->type->size;
+	} else if (code->op == FETCH_OP_DEREF_CPU) {
+		code->op = FETCH_OP_ST_CPUMEM;
+		code->size = parg->type->size;
 	} else {
 		code++;
 		if (code->op != FETCH_OP_NOP) {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 62645e847bd1..523612023608 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -100,10 +100,13 @@ enum fetch_op {
 	// Stage 2 (dereference) op
 	FETCH_OP_DEREF,		/* Dereference: .offset */
 	FETCH_OP_UDEREF,	/* User-space Dereference: .offset */
+	FETCH_OP_DEREF_CPU,	/* Per-CPU Dereference for this CPU */
+	FETCH_OP_CPU_PTR,	/* Per-CPU pointer for this CPU */
 	// Stage 3 (store) ops
 	FETCH_OP_ST_RAW,	/* Raw: .size */
 	FETCH_OP_ST_MEM,	/* Mem: .offset, .size */
 	FETCH_OP_ST_UMEM,	/* Mem: .offset, .size */
+	FETCH_OP_ST_CPUMEM,	/* Per-CPU Mem: .size */
 	FETCH_OP_ST_STRING,	/* String: .offset, .size */
 	FETCH_OP_ST_USTRING,	/* User String: .offset, .size */
 	FETCH_OP_ST_SYMSTR,	/* Kernel Symbol String: .offset, .size */
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index f630930288d2..83111c167b74 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -129,25 +129,39 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 	struct fetch_insn *s3 = NULL;
 	int total = 0, ret = 0, i = 0;
 	u32 loc = 0;
-	unsigned long lval = val;
+	unsigned long lval, llval = val;
 
 stage2:
 	/* 2nd stage: dereference memory if needed */
 	do {
-		if (code->op == FETCH_OP_DEREF) {
-			lval = val;
+		lval = val;
+		switch (code->op) {
+		case FETCH_OP_DEREF:
 			ret = probe_mem_read(&val, (void *)val + code->offset,
 					     sizeof(val));
-		} else if (code->op == FETCH_OP_UDEREF) {
-			lval = val;
+			break;
+		case FETCH_OP_UDEREF:
 			ret = probe_mem_read_user(&val,
 				 (void *)val + code->offset, sizeof(val));
-		} else
 			break;
+		case FETCH_OP_DEREF_CPU:
+			val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+			ret = probe_mem_read(&val, (void *)val, sizeof(val));
+			break;
+		case FETCH_OP_CPU_PTR:
+			val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+			ret = 0;
+			break;
+		default:
+			lval = llval;
+			goto out;
+		}
 		if (ret)
 			return ret;
+		llval = lval;
 		code++;
 	} while (1);
+out:
 
 	s3 = code;
 stage3:
@@ -181,6 +195,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
 	case FETCH_OP_ST_UMEM:
 		probe_mem_read_user(dest, (void *)val + code->offset, code->size);
 		break;
+	case FETCH_OP_ST_CPUMEM:
+		val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+		probe_mem_read(dest, (void *)val, code->size);
+		break;
 	case FETCH_OP_ST_STRING:
 		loc = *(u32 *)dest;
 		ret = fetch_store_string(val + code->offset, dest, base);


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox