* [PATCH 0/2] perf,coresight: Reduce fragmentation with non-contiguous AUX pages for cs_etm

From: Yabin Cui @ 2025-04-21 21:58 UTC
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Yabin Cui

Hi perf maintainers,
Hi coresight maintainers,

This patch set (2 patches) addresses memory fragmentation caused by
contiguous AUX buffer allocation for the cs_etm PMU on Android.

The cs_etm PMU doesn't need contiguous AUX pages, yet perf allocates the
AUX buffer in physically contiguous chunks whose size is derived from
aux_watermark. As a result, repeated use of cs_etm with large buffers
leads to memory fragmentation, which hurts other processes.

This series introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES so that
cs_etm can request non-contiguous AUX buffers, avoiding high-order page
allocations and reducing fragmentation on Android devices that use
cs_etm.

Your review is appreciated.

Thanks,
Yabin

Yabin Cui (2):
  perf: Allow non-contiguous AUX buffer pages via PMU capability
  coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU

 drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
 include/linux/perf_event.h                       | 1 +
 kernel/events/ring_buffer.c                      | 6 ++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

--
2.49.0.805.g082f7c87e0-goog
* [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: Yabin Cui @ 2025-04-21 21:58 UTC
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Yabin Cui

For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary and
increase memory fragmentation.

This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
PMUs to request non-contiguous pages for their AUX buffers.

Signed-off-by: Yabin Cui <yabinc@google.com>
---
 include/linux/perf_event.h  | 1 +
 kernel/events/ring_buffer.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0069ba6866a4..26ca35d6a9f2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -301,6 +301,7 @@ struct perf_event_pmu_context;
 #define PERF_PMU_CAP_AUX_OUTPUT			0x0080
 #define PERF_PMU_CAP_EXTENDED_HW_TYPE		0x0100
 #define PERF_PMU_CAP_AUX_PAUSE			0x0200
+#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES	0x0400
 
 /**
  * pmu::scope
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 5130b119d0ae..87f42f4e8edc 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
 		max_order = ilog2(nr_pages);
 		watermark = 0;
 	}
+	/*
+	 * When the PMU doesn't prefer contiguous AUX buffer pages, favor
+	 * low-order allocations to reduce memory fragmentation.
+	 */
+	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
+		max_order = 0;
 
 	/*
 	 * kcalloc_node() is unable to allocate buffer if the size is larger
--
2.49.0.805.g082f7c87e0-goog
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: James Clark @ 2025-04-22 10:21 UTC
To: Yabin Cui
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Suzuki K Poulose, Mike Leach, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

On 21/04/2025 10:58 pm, Yabin Cui wrote:
> For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
> and increase memory fragmentation.
>
> This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
> PMUs to request non-contiguous pages for their AUX buffers.
>
[...]
> @@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
>  		max_order = ilog2(nr_pages);
>  		watermark = 0;
>  	}
> +	/*
> +	 * When the PMU doesn't prefer contiguous AUX buffer pages, favor
> +	 * low-order allocations to reduce memory fragmentation.
> +	 */
> +	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
> +		max_order = 0;
>
>  	/*
>  	 * kcalloc_node() is unable to allocate buffer if the size is larger

Hi Yabin,

I was wondering whether this is just the opposite of
PERF_PMU_CAP_AUX_NO_SG, and whether order 0 should be used by default
for all devices to solve the issue you describe, because we already
have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
allocations for AUX buffers optimistically"), which explains that the
current allocation strategy is an optimization.

Your change seems to decide that, for certain devices, we want to
optimize for less fragmentation rather than for performance. If these
are rarely used features, used specifically when looking at
performance, should we not continue to optimize for performance? Or at
least make it user configurable?

Thanks
James
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: Ingo Molnar @ 2025-04-22 12:49 UTC
To: James Clark
Cc: Yabin Cui, coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Suzuki K Poulose, Mike Leach, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

* James Clark <james.clark@linaro.org> wrote:

[...]
> Hi Yabin,
>
> I was wondering if this is just the opposite of
> PERF_PMU_CAP_AUX_NO_SG, and that order 0 should be used by default
> for all devices to solve the issue you describe. Because we already
> have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages.
> Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order
> allocations for AUX buffers optimistically") that explains that the
> current allocation strategy is an optimization.
>
> Your change seems to decide that for certain devices we want to
> optimize for fragmentation rather than performance. If these are
> rarely used features specifically when looking at performance should
> we not continue to optimize for performance? Or at least make it user
> configurable?

So there seem to be three categories:

 - 1) Must have physically contiguous AUX buffers; it's a hardware ABI.
      (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)

 - 2) Would be nice to have contiguous AUX buffers, for a bit more
      performance.

 - 3) Doesn't really care.

So we do have #1, and it appears Yabin's use case is #3?

I strongly suspect that #2 and #3 are mostly the same in practice, and
that we don't really need a lot of differentiation and complexity here:
just the AUX_NO_SG flag, which must have a max-order allocation. All
other cases should allocate the AUX buffer in a default-nice,
MM-friendly way.

Thanks,

	Ingo
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: Leo Yan @ 2025-04-22 14:10 UTC
To: Ingo Molnar
Cc: James Clark, Yabin Cui, coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:

[...]

> So there seems to be 3 categories:
>
>  - 1) Must have physically contiguous AUX buffers, it's a hardware ABI.
>       (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
>
>  - 2) Would be nice to have contiguous AUX buffers, for a bit more
>       performance.
>
>  - 3) Doesn't really care.
>
> So we do have #1, and it appears Yabin's usecase is #3?

In Yabin's case, the AUX buffer works as a bounce buffer: the driver
copies the hardware trace data from the low-level contiguous buffer
into the AUX buffer.

In this case we cannot benefit much from contiguous AUX buffers.

Thanks,
Leo
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: Yabin Cui @ 2025-04-23 19:52 UTC
To: Leo Yan
Cc: Ingo Molnar, James Clark, coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

On Tue, Apr 22, 2025 at 7:10 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
>
> [...]
> > So we do have #1, and it appears Yabin's usecase is #3?

Yes. In my use case, I care much more about being MM-friendly than
about a small potential performance gain when using the PMU. And it's
not a rarely used feature: on Android, we periodically collect ETM data
on internal users' devices for AutoFDO optimization (for both userspace
libraries and the kernel). Periodically allocating a large chunk of
contiguous AUX pages (4 MiB per CPU) is almost unbearable. The kernel
may need to kill many processes to fulfill the request, which affects
the user experience even after the PMU is no longer in use.

I am totally fine with reusing PERF_PMU_CAP_AUX_NO_SG. If a PMU doesn't
want to sacrifice performance for MM-friendliness, why does it support
scatter-gather mode at all? If there are strong performance reasons to
allocate contiguous AUX pages in scatter-gather mode, I'd like
max_order to be configurable from userspace.

Currently, max_order is derived from aux_watermark. But aux_watermark
also controls how frequently the PMU signals that the AUX buffer has
filled and notifies userspace, so setting aux_watermark to one page is
not ideal. If we want to make max_order user configurable, maybe we can
add a one-bit field to perf_event_attr?
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: James Clark @ 2025-04-28 8:56 UTC
To: Yabin Cui, Leo Yan, Ingo Molnar
Cc: Ingo Molnar, coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

On 23/04/2025 8:52 pm, Yabin Cui wrote:
[...]
> Currently, max_order is affected by aux_watermark. But aux_watermark
> also affects how frequently the PMU overflows AUX buffer and notifies
> userspace. It's not ideal to set aux_watermark to 1 page size. So if we
> want to make max_order user configurable, maybe we can add a one bit
> field in perf_event_attr?
[...]

Hi Yabin,

After doing some testing, it looks like there is zero difference in
overhead between max_order=0 and ensuring the buffer is one contiguous
allocation for Arm SPE, and TRBE would be exactly the same. This makes
sense because we're vmapping the pages individually anyway, regardless
of the base allocation.

It seems the optimistically large allocations only help devices that
require extra buffer management beyond normal virtual memory. Can we
add a new capability, PERF_PMU_CAP_AUX_PREFER_LARGE, and apply it to
Intel PT and BTS? Then the old max_order=0 behavior (from before the
optimistic large allocations change) becomes the default again, and
PREFER_LARGE applies only to those two devices. Other and new devices
would get the more memory-friendly allocations by default, as they are
unlikely to benefit from anything different.

Thanks
James
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: Yabin Cui @ 2025-04-29 17:02 UTC
To: James Clark
Cc: Leo Yan, Ingo Molnar, Ingo Molnar, coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

On Mon, Apr 28, 2025 at 1:56 AM James Clark <james.clark@linaro.org> wrote:
[...]
> Seems like the performance optimization of the optimistically large
> mappings is only for devices that require extra buffer management stuff
> other than normal virtual memory. Can we add a new capability
> PERF_PMU_CAP_AUX_PREFER_LARGE and apply it to Intel PT and BTS? Then the
> old (before the optimistic large allocs change) max_order=0 behavior
> becomes the default again, and PREFER_LARGE is just for those two
> devices. Other and new devices would get the more memory friendly
> allocations by default, as it's unlikely they'll benefit from anything
> different.
>
Good suggestion! I will upload a v2 patch for that.
* Re: [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability

From: Yabin Cui @ 2025-04-29 21:35 UTC
To: James Clark
Cc: Leo Yan, Ingo Molnar, Ingo Molnar, coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Mike Leach, Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan

On Tue, Apr 29, 2025 at 10:02 AM Yabin Cui <yabinc@google.com> wrote:
>
> On Mon, Apr 28, 2025 at 1:56 AM James Clark <james.clark@linaro.org> wrote:
[...]
> Good suggestion! I will upload a v2 patch for that.

Hi everyone,

I have sent the v2 patch for review, titled "[PATCH v2] perf: Allocate
non-contiguous AUX pages by default". Please help review it. Thanks!
* [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU

From: Yabin Cui @ 2025-04-21 21:58 UTC
To: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan
Cc: coresight, linux-arm-kernel, linux-kernel, linux-perf-users, Yabin Cui

The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
TRBE), doesn't require contiguous pages for its AUX buffer.

This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability to
the cs_etm PMU. This allows the kernel to allocate non-contiguous pages
for the AUX buffer, reducing memory fragmentation when using cs_etm.

Signed-off-by: Yabin Cui <yabinc@google.com>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index f4cccd68e625..c98646eca7f8 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -899,7 +899,8 @@ int __init etm_perf_init(void)
 	int ret;
 
 	etm_pmu.capabilities		= (PERF_PMU_CAP_EXCLUSIVE |
-					   PERF_PMU_CAP_ITRACE);
+					   PERF_PMU_CAP_ITRACE |
+					   PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
 
 	etm_pmu.attr_groups		= etm_pmu_attr_groups;
 	etm_pmu.task_ctx_nr		= perf_sw_context;
--
2.49.0.805.g082f7c87e0-goog
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU 2025-04-21 21:58 ` [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU Yabin Cui @ 2025-04-22 14:21 ` Leo Yan 2025-04-23 20:01 ` Yabin Cui 0 siblings, 1 reply; 14+ messages in thread From: Leo Yan @ 2025-04-22 14:21 UTC (permalink / raw) To: Yabin Cui Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan, coresight, linux-arm-kernel, linux-kernel, linux-perf-users On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote: > The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or > TRBE), doesn't require contiguous pages for its AUX buffer. Though contiguous pages are not mandatory for TRBE, I would set the PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit performance. For non per CPU sinks, it is fine to allocate non-contiguous pages. Thanks, Leo > This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability > to the cs_etm PMU. This allows the kernel to allocate non-contiguous > pages for the AUX buffer, reducing memory fragmentation when using > cs_etm. 
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
From: Yabin Cui @ 2025-04-23 20:01 UTC
To: Leo Yan
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan, coresight, linux-arm-kernel, linux-kernel, linux-perf-users

On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
> > The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
> > TRBE), doesn't require contiguous pages for its AUX buffer.
>
> Though contiguous pages are not mandatory for TRBE, I would set the
> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
> performance.

As explained in the patch 1/2, my use case periodically collects ETM data
from the field (using both TRBE and ETR), and needs to reduce memory
fragmentation. If the performance impact is big, we can make it user
configurable. Otherwise, shall we default it to non-contiguous pages?

>
> For non per CPU sinks, it is fine to allocate non-contiguous pages.
>
> Thanks,
> Leo
>
> > This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
> > to the cs_etm PMU. This allows the kernel to allocate non-contiguous
> > pages for the AUX buffer, reducing memory fragmentation when using
> > cs_etm.
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
From: Anshuman Khandual @ 2025-04-24 11:29 UTC
To: Yabin Cui, Leo Yan
Cc: Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan, coresight, linux-arm-kernel, linux-kernel, linux-perf-users

On 4/24/25 01:31, Yabin Cui wrote:
> On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan@arm.com> wrote:
>>
>> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
>>> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
>>> TRBE), doesn't require contiguous pages for its AUX buffer.
>>
>> Though contiguous pages are not mandatory for TRBE, I would set the
>> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
>> performance.
>
> As explained in the patch 1/2, my use case periodically collects ETM data
> from the field (using both TRBE and ETR), and needs to reduce memory
> fragmentation. If the performance impact is big, we can make it user
> configurable. Otherwise, shall we default it to non-contiguous pages?

But is not that already happening ? cs_etm does not set the PMU cap
PERF_PMU_CAP_AUX_NO_SG that means it can allocate non-contig memory
chunk. Where am I missing ?

>
>>
>> For non per CPU sinks, it is fine to allocate non-contiguous pages.
>>
>> Thanks,
>> Leo
>>
>>> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
>>> to the cs_etm PMU. This allows the kernel to allocate non-contiguous
>>> pages for the AUX buffer, reducing memory fragmentation when using
>>> cs_etm.
* Re: [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU
From: Yabin Cui @ 2025-04-24 18:32 UTC
To: Anshuman Khandual
Cc: Leo Yan, Suzuki K Poulose, Mike Leach, James Clark, Alexander Shishkin, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan, coresight, linux-arm-kernel, linux-kernel, linux-perf-users

On Thu, Apr 24, 2025 at 4:29 AM Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>
> On 4/24/25 01:31, Yabin Cui wrote:
> > On Tue, Apr 22, 2025 at 7:21 AM Leo Yan <leo.yan@arm.com> wrote:
> >>
> >> On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
> >>> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
> >>> TRBE), doesn't require contiguous pages for its AUX buffer.
> >>
> >> Though contiguous pages are not mandatory for TRBE, I would set the
> >> PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
> >> performance.
> >
> > As explained in the patch 1/2, my use case periodically collects ETM data
> > from the field (using both TRBE and ETR), and needs to reduce memory
> > fragmentation. If the performance impact is big, we can make it user
> > configurable. Otherwise, shall we default it to non-contiguous pages?
>
> But is not that already happening ? cs_etm does not set the PMU cap
> PERF_PMU_CAP_AUX_NO_SG that means it can allocate non-contig memory
> chunk. Where am I missing ?

Although cs_etm doesn't set AUX_NO_SG flag, the perf component still
prefers to allocate contiguous AUX pages for it. The new flag is to ask
perf component to not allocate contiguous AUX pages.

>
> >
> >>
> >> For non per CPU sinks, it is fine to allocate non-contiguous pages.
> >>
> >> Thanks,
> >> Leo
> >>
> >>> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
> >>> to the cs_etm PMU.
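Yabin's answer turns on the difference between permitting and preferring non-contiguous pages. Without PERF_PMU_CAP_AUX_NO_SG the core tolerates a buffer built from several chunks, but rb_alloc_aux() still starts each chunk at a high order derived from aux_watermark (or from the buffer size in overwrite mode), falling back to smaller orders only when a high-order allocation fails. The userspace sketch below models that order selection. All helper names are hypothetical; it assumes 4 KiB pages, assumes every allocation succeeds, omits the kernel's U32_MAX clamp on the default watermark, and assumes the new capability simply forces order 0, since the patch 1/2 hunk is only partly quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
/* Capability bit value taken from patch 1/2. */
#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400

/* floor(log2(n)), analogous to the kernel's ilog2(); n must be non-zero. */
static int model_ilog2(unsigned long n)
{
	int r = -1;

	assert(n > 0);
	while (n) {
		n >>= 1;
		r++;
	}
	return r;
}

/* Smallest order whose block (page size << order) covers size bytes,
 * analogous to the kernel's get_order() for size > 0. */
static int model_get_order(unsigned long size)
{
	int order = 0;

	while ((1UL << (MODEL_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Hypothetical model of the max_order choice in rb_alloc_aux(). */
static int model_max_order(unsigned long nr_pages, unsigned long watermark,
			   bool overwrite, unsigned int caps)
{
	int max_order;

	if (!overwrite) {
		/* Default watermark: half the buffer, in bytes. */
		if (!watermark)
			watermark = nr_pages << (MODEL_PAGE_SHIFT - 1);
		max_order = model_get_order(watermark);
	} else {
		/* Largest order that fits in nr_pages. */
		max_order = model_ilog2(nr_pages);
	}

	/* ASSUMPTION: the new capability makes allocation start at order 0. */
	if (caps & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
		max_order = 0;
	return max_order;
}

/* Number of chunks the allocation loop requests, assuming each succeeds. */
static unsigned long model_nr_chunks(unsigned long nr_pages, int max_order)
{
	unsigned long done = 0, chunks = 0;

	while (done < nr_pages) {
		int order = model_ilog2(nr_pages - done);

		if (order > max_order)
			order = max_order;
		done += 1UL << order;
		chunks++;
	}
	return chunks;
}
```

For a 1024-page (4 MiB) buffer with the default watermark, model_max_order(1024, 0, false, 0) gives order 9, so model_nr_chunks() reports two 2 MiB chunks; with PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES set it gives order 0 and 1024 single pages. Order-0 pages never need a physically contiguous free block, which is the fragmentation benefit Yabin is after, traded against the possible TRBE performance cost Leo raises.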
Thread overview: 14+ messages
2025-04-21 21:58 [PATCH 0/2] perf,coresight: Reduce fragmentation with non-contiguous AUX pages for cs_etm Yabin Cui
2025-04-21 21:58 ` [PATCH 1/2] perf: Allow non-contiguous AUX buffer pages via PMU capability Yabin Cui
2025-04-22 10:21 ` James Clark
2025-04-22 12:49 ` Ingo Molnar
2025-04-22 14:10 ` Leo Yan
2025-04-23 19:52 ` Yabin Cui
2025-04-28 8:56 ` James Clark
2025-04-29 17:02 ` Yabin Cui
2025-04-29 21:35 ` Yabin Cui
2025-04-21 21:58 ` [PATCH 2/2] coresight: etm-perf: Add AUX_NON_CONTIGUOUS_PAGES to cs_etm PMU Yabin Cui
2025-04-22 14:21 ` Leo Yan
2025-04-23 20:01 ` Yabin Cui
2025-04-24 11:29 ` Anshuman Khandual
2025-04-24 18:32 ` Yabin Cui