All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: atomlin@redhat.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
	peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
	oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v7 0/3] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too
Date: Wed, 17 Aug 2022 16:13:46 -0300	[thread overview]
Message-ID: <20220817191346.287594886@redhat.com> (raw)

This patchset contains enhancements on top of Aaron's -v6 of the series
(see the changelog below).

It fixes the following problems two problems:

1) A customer provided some evidence which indicates that the idle tick was
stopped; albeit, CPU-specific vmstat counters still remained populated.
Thus one can only assume quiet_vmstat() was not invoked on return to the
idle loop.

If I understand correctly, I suspect this divergence might erroneously
prevent a reclaim attempt by kswapd. If the number of zone specific free
pages are below their per-cpu drift value then
zone_page_state_snapshot() is used to compute a more accurate view of
the aforementioned statistic.  Thus any task blocked on the NUMA node
specific pfmemalloc_wait queue will be unable to make significant
progress via direct reclaim unless it is killed after being woken up by
kswapd (see throttle_direct_reclaim()).

2) With a SCHED_FIFO task that busy loops on a given CPU, 
and kworker for that CPU at SCHED_OTHER priority, queuing
work to sync per-vmstats will either cause that work
to never execute, or stalld boosts kworker priority which 
causes a latency violation.


Follows the v6 cover letter, with updated changelog. The numbers, for
the test program attached at the end of this cover letter, executed
inside a KVM VM, are:

                                Vanilla                 Patch

cycles per idle loop            151858                  153258  (+1.0%)

cycles per syscall              8461                    8690    (+2.6%)

--------


I have incorporated an idea from Marcelo's patch [1] where a CPU-specific
variable is used to indicate if a vmstat differential/or imbalance is
present for a given CPU. So, at the appropriate time, vmstat processing can
be initiated. The hope is that this particular approach is "cheaper" when
compared to need_update() - used currently; in the context of nohz_full and
the scheduling-clock tick being stopped, we would now with this patch,
check if a CPU-specific vmstat imbalance is present before exiting
user-mode (see tick_nohz_user_enter_prepare()).

This trivial test program [2] was used to determine the somewhat impact
under vanilla and with the proposed changes; mlock(2) and munlock(2) was
used solely to modify vmstat item 'NR_MLOCK'. The following is an average
count of CPU-cycles across the aforementioned system calls and the idle
loop, respectively. I believe these results are negligible:

	  Modified		   |  		Vanilla
                                   |
  cycles per syscall: 7399         | 	cycles per syscall: 4150
  cycles per idle loop: 141048     |	cycles per idle loop: 144730
                                   |


Any feedback would be appreciated. Thanks.

Changes since v6 [6]:
 - sync vmstats independently of whether vmstat_update work is queued or not
 - clean vmstat_dirty before differential sync loop
 - cancel pending work if tick stopped
 - do not queue work to remote CPU if tick stopped

Changes since v5 [3]:

 - Introduced __tick_nohz_user_enter_prepare()
 - Switched to EXPORT_SYMBOL_GPL()

Changes since v4 [4]:

 - Moved vmstat_dirty specific changes into a separate patch
   (Marcelo Tosatti)

Changes since v3 [5]:

 - Used EXPORT_SYMBOL() on tick_nohz_user_enter_prepare()
 - Replaced need_update()
 - Introduced CPU-specific variable namely vmstat_dirty
   and mark_vmstat_dirty()

[1]: https://lore.kernel.org/lkml/20220204173554.763888172@fedora.localdomain/
[2]: https://pastebin.com/8AtzSAuK
[3]: https://lore.kernel.org/lkml/20220801234258.134609-1-atomlin@redhat.com/
[4]: https://lore.kernel.org/lkml/20220621172207.1501641-1-atomlin@redhat.com/
[5]: https://lore.kernel.org/lkml/20220422193647.3808657-1-atomlin@redhat.com/
[6]: https://lore.kernel.org/linux-mm/20220808194820.676246-1-atomlin@redhat.com/

 include/linux/tick.h     |    5 +++--
 kernel/time/tick-sched.c |   19 ++++++++++++++++++-
 mm/vmstat.c              |   74 ++++++++++++++++++++++++++++++++++++--------------------------------------
 3 files changed, 57 insertions(+), 41 deletions(-)

--- test-vmstat-overhead.c ---

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

typedef unsigned long long cycles_t;
typedef unsigned long long usecs_t;
typedef unsigned long long u64;

#ifdef __x86_64__
#define DECLARE_ARGS(val, low, high)    unsigned long low, high
#define EAX_EDX_VAL(val, low, high)     ((low) | ((u64)(high) << 32))
#define EAX_EDX_ARGS(val, low, high)    "a" (low), "d" (high)
#define EAX_EDX_RET(val, low, high)     "=a" (low), "=d" (high)
#else
#define DECLARE_ARGS(val, low, high)    unsigned long long val
#define EAX_EDX_VAL(val, low, high)     (val)
#define EAX_EDX_ARGS(val, low, high)    "A" (val)
#define EAX_EDX_RET(val, low, high)     "=A" (val)
#endif

static inline unsigned long long __rdtscll(void)
{
        DECLARE_ARGS(val, low, high);

        asm volatile("cpuid; rdtsc" : EAX_EDX_RET(val, low, high));

        return EAX_EDX_VAL(val, low, high);
}

#define rdtscll(val) do { (val) = __rdtscll(); } while (0)

#define NRSYSCALLS 30000
#define NRSLEEPS   100000

void main(int argc, char *argv[])
{
	unsigned long a, b, cycles;
	int i, syscall = 0;
	void *page = malloc(4096);

	if (mlock(page, 4096))
		perror("mlock");
	if (munlock(page, 4096))
		perror("munlock");

	if (argc != 2) {
		printf("usage: %s {idle,syscall}\n", argv[0]);
		exit(1);
	}

        rdtscll(a);

	if (strncmp("idle", argv[1], 4) == 0)
		syscall = 0;
	else if (strncmp("syscall", argv[1], 7) == 0)
		syscall = 1;
	else {
		printf("usage: %s {idle,syscall}\n", argv[0]);
		exit(1);
	}
	
	if (syscall == 1) {
        	for (i = 0; i < NRSYSCALLS; i++) {
			if (mlock(page, 4096))
				perror("mlock");
			if (munlock(page, 4096))
				perror("munlock");
		}
	} else {
        	for (i = 0; i < NRSLEEPS; i++)
		 	usleep(10);
	}

        rdtscll(b);

        cycles = b - a;

	if (syscall == 1)
        	printf("cycles per syscall: %d\n", (b-a)/(NRSYSCALLS*2));
	else
		printf("cycles per idle loop: %d\n", (b-a)/NRSLEEPS);
}




             reply	other threads:[~2022-08-17 19:17 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17 19:13 Marcelo Tosatti [this message]
2022-08-17 19:13 ` [PATCH v7 1/3] mm/vmstat: Use per cpu variable to track a vmstat discrepancy Marcelo Tosatti
2022-08-24 20:20   ` Andrew Morton
2022-08-26 13:29     ` Aaron Tomlin
2022-08-17 19:13 ` [PATCH v7 2/3] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too Marcelo Tosatti
2022-08-24 20:20   ` Andrew Morton
2022-09-09 12:12   ` Frederic Weisbecker
2022-09-09 19:35     ` Marcelo Tosatti
2022-09-12 14:38       ` Aaron Tomlin
2022-09-14 11:04         ` Frederic Weisbecker
2022-08-17 19:13 ` [PATCH v7 3/3] mm/vmstat: do not queue vmstat_update if tick is stopped Marcelo Tosatti
  -- strict thread matches above, loose matches on Subject: below --
2022-08-17 19:01 [PATCH v7 0/3] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too Marcelo Tosatti
2022-08-17 19:01 ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220817191346.287594886@redhat.com \
    --to=mtosatti@redhat.com \
    --cc=atomlin@redhat.com \
    --cc=cl@linux.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=neelx@redhat.com \
    --cc=oleksandr@natalenko.name \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.