Re: Software Prefetching using Machine learning

From: "Valdis Klētnieks" <valdis.kletnieks@vt.edu>
To: "Irfan Ullah" <irfan@dke.khu.ac.kr>
Cc: kernelnewbies@kernelnewbies.org
Subject: Re: Software Prefetching using Machine learning
Date: Wed, 09 Oct 2019 15:08:32 -0400
Message-ID: <170918.1570648112@turing-police>
In-Reply-To: <CA+mB8OyQesAb6e6QQ6HFcYkx1Jm7UCrxwnUxyzQ=myhQZD3fWg@mail.gmail.com>

On Wed, 09 Oct 2019 12:37:51 +0900, Irfan Ullah said:

> Thanks in advance. I am a PhD candidate, and currently I have started
> working on kernel development. My professor told me to implement this paper
> <https://arxiv.org/abs/1803.02329>. In this paper the authors have used
> machine learning to predict the next missed addresses.

From the abstract:

    "On a suite of challenging benchmark datasets, we find that neural networks
    consistently demonstrate superior performance in terms of precision and recall"

Delving further into the paper, we discover that the researchers have learned that
if you run SPEC CPU2006 enough times, a neural network can learn what memory access
patterns SPEC CPU2006 exhibits.

But they don't demonstrate that the patterns learned transfer to any other programs.
And nobody sane runs the exact same program with the exact same inputs repeatedly
unless they're doing benchmarking.

Ah, academia - where novelty of an idea is sufficient to get published, and considerations
of whether it's a *useful* idea are totally disregarded.

> 1) How can I directly store the missed addresses, and instruction addresses
> from kernel handle_mm_fault() to a file?

Don't do that.  Doing file I/O from inside the kernel is a bad idea.  Pass the
data to userspace via netlink, debugfs, shared memory, or some other means, and
have userspace write the file.
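
As a rough sketch of the userspace half, assuming the kernel side exposes the
faults as a stream of fixed-size binary records (the record layout, the names
RECORD, Fault, read_faults and dump are all made up for illustration, nothing
here is an existing kernel interface):

```python
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Hypothetical record layout: two little-endian u64 values per fault,
# the faulting address followed by the instruction pointer.
RECORD = struct.Struct("<QQ")


class Fault(NamedTuple):
    address: int
    ip: int


def read_faults(stream: BinaryIO) -> Iterator[Fault]:
    """Yield one Fault per complete record; stop at EOF or a truncated record."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return
        yield Fault(*RECORD.unpack(chunk))


def dump(stream: BinaryIO, out_path: str) -> int:
    """Append faults as hex text lines to out_path; return the record count."""
    count = 0
    with open(out_path, "a") as out:
        for fault in read_faults(stream):
            out.write(f"{fault.address:#x} {fault.ip:#x}\n")
            count += 1
    return count
```

You would open whatever file your module creates (under debugfs, say) in "rb"
mode and hand it to dump().  The point is only that the file writing happens
in an ordinary process, not in handle_mm_fault().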

> 2) How can I use a machine learning classifier in the kernel for predicting addresses?

Well... in general, you won't be able to do much that's actually *useful*,
because of the time scales involved.  If your predictor says "Oh, program XYZ
will need page 1AB83D 20 milliseconds from now", but it takes 10 milliseconds
to bring a page in, your predictor has only 10 milliseconds to make the
prediction in order to be useful.

And in fact, you probably have even less, because your predictor has to be fast
enough and use little enough memory that it doesn't significantly affect CPU,
cache, or RAM usage.
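
To put the arithmetic in one place, here is a back-of-the-envelope sketch.  The
function name and the overhead_ms parameter are mine, and the numbers are just
the illustrative ones from above, not measurements:

```python
def prediction_budget_ms(lead_time_ms: float, fetch_ms: float,
                         overhead_ms: float = 0.0) -> float:
    """Time the predictor may spend if the page is to arrive before it's needed.

    lead_time_ms: how far ahead of the access the prediction process starts
    fetch_ms:     how long it takes to bring the page in
    overhead_ms:  anything else on the path (hypothetical knob, e.g. a
                  round trip to userspace)
    A result of zero or less means the prefetch cannot finish in time.
    """
    return lead_time_ms - fetch_ms - overhead_ms


def is_useful(inference_ms: float, lead_time_ms: float, fetch_ms: float,
              overhead_ms: float = 0.0) -> bool:
    """True only if inference fits inside the remaining budget."""
    return inference_ms <= prediction_budget_ms(lead_time_ms, fetch_ms, overhead_ms)
```

With the figures above, prediction_budget_ms(20, 10) leaves 10 milliseconds,
and every millisecond of extra overhead comes straight out of that.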

> 3) Is there any way to do the machine learning in user space in Python, and
> then transfer the classifier in byte form to kernel space for address
> predictions?

Sure, there are plenty of ways, from using shared memory to creating an ioctl().

But all of them are going to hit the same constraint: you need to do it in less
time than it takes for the program you're predicting to reach the point the
prediction is about.
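
For the "classifier in bytes" part, one possible shape is sketched below.  It
assumes the model boils down to a flat list of weights, and it stores them as
fixed-point integers because kernel code generally avoids floating point.  The
blob layout, the MAGIC tag and the function names are invented for this
example; how the blob reaches the kernel (ioctl, shared memory, ...) is not
shown.

```python
import struct

MAGIC = b"PFML"     # hypothetical 4-byte tag so the receiver can sanity-check the blob
SCALE_BITS = 8      # Q8 fixed point: stored value = round(weight * 256)
HEADER = "<4sBH"    # magic, scale bits, weight count (little-endian, no padding)


def pack_weights(weights: list[float]) -> bytes:
    """Quantize weights to int16 fixed point and prepend a small header."""
    fixed = []
    for w in weights:
        q = round(w * (1 << SCALE_BITS))
        if not -32768 <= q <= 32767:
            raise ValueError(f"weight {w} does not fit in int16 at Q{SCALE_BITS}")
        fixed.append(q)
    header = struct.pack(HEADER, MAGIC, SCALE_BITS, len(fixed))
    return header + struct.pack(f"<{len(fixed)}h", *fixed)


def unpack_weights(blob: bytes) -> list[float]:
    """Inverse of pack_weights, mirroring the parsing the receiver would do."""
    magic, bits, count = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    fixed = struct.unpack_from(f"<{count}h", blob, struct.calcsize(HEADER))
    return [q / (1 << bits) for q in fixed]
```

The kernel side would keep the integers as they are and do integer
multiply-and-shift; unpack_weights() is only there so you can check the round
trip from Python before anything goes near the kernel.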

Good luck, you will need it.
