On 1/11/19 5:29 PM, speck for Andi Kleen wrote:
> +Some CPUs can leave read or written data in internal buffers,
> +which then later might be sampled through side effects.
> +For more details see CVE-2018-12126 CVE-2018-12130 CVE-2018-12127
> +
> +This can be avoided by explicitely clearing the CPU state.

s/explicitely/explicitly

> +
> +We trying to avoid leaking data between different processes,

Suggest changing the above phrase to the below:

  CPU state clearing prevents leaking data between different processes,

...

> +Basic requirements and assumptions
> +----------------------------------
> +
> +Kernel addresses and kernel temporary data are not sensitive.
> +
> +User data is sensitive, but only for other processes.
> +
> +Kernel data is sensitive when it is cryptographic keys.

s/when it is/when it involves/

> +
> +Guidance for driver/subsystem developers
> +----------------------------------------
> +
> +When you touch user supplied data of *other* processes in system call
> +context add lazy_clear_cpu().
> +
> +For the cases below we care only about data from other processes.
> +Touching non cryptographic data from the current process is always allowed.
> +
> +Touching only pointers to user data is always allowed.
> +
> +When your interrupt does not touch user data directly consider marking

Add a "," between "directly" and "consider"

> +it with IRQF_NO_USER.
> +
> +When your tasklet does not touch user data directly consider marking

Add a "," between "directly" and "consider"

> +it with TASKLET_NO_USER using tasklet_init_flags/or
> +DECLARE_TASKLET*_NOUSER.
> +
> +When your timer does not touch user data mark it with TIMER_NO_USER.

Add a "," between "data" and "mark"

> +If it is a hrtimer mark it with HRTIMER_MODE_NO_USER.

Add a "," between "hrtimer" and "mark"

> +
> +When your irq poll handler does not touch user data, mark it
> +with IRQ_POLL_F_NO_USER through irq_poll_init_flags.
> +
> +For networking code make sure to only touch user data through

Add a "," between "code" and "make"

> +skb_push/put/copy [add more], unless it is data from the current
> +process. If that is not ensured add lazy_clear_cpu or

Add a "," between "ensured" and "add"

> +lazy_clear_cpu_interrupt. When the non skb data access is only in a
> +hardware interrupt controlled by the driver, it can rely on not
> +setting IRQF_NO_USER for that interrupt.
> +
> +Any cryptographic code touching key data should use memzero_explicit
> +or kzfree.
> +
> +If your RCU callback touches user data add lazy_clear_cpu().
> +
> +These steps are currently only needed for code that runs on MDS affected
> +CPUs, which is currently only x86. But might be worth being prepared
> +if other architectures become affected too.
> +
> +Implementation details/assumptions
> +----------------------------------
> +
> +If a system call touches data it is for its own process, so does not

Suggest rephrasing to:

  If a system call touches data of its own process, cpu state does not

> +need to be cleared, because it has already access to it.
> +
> +When context switching we clear data, unless the context switch
> +is inside a process, or from/to idle. We also clear after any
> +context switches from kernel threads.
> +
> +Idle does not have sensitive data, except for in interrupts, which
> +are handled separately.
> +
> +Cryptographic keys inside the kernel should be protected.
> +We assume they use kzfree() or memzero_explicit() to clear
> +state, so these functions trigger a cpu clear.
> +
> +Hard interrupts, tasklets, timers which can run asynchronous are
> +assumed to touch random user data, unless they have been audited, and
> +marked with NO_USER flags.
> +
> +Most interrupt handlers for modern devices should not touch
> +user data because they rely on DMA and only manipulate
> +pointers. This needs auditing to confirm though.
> +
> +For softirqs we assume that if they touch user data they use

Add "," between "data" and "they"

...

> +Technically we would only need to do this if the BPF program
> +contains conditional branches and loads dominated by them, but
> +let's assume that near all do.

s/near/nearly/

> +
> +This could be further optimized by allowing callers that do
> +a lot of individual BPF runs and are sure they don't touch
> +other user's data inbetween to do the clear only once
> +at the beginning.

Suggest breaking the above sentence. It is quite difficult to read.

> We can add such optimizations later based on
> +profile data.
> +
> +Virtualization
> +--------------
> +
> +When entering a guest in KVM we clear to avoid any leakage to a guest.

... we clear CPU state to avoid ....

> +Normally this is done implicitely as part of the L1TF mitigation.

s/implicitely/implicitly/

> +It relies on this being enabled. It also uses the "fast exit"
> +optimization that only clears if an interrupt or context switch
> +happened.
>
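
To make the model in the quoted document easier to review, here is a minimal userspace sketch in C of the "lazy clear" idea as I read it: an asynchronous handler that has not been audited and marked NO_USER requests a clear, and the clear itself runs at most once on the next exit to user space. This is an assumption about the intended semantics, not the patch's implementation. Every name below (model_lazy_clear_cpu, model_run_handler, model_exit_to_user, clear_pending, clears_done) is hypothetical, and nothing here is a kernel API.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the lazy clear scheme; not kernel code. */

/* Models a per-CPU "clear needed" flag. */
bool clear_pending = false;

/* Counts how many times the (expensive) clear actually ran. */
int clears_done = 0;

/* Request a clear; the work is deferred to the next exit to user space. */
void model_lazy_clear_cpu(void)
{
	clear_pending = true;
}

/*
 * Run an asynchronous handler (interrupt, tasklet, timer).
 * Unless it was audited and marked no_user, assume it touched
 * another process's data and request a clear.
 */
void model_run_handler(bool no_user)
{
	if (!no_user)
		model_lazy_clear_cpu();
}

/* Exit to user space: clear at most once, however many requests arrived. */
void model_exit_to_user(void)
{
	if (clear_pending) {
		clears_done++;
		clear_pending = false;
	}
}
```

In this model, two unmarked handlers followed by one model_exit_to_user() result in a single clear, and a run consisting only of handlers marked no_user results in none. If that matches the intent, it may be worth stating the "request now, clear on exit" behavior explicitly in the document.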