From: Willy Tarreau
Subject: Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()
Date: Fri, 20 Sep 2019 21:37:40 +0200
Message-ID: <20190920193740.GD1889@1wt.eu>
References: <20190918211503.GA1808@darwi-home-pc> <20190918211713.GA2225@darwi-home-pc> <20190920134609.GA2113@pc> <20190920181216.GA1889@1wt.eu>
To: Andy Lutomirski
Cc: Linus Torvalds, "Ahmed S. Darwish", Lennart Poettering, "Theodore Y. Ts'o", "Eric W. Biederman", "Alexander E. Patrakov", Michael Kerrisk, Matthew Garrett, lkml, Ext4 Developers List, Linux API, linux-man
List-Id: linux-api@vger.kernel.org

On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
> Perhaps userland could register a helper that takes over and does
> something better?

If userland sees the failure, it can do whatever the developer or distro
packager thought suitable for a system facing this condition.

> But I think the kernel really should do something
> vaguely reasonable all by itself.

Definitely, that's what Linus' proposal was doing. Sleeping for some time
is what I call "vaguely reasonable".

> If nothing else, we want the ext4
> patch that provoked this whole discussion to be applied,

Oh absolutely!

> which means
> that we need to unbreak userspace somehow, and returning garbage to it
> is not a good choice.

It depends on how it's used. I'd claim that we certainly use random
numbers for other things (such as ASLR and hash tables) *before* using
them to generate long-lived keys, so we have a bit more time to gather
more entropy before reaching the point of producing those keys.

> Here are some possible approaches that come to mind:
>
> int count;
> while (crng isn't inited) {
>     msleep(1);
> }
>
> and modify add_timer_randomness() to at least credit a tiny bit to
> crng_init_cnt.
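For illustration only, here is a userspace analogue of the quoted polling loop with an explicit deadline added, so the caller can never wait forever. This is a sketch, not kernel code: getrandom_bounded() and elapsed_ms() are hypothetical helpers, and the 1 ms pause simply mirrors the msleep(1) above. It relies on getrandom(2) with GRND_NONBLOCK, which fails with EAGAIN while the pool is not yet initialized.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/* Milliseconds elapsed on the monotonic clock since 'start'. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ns = (long long)(now.tv_sec - start->tv_sec) * 1000000000LL +
         (now.tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000LL);
}

/*
 * Hypothetical helper: poll getrandom() without blocking, sleeping 1 ms
 * between attempts, and give up after timeout_ms instead of waiting
 * forever. Returns 0 once buf is filled, -1 on timeout or error
 * (errno is set to EAGAIN on timeout).
 */
static int getrandom_bounded(void *buf, size_t len, long timeout_ms)
{
    struct timespec start, nap = { 0, 1000000L }; /* 1 ms */
    unsigned char *p = buf;
    size_t done = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (done < len) {
        ssize_t n = getrandom(p + done, len - done, GRND_NONBLOCK);

        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno != EAGAIN && errno != EINTR)
            return -1;              /* real error, e.g. ENOSYS */
        if (elapsed_ms(&start) >= timeout_ms) {
            errno = EAGAIN;         /* pool still not initialized */
            return -1;
        }
        nanosleep(&nap, NULL);
    }
    return 0;
}
```

On timeout, the decision about what to do next stays with the caller rather than the kernel.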
Without a timeout, we will surely still face situations where it blocks
forever, which is the current problem.

> Or we do something like intentionally triggering readahead on some
> offset on the root block device.

You don't necessarily have such a device, especially when you're in an
initramfs. That's precisely where userland can be smarter. When the
caller is sfdisk, for example, it has a much better chance of being able
to perform I/O than a tiny HTTP server starting up to present a
configuration page.

> We should definitely not trigger *blocking* IO.

I think I agree.

> Also, I wonder if the real problem preventing the RNG from starting up
> is that the crng_init_cnt threshold is too high. We have a rather
> baroque accounting system, and it seems like we can accumulate and
> credit entropy for a very long time indeed without actually
> considering ourselves done.

I have no opinion on this, lacking the skills to evaluate the situation.
What I can say for sure is that I've faced the non-booting issue quite a
number of times on headless systems, and conversely, in the 2.4 era, my
front reverse proxy at the time had the same SSH key as 89 other
machines on the net. So there's surely a sweet spot to find between
those two extremes. I tend to think that waiting *a little bit* for the
*first* random number is acceptable, even 10-15s: by the time the user
starts thinking about pressing the reset button, the system might have
finished booting. Hashing some RAM locations and the RTC, when present,
can also help a little. If my machine at the time had at least combined
the RTC's date and time with the hash, the chances of a key collision
would have dropped to one in many thousands.

Willy
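To illustrate the RTC point only: if two machines boot from identical state, folding the RTC's date and time into the hash is enough to make their outputs diverge. This sketch uses FNV-1a, a fast non-cryptographic hash picked purely for brevity; it is an assumption, not what the kernel does, fnv1a64() and mix_rtc_into_seed() are hypothetical names, and the result must never be used directly as key material.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* FNV-1a over 'len' bytes, continuing from hash state 'h'. */
static uint64_t fnv1a64(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/*
 * Hypothetical helper: fold a boot-time seed (e.g. a hash of some RAM
 * contents) together with the RTC's wall-clock time. Both values are
 * serialized big-endian so the result does not depend on host byte
 * order. NOT cryptographically secure.
 */
static uint64_t mix_rtc_into_seed(uint64_t seed, time_t rtc)
{
    unsigned char buf[16];
    uint64_t t = (uint64_t)rtc;

    for (int i = 0; i < 8; i++) {
        buf[i]     = (unsigned char)(seed >> (56 - 8 * i));
        buf[8 + i] = (unsigned char)(t >> (56 - 8 * i));
    }
    return fnv1a64(FNV64_OFFSET, buf, sizeof buf);
}
```

Two hosts that share the same seed but read different RTC values are very unlikely to end up with the same output; a real implementation would feed the time into the kernel's entropy pool rather than a standalone hash.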