From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S262691AbTJTQBR (ORCPT ); Mon, 20 Oct 2003 12:01:17 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S262694AbTJTQBR (ORCPT ); Mon, 20 Oct 2003 12:01:17 -0400
Received: from zok.SGI.COM ([204.94.215.101]:1199 "EHLO zok.sgi.com")
    by vger.kernel.org with ESMTP id S262691AbTJTQBL (ORCPT );
    Mon, 20 Oct 2003 12:01:11 -0400
Message-ID: <3F940643.DFC17539@engr.sgi.com>
Date: Mon, 20 Oct 2003 08:58:59 -0700
From: Rich Altmaier
Organization: SGI
X-Mailer: Mozilla 4.78 [en]C-CCK-MCD SGI (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: fielding PCI bus timeouts - was: prevent "dd if=/dev/mem" crash
References: <200310171610.36569.bjorn.helgaas@hp.com>
    <20031017155028.2e98b307.akpm@osdl.org>
    <200310171725.10883.bjorn.helgaas@hp.com>
    <20031017165543.2f7e9d49.akpm@osdl.org>
    <16272.34681.443232.246020@napali.hpl.hp.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Just a note to mention experience with handling hardware failures.

For the case of user mappings to IO buses, there are important classes
of apps, usually realtime or data acquisition, that benefit from
graceful error handling of bus timeouts at user level. These apps tend
to use old or prototype hardware, which can fail (cause a bus timeout)
during "normal" operation. Or at least the user view is that the
machine should not crash due to "one flaky board". Hence there is merit
in being able to translate a PIO-read bus timeout into, say, a SIGBUS.

More interesting is the case of PIO-write failure, as the writes can be
asynchronous. That is, by the time the hardware recognizes a failure,
the CPU's store instruction has graduated and the CPU has moved on.
It may have gone through a context switch or even exited the user
process. In this case something more than a SIGBUS is needed (IRIX has
several options to deal with this).

On the thread about dd if=/dev/mem, I don't know of any legitimate
reason that user code needs to successfully recover from reading
nonexistent physical memory. I would suggest the principle that bad
user code should not cause a crash, and bad user code that does a lot
of reads would be thought harmless by many dangerous users. So some
kind of error to the user process is probably reasonable, perhaps
SIGBUS. Silently returning 0 doesn't sound right: if there were any
legitimate reason for this code in the first place, it probably relates
to some search of the physical address space. Perhaps a diagnostic.

FYI, Rich
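
For illustration only, here is a minimal sketch of what user-level
recovery from a PIO-read SIGBUS could look like on a POSIX system. None
of it comes from an existing kernel interface: guarded_read32() and
map_faulting_page() are hypothetical names, and the "device" is
simulated by mapping one page of an empty temporary file, so that any
access falls past end-of-file and raises SIGBUS. It assumes the kernel
delivers the bus timeout synchronously on the faulting load.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;                 /* where to resume after a fault */
static volatile sig_atomic_t probe_armed = 0;

/* SIGBUS handler: unwind to the guarded read if one is in progress. */
static void on_sigbus(int sig)
{
    (void)sig;
    if (probe_armed)
        siglongjmp(probe_env, 1);
    /* Fault outside a guarded read: returning would re-run the faulting
     * instruction forever, so give up. */
    _exit(1);
}

/* Hypothetical helper: read one 32-bit word from a mapping.
 * Returns 0 and stores the value in *out, or -1 if the read raised SIGBUS. */
int guarded_read32(const volatile uint32_t *addr, uint32_t *out)
{
    struct sigaction sa, old;
    int rc;

    assert(addr != NULL && out != NULL);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so SIGBUS is unblocked again after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        probe_armed = 1;
        *out = *addr;                        /* the PIO read that may time out */
        rc = 0;
    } else {
        rc = -1;                             /* arrived here via siglongjmp */
    }
    probe_armed = 0;
    sigaction(SIGBUS, &old, NULL);           /* restore previous disposition */
    return rc;
}

/* Stand-in for a flaky board (assumption for this sketch): one page of an
 * empty temp file, so every access is past EOF and raises SIGBUS. */
void *map_faulting_page(void)
{
    FILE *f = tmpfile();
    void *p;

    if (f == NULL)
        return NULL;
    p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED,
             fileno(f), 0);
    fclose(f);                               /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would check the return value of guarded_read32() and treat -1
as "board did not respond". Note that this only covers the synchronous
read case. It does nothing for the asynchronous PIO-write failure
described above, where the error can arrive after the store has
graduated and there is no faulting instruction left to unwind from.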