From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 14:24:45 -0400 (EDT)
From: Alan Stern
Subject: How best to bypass the page cache from within a kernel module?
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org

I'm working on a kernel module driver for Linux 2.6. One of the things
this driver needs to do is perform a VERIFY command, which means checking
to make sure that certain disk sectors within a file actually can be read
without encountering a bad sector or other hardware error. Now, I realize
that there are already issues involved with convincing the disk drive to
read from its media rather than from its cache. But apart from that, my
problem is how to convince Linux to read from the drive rather than from
the page cache.

One suggestion was to use O_DIRECT when opening the file, because that
does cause reads to go directly to the hardware. The problem with this is
that since the direct-I/O routines send file data directly to user
buffers, they must check that the buffer addresses are valid and belong to
the user's address space. But my code runs in a kernel thread so it has
no current->mm (and in any case I would prefer to use my kernel-space
buffers rather than user-space memory). It might be possible to get hold
of an mm_struct, but it's not necessarily easy as mm_alloc() isn't
EXPORTed. Perhaps my thread could keep its original current->mm by
incrementing current->mm->users before calling daemonize() and setting
current->mm back to its original value afterward. Is that legal? Having
done so, perhaps I could use some sort of mmap() call to allocate a
user-space buffer that would be okay for direct-I/O. What's the best way
to do that -- what function would I have to call?

However, all that seems rather roundabout.
An equally acceptable solution would be simply to invalidate all the
entries in the page cache referring to my file, so that reads would be
forced to go to the drive. Can anyone tell me how to do that?

TIA,

Alan Stern

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: How best to bypass the page cache from within a kernel module?
From: Dave Hansen
Message-Id: <1063827869.13097.124.camel@nighthawk>
Date: 17 Sep 2003 12:44:29 -0700
To: Alan Stern
Cc: linux-mm

On Wed, 2003-09-17 at 11:24, Alan Stern wrote:
> However, all that seems rather roundabout. An equally acceptable solution
> would be simply to invalidate all the entries in the page cache referring
> to my file, so that reads would be forced to go to the drive. Can anyone
> tell me how to do that?

Whatever you're trying to do, you probably shouldn't be doing it in the
kernel to begin with. Do it from userspace, it will save you a lot of
pain.

--
Dave Hansen
haveblue@us.ibm.com

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 12:50:44 -0700
From: William Lee Irwin III
Subject: Re: How best to bypass the page cache from within a kernel module?
Message-ID: <20030917195044.GH14079@holomorphy.com>
References: <1063827869.13097.124.camel@nighthawk>
In-Reply-To: <1063827869.13097.124.camel@nighthawk>
To: Dave Hansen
Cc: Alan Stern, linux-mm

On Wed, 2003-09-17 at 11:24, Alan Stern wrote:
>> However, all that seems rather roundabout. An equally acceptable solution
>> would be simply to invalidate all the entries in the page cache referring
>> to my file, so that reads would be forced to go to the drive. Can anyone
>> tell me how to do that?

On Wed, Sep 17, 2003 at 12:44:29PM -0700, Dave Hansen wrote:
> Whatever you're trying to do, you probably shouldn't be doing it in the
> kernel to begin with. Do it from userspace, it will save you a lot of
> pain.

If you really want to bypass the pagecache etc. entirely, use raw io and
don't even bother mounting the filesystem, and do it all from userspace.
If you need it simultaneously mounted then you're in somewhat deeper
trouble, though you can probably be rescued by nefarious means like that
bit about shooting down the pagecache so you don't have some incoherent
cache headache.

--
wli

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 16:33:08 -0400 (EDT)
From: Alan Stern
Subject: Re: How best to bypass the page cache from within a kernel module?
In-Reply-To: <20030917195044.GH14079@holomorphy.com>
To: William Lee Irwin III
Cc: Dave Hansen, linux-mm

> On Wed, 2003-09-17 at 11:24, Alan Stern wrote:
> > However, all that seems rather roundabout.
> > An equally acceptable solution would be simply to invalidate all the
> > entries in the page cache referring to my file, so that reads would be
> > forced to go to the drive. Can anyone tell me how to do that?

On Wed, Sep 17, 2003 at 12:44:29PM -0700, Dave Hansen wrote:
> Whatever you're trying to do, you probably shouldn't be doing it in the
> kernel to begin with. Do it from userspace, it will save you a lot of
> pain.

That's not particularly helpful, especially considering that the entire
driver currently works just fine as a kernel module, with the exception of
this one piece. (This one piece works too; it just doesn't do exactly
what I want.)

On Wed, 17 Sep 2003, William Lee Irwin III wrote:
> If you really want to bypass the pagecache etc. entirely, use raw io and
> don't even bother mounting the filesystem, and do it all from userspace.
> If you need it simultaneously mounted then you're in somewhat deeper
> trouble, though you can probably be rescued by nefarious means like that
> bit about shooting down the pagecache so you don't have some incoherent
> cache headache.

I really want this to work through the filesystem. 99% of what my driver
does involves normal reads and writes. And there are very good reasons
for having it run as a kernel thread rather than a user process. It's
just that this one piece, which is a very minor part of the driver, needs
to avoid the page cache.

So to reiterate my original questions:

1. What's the proper way for a kernel thread running in a module to get
hold of an mm_struct or to keep the one it had before calling daemonize()?

2. What's the proper way for a kernel thread to allocate a region of
userspace memory?

3. What's the proper way to invalidate all entries in the page cache
that refer to a particular file?

Alan Stern

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 13:40:47 -0700
From: William Lee Irwin III
Subject: Re: How best to bypass the page cache from within a kernel module?
Message-ID: <20030917204047.GI14079@holomorphy.com>
References: <20030917195044.GH14079@holomorphy.com>
To: Alan Stern
Cc: Dave Hansen, linux-mm

On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> I really want this to work through the filesystem. 99% of what my driver
> does involves normal reads and writes. And there are very good reasons
> for having it run as a kernel thread rather than a user process. It's
> just that this one piece, which is a very minor part of the driver, needs
> to avoid the page cache.
> So to reiterate my original questions:

Doesn't sound much like most drivers after all that, but there's some
weird stuff out there.

On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> 1. What's the proper way for a kernel thread running in a module to get
> hold of an mm_struct or to keep the one it had before calling daemonize()?

Well, you can get one from the slab allocator, though I expect there will
be a followup question here...

On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> 2. What's the proper way for a kernel thread to allocate a region of
> userspace memory?

Hmm. Sounds like you want to grab a user address space and do userspace
stuff inside there. Maybe avoid do_execve() etc. and call sys_*() for
everything else outright? The question itself probably wants sys_mmap()
or some such, or handle_mm_fault() depending on what you have in mind
for allocation.

On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> 3. What's the proper way to invalidate all entries in the page cache
> that refer to a particular file?

invalidate_inode_pages().
--
wli

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 13:43:12 -0700
From: William Lee Irwin III
Subject: Re: How best to bypass the page cache from within a kernel module?
Message-ID: <20030917204312.GJ14079@holomorphy.com>
References: <20030917195044.GH14079@holomorphy.com> <20030917204047.GI14079@holomorphy.com>
In-Reply-To: <20030917204047.GI14079@holomorphy.com>
To: Alan Stern, Dave Hansen, linux-mm

On Wed, Sep 17, 2003 at 01:40:47PM -0700, William Lee Irwin III wrote:
> or some such, or handle_mm_fault() depending on what you have in mind

s/handle_mm_fault/make_pages_present/

--
wli

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 17:30:50 -0400 (EDT)
From: Alan Stern
Subject: Re: How best to bypass the page cache from within a kernel module?
In-Reply-To: <20030917204047.GI14079@holomorphy.com>
To: William Lee Irwin III
Cc: Dave Hansen, linux-mm

On Wed, 17 Sep 2003, William Lee Irwin III wrote:
> On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> > 1. What's the proper way for a kernel thread running in a module to get
> > hold of an mm_struct or to keep the one it had before calling daemonize()?
>
> Well, you can get one from the slab allocator, though I expect there will
> be a followup question here...
Yes :-) The slab allocator will give me a nice piece of memory, but I
will still need to turn that into a valid mm_struct. I can't call
alloc_mm() and friends because they're not EXPORTed. Would this work:
atomically increment current->mm->users and save the value of current->mm
before calling daemonize(), then re-assign the old value back to
current->mm afterwards?

> On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> > 2. What's the proper way for a kernel thread to allocate a region of
> > userspace memory?
>
> Hmm. Sounds like you want to grab a user address space and do userspace
> stuff inside there. Maybe avoid do_execve() etc. and call sys_*() for
> everything else outright? The question itself probably wants sys_mmap()
> or some such, or handle_mm_fault() depending on what you have in mind
> for allocation.

sys_mmap() or something along those lines would be good. But I can't call
it directly because 2.6 doesn't EXPORT the sys_xxx functions. Also, I'm
not clear on whether mmap() lets you create an anonymous mapping -- one
backed by swap space rather than a file -- that's what I would want to do.

> On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> > 3. What's the proper way to invalidate all entries in the page cache
> > that refer to a particular file?
>
> invalidate_inode_pages().

Great! I'll search through the kernel code for it.

Alan Stern

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Sep 2003 15:44:53 -0700
From: William Lee Irwin III
Subject: Re: How best to bypass the page cache from within a kernel module?
Message-ID: <20030917224453.GM14079@holomorphy.com>
References: <20030917204047.GI14079@holomorphy.com>
To: Alan Stern
Cc: Dave Hansen, linux-mm

On Wed, 17 Sep 2003, William Lee Irwin III wrote:
>> Well, you can get one from the slab allocator, though I expect there will
>> be a followup question here...

On Wed, Sep 17, 2003 at 05:30:50PM -0400, Alan Stern wrote:
> Yes :-) The slab allocator will give me a nice piece of memory, but I
> will still need to turn that into a valid mm_struct. I can't call
> alloc_mm() and friends because they're not EXPORTed.

Well, alloc_mm() doesn't really do much, so it should be easily preppable
along the same lines if it absolutely has to be a module. In truth, the
mm slab should be using a ctor (the vma slab too).

On Wed, 17 Sep 2003, William Lee Irwin III wrote:
>> Hmm. Sounds like you want to grab a user address space and do userspace
>> stuff inside there. Maybe avoid do_execve() etc. and call sys_*() for
>> everything else outright? The question itself probably wants sys_mmap()
>> or some such, or handle_mm_fault() depending on what you have in mind
>> for allocation.

On Wed, Sep 17, 2003 at 05:30:50PM -0400, Alan Stern wrote:
> sys_mmap() or something along those lines would be good. But I can't call
> it directly because 2.6 doesn't EXPORT the sys_xxx functions. Also, I'm
> not clear on whether mmap() lets you create an anonymous mapping -- one
> backed by swap space rather than a file -- that's what I would want to do.

That's a pain. It's probably easier to just compile the driver in, then.

On Wed, 17 Sep 2003, William Lee Irwin III wrote:
>> invalidate_inode_pages().

On Wed, Sep 17, 2003 at 05:30:50PM -0400, Alan Stern wrote:
> Great! I'll search through the kernel code for it.
Should be in mm/filemap.c

--
wli

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3F6F44AF.2030807@sgi.com>
Date: Mon, 22 Sep 2003 13:51:27 -0500
From: Ray Bryant
Subject: Re: How best to bypass the page cache from within a kernel module?
To: Alan Stern
Cc: linux-mm@kvack.org

Alan Stern wrote:
> I'm working on a kernel module driver for Linux 2.6. One of the things
> this driver needs to do is perform a VERIFY command, which means checking
> to make sure that certain disk sectors within a file actually can be read
> without encountering a bad sector or other hardware error. Now, I realize
> that there are already issues involved with convincing the disk drive to
> read from its media rather than from its cache. But apart from that, my
> problem is how to convince Linux to read from the drive rather than from
> the page cache.
>
> One suggestion was to use O_DIRECT when opening the file, because that
> does cause reads to go directly to the hardware. The problem with this is
> that since the direct-I/O routines send file data directly to user
> buffers, they must check that the buffer addresses are valid and belong to
> the user's address space. But my code runs in a kernel thread so it has
> no current->mm (and in any case I would prefer to use my kernel-space
> buffers rather than user-space memory). It might be possible to get hold
> of an mm_struct, but it's not necessarily easy as mm_alloc() isn't
> EXPORTed. Perhaps my thread could keep its original current->mm by
> incrementing current->mm->users before calling daemonize() and setting
> current->mm back to its original value afterward.
> Is that legal? Having done so, perhaps I could use some sort of mmap()
> call to allocate a user-space buffer that would be okay for direct-I/O.
> What's the best way to do that -- what function would I have to call?
>
> However, all that seems rather roundabout. An equally acceptable solution
> would be simply to invalidate all the entries in the page cache referring
> to my file, so that reads would be forced to go to the drive. Can anyone
> tell me how to do that?

Take a look at invalidate_inode_pages()....

>
> TIA,
>
> Alan Stern
>

--
Best Regards,
Ray

-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@sgi.com raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 22 Sep 2003 15:09:33 -0400 (EDT)
From: Alan Stern
Subject: Re: How best to bypass the page cache from within a kernel module?
In-Reply-To: <3F6F44AF.2030807@sgi.com>
To: Ray Bryant
Cc: linux-mm@kvack.org

On Mon, 22 Sep 2003, Ray Bryant wrote:
> Take a look at invalidate_inode_pages()....

William Lee Irwin made the same suggestion. It turned out to be just what
I needed. Thanks, guys!

Alan Stern
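
For readers who take the "do it from userspace" route suggested earlier in
the thread, here is a rough sketch of the same idea outside the kernel. It
is not what the driver in this thread did (that used the in-kernel
invalidate_inode_pages()). The assumptions: posix_fadvise() with
POSIX_FADV_DONTNEED is used as a userspace stand-in for invalidating a
file's cached pages, and it is only a hint, so the kernel may still serve
some reads from the cache; the function name verify_file_readable() is
made up for this example; and, as noted in the first message, the drive's
own cache is a separate problem that this does not address. The buffer is
obtained with an anonymous mmap(), which also shows that mmap() can create
a mapping not backed by any file (MAP_ANONYMOUS).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the thread: ask the kernel to drop the
 * cached pages of `path`, then read the whole file back.
 * Returns the number of bytes read, or -1 with errno set.
 */
long verify_file_readable(const char *path)
{
    const size_t buf_len = 64 * 1024;
    long total = 0;
    ssize_t n = 0;
    int saved_errno = 0;
    int err;
    int fd;
    void *buf;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Anonymous mapping: page-aligned memory with no backing file. */
    buf = mmap(NULL, buf_len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    if (fsync(fd) != 0) {
        /* Dirty pages cannot be dropped, so write-back must succeed. */
        saved_errno = errno;
    } else if ((err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)) != 0) {
        /* posix_fadvise() returns the error number instead of setting errno. */
        saved_errno = err;
    } else {
        /* Read everything; an I/O error shows up as read() returning -1. */
        while ((n = read(fd, buf, buf_len)) > 0)
            total += n;
        if (n < 0)
            saved_errno = errno;
    }

    munmap(buf, buf_len);
    close(fd);

    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return total;
}
```

A caller would pass the path of the file to check and treat -1 (typically
with errno set to EIO for a media error) as a failed verify. Opening with
O_DIRECT instead would bypass the cache more strictly, but it imposes
alignment requirements and is not supported on every filesystem, which is
why the sketch uses the advisory call.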