* get_fs_excl/put_fs_excl/has_fs_excl
From: Christoph Hellwig @ 2009-04-23 19:18 UTC
To: axboe; +Cc: linux-fsdevel, linux-kernel
Stumbled over these gems recently when investigating the
lock_super/unlock_super removal.

These were added in commit 22e2c507c301c3dbbcf91b4948b88f78842ee6c9

[PATCH] Update cfq io scheduler to time sliced design

which unfortunately doesn't contain any comments about it. It seems to
be used to boost the priority of some sort of central fs metadata
updates; at least, that is what the usage in the reiserfs journal code
looks like.

Do you happen to have some notes/anecdotes about it so that we can
document it, give it saner naming and use it directly in the
spots that need it (including inside xfs, btrfs, etc) instead of lock_super?
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Jens Axboe @ 2009-04-23 19:21 UTC
To: Christoph Hellwig; +Cc: linux-fsdevel, linux-kernel

On Thu, Apr 23 2009, Christoph Hellwig wrote:
> Stumbled over these gems recently when investigating the
> lock_super/unlock_super removal.
>
> These were added in commit 22e2c507c301c3dbbcf91b4948b88f78842ee6c9
>
> [PATCH] Update cfq io scheduler to time sliced design
>
> which unfortunately doesn't contain any comments about it. It seems to
> be used to boost the priority of some sort of central fs metadata
> updates; at least, that is what the usage in the reiserfs journal code
> looks like.
>
> Do you happen to have some notes/anecdotes about it so that we can
> document it, give it saner naming and use it directly in the
> spots that need it (including inside xfs, btrfs, etc) instead of lock_super?

The intent was to add some sort of notification mechanism from the file
system to inform the IO scheduler (and others?) that this process is now
holding a file system wide resource. So if you have a low priority
process getting access to such a resource, you want to boost its
priority to avoid higher priority apps getting stuck behind it. Sort of a
poor man's priority inheritance.

It would be wonderful if you could kick this process more into gear on
the fs side...

--
Jens Axboe
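The hook trio can be pictured as a per-task nesting counter that the filesystem bumps while it holds a filesystem-wide resource and that the I/O scheduler reads when deciding whether to boost. The sketch below is a userspace model under that assumption; `struct model_task` and the `model_*` functions are hypothetical names. My understanding (not confirmed in this thread) is that the 2.6-era kernel keeps an equivalent atomic counter in the task structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-task state. In the kernel this would
 * hang off the task itself and be reached through `current`. */
struct model_task {
    atomic_int fs_excl;   /* how many fs-wide resources are held */
};

/* Call after acquiring a filesystem-wide resource (for example around
 * a journal-wide lock). A counter rather than a flag lets calls nest. */
static void model_get_fs_excl(struct model_task *t)
{
    atomic_fetch_add(&t->fs_excl, 1);
}

/* Call after releasing it; every put must pair with an earlier get. */
static void model_put_fs_excl(struct model_task *t)
{
    int old = atomic_fetch_sub(&t->fs_excl, 1);
    assert(old > 0 && "put without matching get");
    (void)old; /* silence unused-variable warning under NDEBUG */
}

/* The query an I/O scheduler would make before deciding to boost. */
static bool model_has_fs_excl(struct model_task *t)
{
    return atomic_load(&t->fs_excl) > 0;
}
```

A caller would bracket the fs-wide critical section with model_get_fs_excl() and model_put_fs_excl(); the scheduler side only ever calls model_has_fs_excl().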
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Jamie Lokier @ 2009-04-23 21:23 UTC
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel, linux-kernel

Jens Axboe wrote:
> The intent was to add some sort of notification mechanism from the file
> system to inform the IO scheduler (and others?) that this process is now
> holding a file system wide resource. So if you have a low priority
> process getting access to such a resource, you want to boost its
> priority to avoid higher priority apps getting stuck behind it. Sort of a
> poor man's priority inheritance.

Very closely related to this: I'm building something where I want one
particular task to have strictly higher I/O priority than all other
tasks. No problem, use the lovely RT I/O priority facility.

But if that task needs access to a buffer or page which is already
undergoing I/O started by another task - what happens? I'd like the
_I/O_ priority to be boosted in that case, so that the high priority
task does not have to wait on a long queue of low priority I/Os.

E.g. this happens when the high priority task reads from a file, and a
low priority task has already initiated readahead for that file. It's
a particular problem if the low priority task's I/O is queued behind a
lot of other low priority I/O.

That can be avoided by just not reading the same files :-) But more
subtly, the high priority task may find itself waiting on metadata
blocks which overlap metadata blocks from I/O in low priority tasks.
The application can't easily avoid this.

So I'd like operations which wait for I/O to complete to compare the
task's I/O priority with the I/O request already queued, and boost the
request priority if it's lower, moving it forward in the elevator if
necessary.

All this to guarantee a high I/O priority task has a maximum response
time no matter what low priority I/O is doing. Even O_DIRECT has to
read metadata sometimes...

It seems if I/O priority boosting were implemented like this, that
might solve the superblock priority thing too, without needing
filesystem changes and generically for all metadata?

How hard would it be to do this?

Thanks,
--
Jamie
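Jamie's proposal amounts to priority inheritance applied to requests that are already queued: when a task is about to wait on I/O someone else queued, the request takes on the waiter's priority if that is more urgent, and the elevator then dispatches it earlier. A minimal userspace model of the idea follows; the fixed-size queue, the 0..7 level convention (lower is more urgent, as within an ioprio class) and all `model_*` names are assumptions for illustration, not block-layer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 64

/* Hypothetical request: lower prio = more urgent (levels 0..7). */
struct model_req {
    unsigned long sector;
    int prio;
    unsigned long seq;      /* submission order, for FIFO tie-breaks */
    bool done;
};

struct model_queue {
    struct model_req reqs[MODEL_QUEUE_MAX];
    size_t n;
    unsigned long next_seq;
};

/* Queue a request; returns NULL when the model queue is full. */
static struct model_req *model_submit(struct model_queue *q,
                                      unsigned long sector, int prio)
{
    if (q->n == MODEL_QUEUE_MAX)
        return NULL;
    struct model_req *r = &q->reqs[q->n++];
    r->sector = sector;
    r->prio = prio;
    r->seq = q->next_seq++;
    r->done = false;
    return r;
}

/* Called before a task sleeps on I/O already queued for @sector by
 * someone else: the pending request inherits a more urgent priority. */
static void model_boost_on_wait(struct model_queue *q,
                                unsigned long sector, int waiter_prio)
{
    assert(waiter_prio >= 0 && waiter_prio <= 7);
    for (size_t i = 0; i < q->n; i++) {
        struct model_req *r = &q->reqs[i];
        if (!r->done && r->sector == sector && waiter_prio < r->prio)
            r->prio = waiter_prio;
    }
}

/* Dispatch the most urgent pending request, FIFO within a level. */
static struct model_req *model_dispatch(struct model_queue *q)
{
    struct model_req *best = NULL;
    for (size_t i = 0; i < q->n; i++) {
        struct model_req *r = &q->reqs[i];
        if (r->done)
            continue;
        if (!best || r->prio < best->prio ||
            (r->prio == best->prio && r->seq < best->seq))
            best = r;
    }
    if (best)
        best->done = true;
    return best;
}
```

A real elevator would also have to move the boosted request between its per-priority lists; the linear scan here stands in for that.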
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Jens Axboe @ 2009-04-24 5:58 UTC
To: Jamie Lokier; +Cc: Christoph Hellwig, linux-fsdevel, linux-kernel

On Thu, Apr 23 2009, Jamie Lokier wrote:
> Jens Axboe wrote:
> > The intent was to add some sort of notification mechanism from the file
> > system to inform the IO scheduler (and others?) that this process is now
> > holding a file system wide resource. So if you have a low priority
> > process getting access to such a resource, you want to boost its
> > priority to avoid higher priority apps getting stuck behind it. Sort of a
> > poor man's priority inheritance.
>
> Very closely related to this: I'm building something where I want one
> particular task to have strictly higher I/O priority than all other
> tasks. No problem, use the lovely RT I/O priority facility.
>
> But if that task needs access to a buffer or page which is already
> undergoing I/O started by another task - what happens? I'd like the
> _I/O_ priority to be boosted in that case, so that the high priority
> task does not have to wait on a long queue of low priority I/Os.
>
> E.g. this happens when the high priority task reads from a file, and a
> low priority task has already initiated readahead for that file. It's
> a particular problem if the low priority task's I/O is queued behind a
> lot of other low priority I/O.
>
> That can be avoided by just not reading the same files :-) But more
> subtly, the high priority task may find itself waiting on metadata
> blocks which overlap metadata blocks from I/O in low priority tasks.
> The application can't easily avoid this.
>
> So I'd like operations which wait for I/O to complete to compare the
> task's I/O priority with the I/O request already queued, and boost the
> request priority if it's lower, moving it forward in the elevator if
> necessary.
>
> All this to guarantee a high I/O priority task has a maximum response
> time no matter what low priority I/O is doing. Even O_DIRECT has to
> read metadata sometimes...

So presumably both the RT and normal task end up doing lock_page() on
the same page. Then __wait_on_bit_lock() uses prepare_to_wait_exclusive()
on the wait queue, which does FIFO ordering of tasks. When IO completes,
the first waiter is woken up. If the wait queue was sorted by process
priority, then lock_page() would honor the task priority and make sure
that the highest prio task got woken first.

> It seems if I/O priority boosting were implemented like this, that
> might solve the superblock priority thing too, without needing
> filesystem changes and generically for all metadata?

It's a different situation: one is waiting for some resource (the page)
to become available by being read in, so it's waiting for IO. The other
is holding some shared resource and then performing IO, potentially
waiting for that IO. In the latter case, the RT (or just higher)
priority task can't get access to the shared resource, so we can't do
much more than simply expedite the IO of the lower priority task. The
former case COULD be solved with prioritized wait queues.

--
Jens Axboe
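The difference Jens draws between FIFO and priority-ordered wake-up can be shown with a toy wait queue. Appending waiters reproduces the arrival-order behaviour he attributes to prepare_to_wait_exclusive(); inserting by priority gives the prioritized wait queue he suggests. Everything below (`struct model_waiter`, the `model_*` functions, lower value meaning higher priority) is a hypothetical userspace model, not the kernel's wait-queue API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter; lower prio value = more important task. */
struct model_waiter {
    int prio;
    int id;
    struct model_waiter *next;
};

struct model_waitq {
    struct model_waiter *head;
};

/* FIFO behaviour: exclusive waiters are appended, so wake-up order is
 * arrival order regardless of task priority. */
static void model_enqueue_fifo(struct model_waitq *wq, struct model_waiter *w)
{
    assert(w != NULL);
    struct model_waiter **pp = &wq->head;
    while (*pp)
        pp = &(*pp)->next;
    w->next = NULL;
    *pp = w;
}

/* Prioritized variant: insert before the first less important waiter
 * (larger prio value), so equal-priority waiters stay FIFO. */
static void model_enqueue_prio(struct model_waitq *wq, struct model_waiter *w)
{
    assert(w != NULL);
    struct model_waiter **pp = &wq->head;
    while (*pp && (*pp)->prio <= w->prio)
        pp = &(*pp)->next;
    w->next = *pp;
    *pp = w;
}

/* Wake exactly one exclusive waiter: whoever is at the head. */
static struct model_waiter *model_wake_one(struct model_waitq *wq)
{
    struct model_waiter *w = wq->head;
    if (w)
        wq->head = w->next;
    return w;
}
```

With the same arrival order (a low-priority waiter first, then an RT one), the FIFO queue wakes the low-priority task first while the prioritized queue wakes the RT task first.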
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Christoph Hellwig @ 2009-04-24 18:40 UTC
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel, linux-kernel

On Thu, Apr 23, 2009 at 09:21:24PM +0200, Jens Axboe wrote:
> The intent was to add some sort of notification mechanism from the file
> system to inform the IO scheduler (and others?) that this process is now
> holding a file system wide resource. So if you have a low priority
> process getting access to such a resource, you want to boost its
> priority to avoid higher priority apps getting stuck behind it. Sort of a
> poor man's priority inheritance.
>
> It would be wonderful if you could kick this process more into gear on
> the fs side...

So what are the calls in lock_super/unlock_super supposed to be for?
->write_super? While that can sync bits out, most of the heavy lifting
is now done in ->sync_fs for most filesystems. ->remount_fs? This is
going to block all other I/O anyway. ->put_super? Surely not :)

ext3/4 internal bits? Doesn't seem to be used for any journal related
activity but mostly as protection against resizing (the whole lock_super
usage in ext3/4 looks odd to me; interestingly there's none at all in
ext2. Maybe someone from the extN crowd should audit it and get rid of
it in favour of a better fs-specific lock)
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Theodore Tso @ 2009-04-25 15:16 UTC
To: Christoph Hellwig; +Cc: Jens Axboe, linux-fsdevel, linux-kernel, linux-ext4

On Fri, Apr 24, 2009 at 08:40:47PM +0200, Christoph Hellwig wrote:
> On Thu, Apr 23, 2009 at 09:21:24PM +0200, Jens Axboe wrote:
> > The intent was to add some sort of notification mechanism from the file
> > system to inform the IO scheduler (and others?) that this process is now
> > holding a file system wide resource. So if you have a low priority
> > process getting access to such a resource, you want to boost its
> > priority to avoid higher priority apps getting stuck behind it. Sort of a
> > poor man's priority inheritance.
> >
> > It would be wonderful if you could kick this process more into gear on
> > the fs side...

I have to agree with Christoph; it would be nice if this were actually
documented somewhere. Filesystem authors can't do something if they
don't understand what the semantics are and how it is supposed to be
used!

I'm kind of curious why you implemented things in this way, though.
Is there a reason why the boosting is happening deep in the guts of the
cfq code, instead of in blk-core.c when the submission of the block
I/O request is processed?

> So what are the calls in lock_super/unlock_super supposed to be for?
> ->write_super? While that can sync bits out, most of the heavy lifting
> is now done in ->sync_fs for most filesystems. ->remount_fs? This is
> going to block all other I/O anyway. ->put_super? Surely not :)
>
> ext3/4 internal bits? Doesn't seem to be used for any journal related
> activity but mostly as protection against resizing (the whole lock_super
> usage in ext3/4 looks odd to me; interestingly there's none at all in
> ext2. Maybe someone from the extN crowd should audit it and get rid of
> it in favour of a better fs-specific lock)

Yeah, the use of lock_super is definitely very funny in ext3/4. There
seem to be three primary usages: one is blocking write_super(), although
I'm not entirely sure that's needed in all of the places where we do it.
Another is protecting the orphan list handling; and the final one seems
to be in the resizing handling. The last seems... interesting,
especially given this comment:

	/*
	 * We need to protect s_groups_count against other CPUs seeing
	 * inconsistent state in the superblock.
	 *
	 * The precise rules we use are:
	 *
	 * * Writers of s_groups_count *must* hold lock_super
	 * AND
	 * * Writers must perform a smp_wmb() after updating all dependent
	 *   data and before modifying the groups count
	 *
	 * * Readers must hold lock_super() over the access
	 * OR
	 * * Readers must perform an smp_rmb() after reading the groups count
	 *   and before reading any dependent data.
	 *
	 * NB. These rules can be relaxed when checking the group count
	 * while freeing data, as we can only allocate from a block
	 * group after serialising against the group count, and we can
	 * only then free after serialising in turn against that
	 * allocation.
	 */

... but mballoc.c appears not to follow the above protocol at all, as
it relates to using smp_rmb() --- although balloc.c does. Fortunately
resizes don't happen all that often, but there are definitely some
scary potential problems hiding here, I suspect.

- Ted
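The s_groups_count rules in the quoted comment are a publish/subscribe pattern: the writer makes the dependent data visible before the new count, and a lock-free reader must not touch dependent data until after it has read the count. Below is a userspace analogue with C11 atomics. Mapping smp_wmb() to a release fence and smp_rmb() to an acquire fence is my own choice (the C11 fences are somewhat stronger than the kernel barriers), and `struct model_sb` and its fields are hypothetical. Dropping the acquire fence in the reader is the kind of gap Ted suspects in mballoc.c: the reader could see the new count yet stale per-group data.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_MAX_GROUPS 128

/* Hypothetical stand-in for the superblock fields the comment covers. */
struct model_sb {
    unsigned long group_free[MODEL_MAX_GROUPS]; /* dependent per-group data */
    atomic_uint groups_count;                   /* published group count */
};

/* Writer (online resize). Writers are assumed to be serialized by an
 * outer lock, playing the role lock_super plays in the comment. */
static void model_add_group(struct model_sb *sb, unsigned long free_blocks)
{
    unsigned n = atomic_load_explicit(&sb->groups_count, memory_order_relaxed);
    assert(n < MODEL_MAX_GROUPS);
    sb->group_free[n] = free_blocks;             /* 1: dependent data */
    atomic_thread_fence(memory_order_release);   /* 2: ~ smp_wmb()    */
    atomic_store_explicit(&sb->groups_count, n + 1,
                          memory_order_relaxed); /* 3: publish count  */
}

/* Lock-free reader: count first, then the barrier, then the data. */
static unsigned long model_total_free(struct model_sb *sb)
{
    unsigned n = atomic_load_explicit(&sb->groups_count, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);   /* ~ smp_rmb() */
    unsigned long total = 0;
    for (unsigned i = 0; i < n; i++)
        total += sb->group_free[i];
    return total;
}
```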
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Jens Axboe @ 2009-04-27 9:53 UTC
To: Theodore Tso; +Cc: Christoph Hellwig, linux-fsdevel, linux-kernel, linux-ext4

On Sat, Apr 25 2009, Theodore Tso wrote:
> On Fri, Apr 24, 2009 at 08:40:47PM +0200, Christoph Hellwig wrote:
> > On Thu, Apr 23, 2009 at 09:21:24PM +0200, Jens Axboe wrote:
> > > The intent was to add some sort of notification mechanism from the file
> > > system to inform the IO scheduler (and others?) that this process is now
> > > holding a file system wide resource. So if you have a low priority
> > > process getting access to such a resource, you want to boost its
> > > priority to avoid higher priority apps getting stuck behind it. Sort of a
> > > poor man's priority inheritance.
> > >
> > > It would be wonderful if you could kick this process more into gear on
> > > the fs side...
>
> I have to agree with Christoph; it would be nice if this were actually
> documented somewhere. Filesystem authors can't do something if they
> don't understand what the semantics are and how it is supposed to be
> used!

I don't disagree; the project (unfortunately) never really went
anywhere. The half-assed implementation was meant to be picked up by fs
people. I guess that's what is happening now, so it's a belated
success :-)

> I'm kind of curious why you implemented things in this way, though.
> Is there a reason why the boosting is happening deep in the guts of the
> cfq code, instead of in blk-core.c when the submission of the block
> I/O request is processed?

You would need to implement a lot more logic in the block layer to
handle it there; as it stands it's basically a scheduler decision. So
the positioning is right imho; the placement of fs hooks is probably
mostly crap and could do with some work.

--
Jens Axboe
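To make the "scheduler decision" concrete, here is a sketch of the kind of check that would live in the I/O scheduler and run whenever a task's queue is about to issue I/O. It is modelled loosely on my understanding of CFQ's boost: while the task holds an fs-wide resource, lift idle-class I/O to best-effort and cap the level at the normal best-effort level; otherwise restore the configured values. The class numbers and the normal level 4 mirror Linux's IOPRIO_CLASS_RT/BE/IDLE and IOPRIO_NORM, while `struct model_ioq` and `model_prio_boost()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Class numbering follows Linux ioprio (RT=1, BE=2, IDLE=3); within a
 * class, level 0 is most urgent and 4 is the "normal" level. */
enum model_class { MODEL_CLASS_RT = 1, MODEL_CLASS_BE = 2, MODEL_CLASS_IDLE = 3 };
#define MODEL_LEVEL_NORM 4

struct model_ioq {
    enum model_class org_class, cur_class; /* configured vs in effect */
    int org_level, cur_level;
};

/* Recompute the effective priority of a task's queue. */
static void model_prio_boost(struct model_ioq *q, bool holds_fs_excl)
{
    assert(q->org_class >= MODEL_CLASS_RT && q->org_class <= MODEL_CLASS_IDLE);
    if (holds_fs_excl) {
        /* Never leave an fs-wide lock holder in the idle class... */
        if (q->cur_class == MODEL_CLASS_IDLE)
            q->cur_class = MODEL_CLASS_BE;
        /* ...and never below the normal level. */
        if (q->cur_level > MODEL_LEVEL_NORM)
            q->cur_level = MODEL_LEVEL_NORM;
    } else {
        /* Resource released: fall back to what was configured. */
        q->cur_class = q->org_class;
        q->cur_level = q->org_level;
    }
}
```

Note what such a boost does not do: a holder already at or above the normal level gains nothing, so an RT task stuck behind a default-priority holder is not helped. That is the limitation Ted raises later in the thread.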
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Theodore Tso @ 2009-04-27 11:33 UTC
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel, linux-kernel, linux-ext4

On Mon, Apr 27, 2009 at 11:53:39AM +0200, Jens Axboe wrote:
> > I'm kind of curious why you implemented things in this way, though.
> > Is there a reason why the boosting is happening deep in the guts of the
> > cfq code, instead of in blk-core.c when the submission of the block
> > I/O request is processed?
>
> You would need to implement a lot more logic in the block layer to
> handle it there; as it stands it's basically a scheduler decision. So
> the positioning is right imho; the placement of fs hooks is probably
> mostly crap and could do with some work.

The question is whether you see this in terms of a scheduler decision
or in terms of an I/O priority issue. At the moment I agree it's a
scheduler decision (which, to be honest, is implemented in a somewhat
hacky way --- which I suspect won't bother you, since you yourself
called it "half-assed" :-), which happens to be implemented in the I/O
scheduler. I tend to think of it more as an I/O priority issue, and
specifically, as you put it, a priority inversion issue, but much of
that is no doubt influenced by how I did the patches to reduce the
fsync() latencies in ext3 and ext4.

And indeed the get_fs_excl()/put_fs_excl() paradigm doesn't really work
well for ext3/ext4, since all of the work which grabs a filesystem-wide
"exclusive lock" is done in a separate process, kjournald. Hence, with
the exception of freeze and unfreeze --- and while it might be
considered irresponsible for a system administrator to freeze a
filesystem in an ionice'd process, I could imagine a badly written
backup script which created a snapshot while being ionice'd --- ext3/4
can't really very profitably use get_fs_excl()/put_fs_excl().

Maybe ext3/ext4 are a special case, but perhaps we should nevertheless
ask some fundamental design questions about the get/put_fs_excl()
interface.

*) Most filesystems will go to great lengths to avoid having any kind
   of fs-wide "exclusive lock", simply because of the disastrous
   performance impacts. This is *why* in ext3/ext4 we try to do most of
   the commit work in the context of another process, and normally we
   let other filesystem operations run in the "current transaction"
   while we let the "committing transaction" complete. If you have too
   many programs running fsync() this tends to screw things up, but
   that's a separate question. So in practice, there really shouldn't
   be that many "fs-wide" locks.

   On the other hand, there can be more subtle forms of I/O priority
   inversion; suppose a low priority process has grabbed a mutex which
   protects a directory, and a high (I/O) priority process needs access
   to the same directory. Do we care about trying to solve that issue?

*) Do we only want to support instances where the fs-wide resource is
   held in kernel-space only, or do we want to support things like the
   FREEZE ioctl, where the filesystem has been frozen --- the very
   definition of an I/O wide resource? (I would argue no, for
   simplicity's sake, but document the fact that a well-written program
   using the FREEZE ioctl should strongly consider bumping up its I/O
   and possibly CPU priority levels to minimize the impact on the rest
   of the system. Since the FREEZE ioctl requires root privileges, it's
   fair to assume a certain amount of competence by the users of this
   interface.)

   If the answer to this question is no, then we can add
   warning/debugging code which warns if the filesystem ever tries
   returning to userspace with an elevated get_fs_excl() count.

*) Do we only care about processes whose I/O priority is below the
   default? (i.e., either in the idle class, or in a low-priority best
   efforts class) What if the concern is a real-time process which is
   being blocked by a default I/O priority process taking its time
   while holding some fs-wide resource?

   If the answer to the previous question is no, it becomes more
   reasonable to consider bumping the submission priority of the
   process in question to the highest priority "best efforts" level.
   After all, if this truly is a "filesystem-wide" resource, then no
   one is going to make forward progress relating to this block device
   unless and until the filesystem-wide lock is resolved. Also, if we
   don't allow this situation to return to userspace, presumably the
   kernel code involved will only be writing to the block device in
   question. (This might not be entirely true in the case of the
   sendfile(2) syscall, but currently we can only read from filesystems
   with sendfile, and so presumably a filesystem would never call
   get_fs_excl while servicing a sendfile request.)

*) Is implementing the bulk of this in the cfq scheduler really the
   best place to do this? To explore something completely different,
   what if the filesystem simply explicitly set I/O priority levels in
   its block I/O submissions, and provided optional callback functions
   which could be used by the page writeback routines to determine the
   appropriate I/O priority level that should be used given a
   particular filesystem and inode number? (That actually could be used
   to provide another cool function --- we could expose to userspace
   the concept that a particular inode should always have its I/O go
   out with a higher priority, perhaps via a chattr flag.)

   Basically, the argument here is that we already have the appropriate
   mechanism for ordering I/O requests, which is the I/O priority
   mechanism, and the policy really needs to be set by the filesystem
   --- and it might be far more than just "do we have a filesystem-wide
   exclusive lock" or not.

What do other filesystem developers think?

- Ted
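Ted's alternative, a filesystem-supplied policy that writeback consults before tagging I/O with a priority, could look roughly like the sketch below. No such hook exists in the code under discussion: `struct model_fs_ops`, its optional `ioprio_for_inode` callback, and the "return -1 to defer" convention are all hypothetical. The example policy uses inode 8, the journal inode on ext3/ext4, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-filesystem hook table; lower value = more urgent. */
struct model_fs_ops {
    /* Optional: priority to use for I/O on inode @ino, or -1 to defer
     * to the submitting task's own priority. */
    int (*ioprio_for_inode)(unsigned long ino);
};

/* What writeback (or any submitter) would consult before tagging I/O. */
static int model_submit_prio(const struct model_fs_ops *ops,
                             unsigned long ino, int task_prio)
{
    assert(task_prio >= 0);
    if (ops && ops->ioprio_for_inode) {
        int p = ops->ioprio_for_inode(ino);
        if (p >= 0)
            return p;   /* filesystem policy wins */
    }
    return task_prio;   /* no policy: keep the task's priority */
}

/* Example policy: journal I/O (inode 8 on ext3/ext4) always goes out
 * at the most urgent level; every other inode defers. */
static int model_example_policy(unsigned long ino)
{
    return ino == 8 ? 0 : -1;
}
```

A per-inode chattr flag, as Ted suggests, would simply become another input to such a policy function.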
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Jamie Lokier @ 2009-04-27 14:47 UTC
To: Theodore Tso, Jens Axboe, Christoph Hellwig, linux-fsdevel, linux-kernel, linux-ext4

Theodore Tso wrote:
> *) Do we only care about processes whose I/O priority is below the
>    default? (i.e., either in the idle class, or in a low-priority best
>    efforts class) What if the concern is a real-time process which is
>    being blocked by a default I/O priority process taking its time
>    while holding some fs-wide resource?
>
>    If the answer to the previous question is no, it becomes more
>    reasonable to consider bumping the submission priority of the
>    process in question to the highest priority "best efforts" level.
>    After all, if this truly is a "filesystem-wide" resource, then no
>    one is going to make forward progress relating to this block device
>    unless and until the filesystem-wide lock is resolved. Also, if we
>    don't allow this situation to return to userspace, presumably the
>    kernel code involved will only be writing to the block device in
>    question. (This might not be entirely true in the case of the
>    sendfile(2) syscall, but currently we can only read from filesystems
>    with sendfile, and so presumably a filesystem would never call
>    get_fs_excl while servicing a sendfile request.)
>
> *) Is implementing the bulk of this in the cfq scheduler really the
>    best place to do this? To explore something completely different,
>    what if the filesystem simply explicitly set I/O priority levels in
>    its block I/O submissions, and provided optional callback functions
>    which could be used by the page writeback routines to determine the
>    appropriate I/O priority level that should be used given a
>    particular filesystem and inode number? (That actually could be used
>    to provide another cool function --- we could expose to userspace
>    the concept that a particular inode should always have its I/O go
>    out with a higher priority, perhaps via a chattr flag.)
>
>    Basically, the argument here is that we already have the appropriate
>    mechanism for ordering I/O requests, which is the I/O priority
>    mechanism, and the policy really needs to be set by the filesystem
>    --- and it might be far more than just "do we have a filesystem-wide
>    exclusive lock" or not.

Personally, I'm interested in the following:

   - A process with RT I/O priority and RT CPU priority is reading
     a series of files from disk. It should be very reliable at this.

   - Other normal I/O priority and normal CPU priority processes are
     reading and writing the disk.

I would like the first process to have a guaranteed minimum I/O
performance: it should continuously make progress, even when it needs
to read some file metadata which overlaps a page affected by the other
processes.

I don't mind all the interference from disk head seeks and
so on, but I would like the I/O that the first process depends on to
have RT I/O priority - including when it's waiting on I/O initiated by
another process and the normal I/O priority queue is full.

So, I'm not exactly sure, but I think what I need for that is:

   - I/O priority boosting (re-queuing in the elevator) to fix the
     inversion when waiting on I/O which was previously queued with
     normal I/O priority, and

   - Task priority boosting when waiting on a filesystem resource
     which is held by a normal priority task.

(I'm not sure if generic task priority boosting is already addressed to
some extent in the RT-PREEMPT Linux tree.)

--
Jamie
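Jamie's second item, boosting a task that holds a resource a higher-priority task is waiting for, is classic priority inheritance. For CPU scheduling the kernel's rt_mutex already implements it; nothing in this thread applies it to I/O priority. A deliberately simplified model follows (single lock, no wait list, hypothetical `model_*` names, lower value meaning higher priority).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: lower value = higher priority. */
struct model_task {
    int base_prio;   /* priority the task was configured with */
    int eff_prio;    /* priority currently in effect (may be boosted) */
};

struct model_lock {
    struct model_task *owner;
};

/* Try to take the lock; if it is held, lend our priority to the owner
 * so it is not starved while we wait (priority inheritance).
 * Returns 1 if acquired, 0 if the caller would have to sleep. */
static int model_lock_or_boost(struct model_lock *l, struct model_task *t)
{
    if (!l->owner) {
        l->owner = t;
        return 1;
    }
    if (t->eff_prio < l->owner->eff_prio)
        l->owner->eff_prio = t->eff_prio;
    return 0;
}

/* Release and drop any inherited boost. This simplification assumes
 * the owner holds no other contended locks; a real implementation
 * would recompute the boost from the remaining waiters. */
static void model_unlock(struct model_lock *l)
{
    assert(l->owner != NULL);
    l->owner->eff_prio = l->owner->base_prio;
    l->owner = NULL;
}
```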
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Theodore Tso @ 2009-04-27 16:29 UTC
To: Jamie Lokier
Cc: Jens Axboe, Christoph Hellwig, linux-fsdevel, linux-kernel, linux-ext4

On Mon, Apr 27, 2009 at 03:47:42PM +0100, Jamie Lokier wrote:
> Personally, I'm interested in the following:
>
>    - A process with RT I/O priority and RT CPU priority is reading
>      a series of files from disk. It should be very reliable at this.
>
>    - Other normal I/O priority and normal CPU priority processes are
>      reading and writing the disk.
>
> I would like the first process to have a guaranteed minimum I/O
> performance: it should continuously make progress, even when it needs
> to read some file metadata which overlaps a page affected by the other
> processes.

That's pretty easy. The much harder and much more interesting problem
is if the process with RT I/O and CPU priority is *writing* a series
of files to disk, and not just reading from disk.

> I don't mind all the interference from disk head seeks and
> so on, but I would like the I/O that the first process depends on to
> have RT I/O priority - including when it's waiting on I/O initiated by
> another process and the normal I/O priority queue is full.
>
> So, I'm not exactly sure, but I think what I need for that is:
>
>    - I/O priority boosting (re-queuing in the elevator) to fix the
>      inversion when waiting on I/O which was previously queued with
>      normal I/O priority, and
>
>    - Task priority boosting when waiting on a filesystem resource
>      which is held by a normal priority task.

For the latter, I can't think of a filesystem where we would block a
read operation for a long time just because someone was holding some
kind of filesystem-wide lock. A spinlock, maybe, but the only time it
makes sense to worry about boosting an I/O priority is if we're going
to be blocking a filesystem for milliseconds or more, and not just a
few tens of microseconds. All of the latency problems people have been
complaining about, such as the infamous firefox fsync() problem,
involved write operations, and specifically fsync(); maybe a heavy read
workload interfered with a write, but I can't think of a situation
where a real-time read operation would be disrupted by normal priority
reads and writes.

For the former, where a real-time read request gets blocked because
the read request for that block had already been submitted --- at a
lower priority --- that's something that should be solvable purely in
the core block layer and in the I/O scheduler layer, I would expect.

- Ted
* Re: get_fs_excl/put_fs_excl/has_fs_excl
From: Jamie Lokier @ 2009-04-27 17:03 UTC
To: Theodore Tso, Jens Axboe, Christoph Hellwig, linux-fsdevel, linux-kernel, linux-ext4

Theodore Tso wrote:
> On Mon, Apr 27, 2009 at 03:47:42PM +0100, Jamie Lokier wrote:
> > Personally, I'm interested in the following:
> >
> >    - A process with RT I/O priority and RT CPU priority is reading
> >      a series of files from disk. It should be very reliable at this.
> >
> >    - Other normal I/O priority and normal CPU priority processes are
> >      reading and writing the disk.
> >
> > I would like the first process to have a guaranteed minimum I/O
> > performance: it should continuously make progress, even when it needs
> > to read some file metadata which overlaps a page affected by the other
> > processes.
>
> That's pretty easy. The much harder and much more interesting problem
> is if the process with RT I/O and CPU priority is *writing* a series
> of files to disk, and not just reading from disk.
...
> I can't think of a filesystem where we would block a
> read operation for a long time just because someone was holding some
> kind of filesystem-wide lock. A spinlock, maybe, but the only time it
> makes sense to worry about boosting an I/O priority is if we're going
> to be blocking a filesystem for milliseconds or more, and not just a
> few tens of microseconds.
...
> For the former, where a real-time read request gets blocked because
> the read request for that block had already been submitted --- at a
> lower priority --- that's something that should be solvable purely in
> the core block layer and in the I/O scheduler layer, I would expect.

That's great to know, thanks. I will poke at the block layer and I/O
scheduler then, and see where it leads.

Thanks,
--
Jamie