* One more bio for for floppy users in 2.5.33..
@ 2002-09-03 18:00 Linus Torvalds
2002-09-03 18:02 ` Jens Axboe
2002-09-03 20:57 ` Mikael Pettersson
0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2002-09-03 18:00 UTC (permalink / raw)
To: Kernel Mailing List; +Cc: Jens Axboe
Ok,
I found another major bio-related bug that definitely explains why the
floppy driver generated corruption - any partial request completion would
be totally messed up by the BIO layer (not the floppy driver).
Any other block device driver that did partial request completion might
also be impacted.
I'm still looking at the floppy driver itself - some of the request
completion code is so messed up as to be unreadable, and some of that may
actually be due to trying to work around the bio problem (unsuccessfully,
I may add). So this may not actually fix things for you yet, simply
because the floppy driver itself still does some strange things.
Jens, oops. We should not update the counts by how much was left
uncompleted, but by how much we successfully completed!
Linus
----
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/09/03 torvalds@home.transmeta.com 1.582
# Major partial request completion boo-boo in the bio layer.
#
# This was _bad_. Major floppy corruption, and possibly the reason
# for other block device corruption for any driver that generated
# partial results for a block device request.
# --------------------------------------------
#
diff -Nru a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
--- a/drivers/block/ll_rw_blk.c Tue Sep 3 10:53:59 2002
+++ b/drivers/block/ll_rw_blk.c Tue Sep 3 10:53:59 2002
@@ -1992,11 +1992,11 @@
* not a complete bvec done
*/
if (unlikely(nsect > nr_sectors)) {
- int residual = (nsect - nr_sectors) << 9;
+ int partial = nr_sectors << 9;
- bio->bi_size -= residual;
- bio_iovec(bio)->bv_offset += residual;
- bio_iovec(bio)->bv_len -= residual;
+ bio->bi_size -= partial;
+ bio_iovec(bio)->bv_offset += partial;
+ bio_iovec(bio)->bv_len -= partial;
blk_recalc_rq_sectors(req, nr_sectors);
blk_recalc_rq_segments(req);
return 1;
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: One more bio for for floppy users in 2.5.33.. 2002-09-03 18:00 One more bio for for floppy users in 2.5.33 Linus Torvalds @ 2002-09-03 18:02 ` Jens Axboe 2002-09-04 7:25 ` Suparna Bhattacharya 2002-09-03 20:57 ` Mikael Pettersson 1 sibling, 1 reply; 30+ messages in thread From: Jens Axboe @ 2002-09-03 18:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Tue, Sep 03 2002, Linus Torvalds wrote: > > Ok, > I found another major bio-related bug that definitely explains why the > floppy driver generated corruption - any partial request completion would > be totally messed up by the BIO layer (not the floppy driver). > > Any other block device driver that did partial request completion might > also be impacted. > > I'm still looking at the floppy driver itself - some of the request > completion code is so messed up as to be unreadable, and some of that may > actually be due to trying to work around the bio problem (unsuccessfully, > I may add). So this may not actually fix things for you yet, simply > because the floppy driver itself still does some strange things. > > Jens, oops. We should not update the counts by how much was left > uncompleted, but by how much we successfully completed! Yeah oops, the most embarassing thing is that Bart and I have both found this but independently months ago but it seems it got lost at my end (or your end, but lets not point fingers :-) :-( Patch is ofcourse correct. I'm not sure other drivers have been hit (of the used ones), since they would have to use old style completions and do less than current_nr_sectors in one-go. -- Jens Axboe ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-03 18:02 ` Jens Axboe @ 2002-09-04 7:25 ` Suparna Bhattacharya 2002-09-04 16:36 ` Linus Torvalds 0 siblings, 1 reply; 30+ messages in thread From: Suparna Bhattacharya @ 2002-09-04 7:25 UTC (permalink / raw) To: Linus Torvalds, Jens Axboe, linux-kernel On Tue, 03 Sep 2002 23:44:59 +0530, Jens Axboe wrote: > On Tue, Sep 03 2002, Linus Torvalds wrote: >> >> Ok, >> I found another major bio-related bug that definitely explains why the >> floppy driver generated corruption - any partial request completion >> would be totally messed up by the BIO layer (not the floppy driver). >> >> Any other block device driver that did partial request completion might >> also be impacted. >> >> Jens, oops. We should not update the counts by how much was left >> uncompleted, but by how much we successfully completed! > > Yeah oops, the most embarassing thing is that Bart and I have both found > this but independently months ago but it seems it got lost at my end (or > your end, but lets not point fingers :-) :-( > Oh yes, even I had this fixed this in the bio traversal patches I had posted (had this in the core patch, and mentioned it in the description in the note :) ), guess it went unnoticed. BTW, any plans for including those patches ? So far all feedback I've received (including Jens, Bart, James, Andrew ) seems to say OK to go from what I can tell. If it seems like the right thing to do, then I'd rather it go in sooner (at least the core and comatibility portions), so subsequent driver changes etc are aligned with this (I've been chasing several kernel versions with this now :( ) I now have a latest version updated to 2.5.33, but dropped the IDE portions from it for now, in view of the ide switch and ongoing changes ... If Bart upgrades his corresponding patches then that would take care of it. Otherwise I could give that a go too you think that seems worthwhile and may relook at whether the "traverse for submission" helpers I have can be improved upon (maybe given better names too). A quick recap: BIO traversal enhancements -------------------------- - Pre-req for full BIO splitting infrastructure [Splits in the middle of a segment can also share the same vec] - Pre-req for attaching a pre-built vector (not owned by block) to a bio (e.g for aio) [Avoids certain subtle side-effects in special cases like partial segment completion] - Pre-req for IDE mult-count PIO write fixes w/o request copy [Ability to traverse a bio partially for i/o submission without effecting bio completion] Introduces some new fields in the bio and request to help maintain traversal state and handle partial segment bio clones, and comes with a corresponding set of helpers to enable clean separation of traversal for i/o submission vs i/o completion. Regards Suparna > Patch is ofcourse correct. I'm not sure other drivers have been hit (of > the used ones), since they would have to use old style completions and > do less than current_nr_sectors in one-go. > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-04 7:25 ` Suparna Bhattacharya @ 2002-09-04 16:36 ` Linus Torvalds 2002-09-05 7:03 ` Suparna Bhattacharya 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2002-09-04 16:36 UTC (permalink / raw) To: Suparna Bhattacharya; +Cc: Jens Axboe, linux-kernel On Wed, 4 Sep 2002, Suparna Bhattacharya wrote: > > Oh yes, even I had this fixed this in the bio traversal patches > I had posted (had this in the core patch, and mentioned it > in the description in the note :) ), guess it went unnoticed. Well, I've never seen a "this should go in" about it. Also, it was apparently mixed up with the "bio splitup" stuff, which was discussed at least with Jens, and I feel strongly that we shouldn't split, we should build up. Jens was working on exactly that. In other words, I absolutely hate the fact that a major bug-fix was (a) not marked as such and sent to me and (b) mixed up with experimental work for other drivers Even now (assuming I hadn't fixed it on my own), I would have preferred to get that fix separately, as it would have impacted the floppy driver, for example (the fix broke the floppy driver even more than it was before, because the floppy driver was stupidly trying to work around the original bug by hand). Imagine what a horror it is to figure out why a large experimental patch breaks an existing driver? My first reaction would have been to just throw the large new patch away, since it obviously broke the floppy even more. Instead, if I had been passed the bug-fix only, and the floppy had broken worse that it was originally, it would have been absolutely _obvious_ where the problem was. In short: please please PLEASE keep fixes to existing code separate from new stuff. It makes everything easier, and I have absolutely no problems with applying "obvious fixes" even if they might break something else. In contrast, the new stuff I really don't know if it should go in at all, considering that it's trivial (and CPU-efficient) to build up legal bio request on the fly and _not_ depend on splitting them later (or at least making splitting a special thing only used by things like MD and other such indirection layers). Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-04 16:36 ` Linus Torvalds @ 2002-09-05 7:03 ` Suparna Bhattacharya 2002-09-05 15:06 ` Linus Torvalds 0 siblings, 1 reply; 30+ messages in thread From: Suparna Bhattacharya @ 2002-09-05 7:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jens Axboe, linux-kernel On Wed, Sep 04, 2002 at 09:36:42AM -0700, Linus Torvalds wrote: > > On Wed, 4 Sep 2002, Suparna Bhattacharya wrote: > > > > Oh yes, even I had this fixed this in the bio traversal patches > > I had posted (had this in the core patch, and mentioned it > > in the description in the note :) ), guess it went unnoticed. > > Well, I've never seen a "this should go in" about it. > > Also, it was apparently mixed up with the "bio splitup" stuff, which was True, that fix ought to have gone in as a separate patch. As Jens said, it was like we all knew it had to be fixed, but maybe no one quite saw the urgency till the case that used it came about. Barthlomeij actually raised another point about the situation when the partial completion happens with an error (i.e. not uptodate case), a situation which doesn't automatically get handled correctly. Something to think about ? > discussed at least with Jens, and I feel strongly that we shouldn't split, > we should build up. Jens was working on exactly that. > > In other words, I absolutely hate the fact that a major bug-fix was > (a) not marked as such and sent to me > and > (b) mixed up with experimental work for other drivers I agree. The fix was a matter on the side. I'd expected it to have gone in otherwise anyway, since it wasn't something new I'd discovered. It was perhaps just so obvious that it didn't happen so far ! I couldn't resist bringing it up in this context, because I really wanted some critical feedback about whether the approach seemed right, or whether I should just let the patch lie by the side till someone found a use for it, without having feeling guilty about having abandoned it and not following it through enough. At least at Ottawa it sounded like this was something people were interested in, and that was the whole reason for my doing in the first place. (Will get to the bio splitting point later down the line). After all, getting all drivers fixed up to adhere to this would require some work, something I know I can't do on my own (I just don't understand the subtleties that someone more familiar which them would), and which won't be right to cause others to handle unless people felt it was the correct and useful thing to do. I don't even mind dropping it if it seems like a "not now" or "not right" thing (I already seem to spent more effort on this then I probably should have), but at least after a bit of discussion to take it to that point. > > Even now (assuming I hadn't fixed it on my own), I would have preferred to > get that fix separately, as it would have impacted the floppy driver, for > example (the fix broke the floppy driver even more than it was before, > because the floppy driver was stupidly trying to work around the original > bug by hand). > > Imagine what a horror it is to figure out why a large experimental patch > breaks an existing driver? My first reaction would have been to just throw > the large new patch away, since it obviously broke the floppy even more. > Instead, if I had been passed the bug-fix only, and the floppy had broken > worse that it was originally, it would have been absolutely _obvious_ > where the problem was. > > In short: please please PLEASE keep fixes to existing code separate from > new stuff. It makes everything easier, and I have absolutely no problems > with applying "obvious fixes" even if they might break something else. OK. > > In contrast, the new stuff I really don't know if it should go in at all, > considering that it's trivial (and CPU-efficient) to build up legal bio > request on the fly and _not_ depend on splitting them later (or at least > making splitting a special thing only used by things like MD and other > such indirection layers). This is exactly the kind of feedback or discussion I need. The patch actually addresses a set of problems and not just bio splitting. Ideally I'd have liked to separately handle different problems differently in separate patches. In fact that's the way I had started out at first, only to realise that things become much simpler or cleaner to implement, if we treat this as a whole (excluding the bug fix above) as a basic traversal logic change. The problems are like this: Today partial segment completion of a bio affects the bv_offset and bv_len fields in the vecs themselves, because there is nothing in the bio to remember how far we are inside a current segment (and as deriving that by calculating it backwards from bi_size and bi_vcnt could be inefficient). This means that a higher layer which has passed a vector to bio can't assume that the vec entries would be valid after the i/o is completed. So though we have a uniform vec abstraction, passing it around subsystems seamlessly may throw up some unexpected results. (Just that are likely not to hit this often, which is why we don't hear complaints about it) Today, it is hard to handle partial submission of a request/bio where partial submission state has to be remembered across function invokations (or interrupts) without making a copy of the request _and_ its corresponding bios (together with vectors), since the block layer traversal functions combine traversal and completion without a clear concept of separation between submission and completion state within a request. IDE mult-count PIO is of course a case in point. Not sure if there are others. Today a bio cannot be split in the middle of a segment, which means that MD or other such indirection layers need to do their own bvec allocations in such cases. Once we have bio with has a different start offset from the first bvec's start, we lose some simplicity (which is why I'm not sure if all drivers necessarily need to support such bio s the first time around). But with such a field in the bio (bi_voffset), we can use that as a moving counter during traversal. (BTW, an extra field may not have been indispensable if we didn't need to separate submission and completion state, e.g. rq->current_nr_sectors might work for just this purpose as well, and that was what I was planning to try before Jens suggested going ahead with bi_voffset, but then found it handy to manage completion state separately from submission state). BTW, I agree splitting truly becomes an issue mainly in the case of such layered drivers, in other cases it is a good idea to build before hand; but for indirection drivers it _is_ an issue ... one can't expect to predict all situations and that would limit flexibility there. Another point that I've always wanted to bring up is that the moment we get to a need to divide an i/o into separate requests either to a single or multiple drivers, we _are_ doing an allocation of the request struct (using a slot in the pool of available requests for that queue). Building the right bio sizes doesn't do away with this, does it ? In any case, special case problem situations can always be dealt with special case code, but since we've redesigned the block i/o layer the question is whether we've got our abstractions right to deal with various requirements at the appropriate level and avoid workarounds by drivers or callers into block. So rather than look at the above problems individually and frequency of situations where we can imagine they'd occur in practice, could you think about the basics a bit and let me know what you think ? (a) Not the right away to go at all ? (b) Think about it later, not now ? (c) Try and get more pieces working correctly with this and then see ? One concern that I have with the patches I posted (I did split them up last time around, so they look smaller) is the increase in number of counters to update / track. This also has the effect of requiring updates happen via helpers rather than drivers individually managing these counters which is not necessarily a bad thing in itself, but I have to say that I was a trifle worried about potential unexpected side effects in the short run for drivers that make their own assumptions about these. I was hoping that driver maintainers would be able to help sound an alert for any issues they see right away. Regards Suparna > > Linus > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 7:03 ` Suparna Bhattacharya @ 2002-09-05 15:06 ` Linus Torvalds 2002-09-05 16:26 ` Andrew Morton 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 15:06 UTC (permalink / raw) To: Suparna Bhattacharya; +Cc: Jens Axboe, linux-kernel, Andrew Morton [ Suparna, I'll only react to one thing in your email right now, I'll try to take the time to look at your _real_ questions later after I've had my coffee and woken up.. ] On Thu, 5 Sep 2002, Suparna Bhattacharya wrote: > > Barthlomeij actually raised another point about the situation when the > partial completion happens with an error (i.e. not uptodate case), a situation > which doesn't automatically get handled correctly. Something to think > about ? Good point. Yes. However, it's hard to pass the errors down, since we've largely lost the individual parts of the bio (ie people don't even use the buffer_heads any more. There's a somewhat related issue with bio's, namely that partial _successful_ completion also doesn't notify anybody, which can suck from a latency standpoint if there are big delays in the partial IO (even if it is all successful). That's certainly true of the floppy driver, for example, where we can build up a 64kB bio due to read-ahead and then it takes many milliseconds between partial completions (which are done one track at a time in the absolute best case). Note that this actually makes read-ahead much less effective, because it means that we're not doing work while the read-ahead is happening: the read-ahead improves throughput by doing big blocks at a time, but it does _not_ get the improvement that it used to get of having asynchronous IO going on. We should wake up the person that maybe only needed the first page. But right now, we've kind of lost that ability, because the bio itself does not contain any such information. We used to have a list of buffer heads in the request, and could wake them up one by one, but.. I would suggest: - add a "nr of sectors completed" argument to the "bi_end_io()" function, so that it looks like void xxx_bio_end_io(struct bio *bio, unsigned long completed) { /* * Old completion handlers that don't understand it * should just return immediately for partial bio * completion notifiers */ if (bio->b_size) return; ... } which would allow things like mpage_end_io_read() to unlock pages as they complete, instead of unlocking them all in one go. Comments? It looks trivial to do this from a bio level, and it would not be hard to update the existing end_io functions (because you can always just update them with the one-liner "if not totally done, return" to get the old behaviour). Andrew? I really dislike the lack of concurrency in our current mpage read-ahead thing. Whaddayathink? Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 15:06 ` Linus Torvalds @ 2002-09-05 16:26 ` Andrew Morton 2002-09-05 17:05 ` Linus Torvalds 0 siblings, 1 reply; 30+ messages in thread From: Andrew Morton @ 2002-09-05 16:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel Linus Torvalds wrote: > > ... > I would suggest: > > - add a "nr of sectors completed" argument to the "bi_end_io()" function, > so that it looks like > > void xxx_bio_end_io(struct bio *bio, unsigned long completed) > { > /* > * Old completion handlers that don't understand it > * should just return immediately for partial bio > * completion notifiers > */ > if (bio->b_size) > return; > ... > } > > which would allow things like mpage_end_io_read() to unlock pages as > they complete, instead of unlocking them all in one go. It's a feature! We don't want to have to soak up 20,000 context switches a second just reading a file from an 80MB/sec disk. > Comments? It looks trivial to do this from a bio level, and it would not > be hard to update the existing end_io functions (because you can always > just update them with the one-liner "if not totally done, return" to get > the old behaviour). > > Andrew? I really dislike the lack of concurrency in our current mpage > read-ahead thing. Whaddayathink? Well it is supposed to lay out two BIOs, but if that happens, it's fragile - it relies on BIO_MAX_SIZE=64k and default readahead=128k. What I think we need to do here is to get Jens' bio_add_page() stuff up and running and to then go through the device drivers and set their max BIO size to something which is inversely proportional to the device's expected bandwidth. This way, the floppy readahead would lay out 1- or 2-page BIOs, and the honking FC array would lay out 128k or larger BIOs. (In fact, I would prefer that the device's nominal read- and write- bandwidths be made available in some manner. This way the VM can make these granularity-size decisions for readahead, and the VM can then solve the machine-full-of-dirty-memory-against-a-slow-device problem. But the non-blocking page reclaim code probably solves that too). ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 16:26 ` Andrew Morton @ 2002-09-05 17:05 ` Linus Torvalds 2002-09-05 18:00 ` Andrew Morton 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 17:05 UTC (permalink / raw) To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel On Thu, 5 Sep 2002, Andrew Morton wrote: > Linus Torvalds wrote: > > > > ... > > I would suggest: > > > > - add a "nr of sectors completed" argument to the "bi_end_io()" function, > > so that it looks like > > > > void xxx_bio_end_io(struct bio *bio, unsigned long completed) > > { > > /* > > * Old completion handlers that don't understand it > > * should just return immediately for partial bio > > * completion notifiers > > */ > > if (bio->b_size) > > return; > > ... > > } > > > > which would allow things like mpage_end_io_read() to unlock pages as > > they complete, instead of unlocking them all in one go. > > It's a feature! We don't want to have to soak up 20,000 context > switches a second just reading a file from an 80MB/sec disk. You didn't think it through. The current behaviour is a BUG. A fast disk driver will _never ever_ do a partial request completion. A high-performance subsystem will put in the scatter-gather list and say "go" to the controller, and the controller will send exactly one interrupt back when it is all done. So for such a system, you'd never see partial completions anyway. Partial completions are a feature of slow hardware. And slow hardware is exactly when we want to know about it. So my approach has no downsides, only upsides. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 17:05 ` Linus Torvalds @ 2002-09-05 18:00 ` Andrew Morton 2002-09-05 18:28 ` Linus Torvalds 0 siblings, 1 reply; 30+ messages in thread From: Andrew Morton @ 2002-09-05 18:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel Linus Torvalds wrote: > > ... > A fast disk driver will _never ever_ do a partial request completion. A > high-performance subsystem will put in the scatter-gather list and say > "go" to the controller, and the controller will send exactly one interrupt > back when it is all done. OK. But still, I don't see why we need partial BIO completions. If we say that the basic unit of completion is a whole BIO, then readahead can then manage latency via the outgoing bio size. > So for such a system, you'd never see partial completions anyway. > > Partial completions are a feature of slow hardware. And slow hardware is > exactly when we want to know about it. Well I'd be interested in knowing specifically what is wrong with the behaviour of 2.5.33 against a floppy disk. In the testing I did a few weeks back, everything checked out. An application which was reading the raw device at 95% of media bandwidth never blocked. An application which was capable of processing data at 120% of media bandwidth achieved 100%. It could be that the initial 64k read at the start-of-file is too big, and the many-small-file behaviour is poor? A specific "this sucks" testcase would be helpful... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:00 ` Andrew Morton @ 2002-09-05 18:28 ` Linus Torvalds 2002-09-05 18:31 ` Jens Axboe 2002-09-05 18:42 ` Andrew Morton 0 siblings, 2 replies; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 18:28 UTC (permalink / raw) To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel On Thu, 5 Sep 2002, Andrew Morton wrote: > > OK. But still, I don't see why we need partial BIO completions. If > we say that the basic unit of completion is a whole BIO, then readahead > can then manage latency via the outgoing bio size. But that's horrible. The floppy driver can take huge bio's no problem, and limiting bio sizes to track sizes would be a huge pain in the driver for no good reason. In fact, it would be pretty much impossible, since the tracks aren't even page-aligned. So limiting bio's fundamentally _cannot_ do the right thing. While adding two lines of code _can_. > Well I'd be interested in knowing specifically what is wrong with the > behaviour of 2.5.33 against a floppy disk. Try it (not plain 2.5.33, but current BK where floppy actually works). It works, but reading a single sector will pause until it has read several tracks. Even though the sector came back much earlier. Also, you missed the error case argument. Right now we _cannot_ handle error cases for partial transfers AT ALL. The bio interface simply does not support it. And there is no way you can fix that with the current interface. While the partial completion case allows for at least partial recovery. Andrew, give it up. The current interface _sucks_. > In the testing I did a few weeks back, everything checked out. An > application which was reading the raw device at 95% of media bandwidth > never blocked. An application which was capable of processing data at > 120% of media bandwidth achieved 100%. And an application that only wanted one sector? To notice that the filesystem is of the wrong type, for example? Throughput is _secondary_ to many latency concerns. And you cannot fix the latency with full bio's (see the track issue). Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:28 ` Linus Torvalds @ 2002-09-05 18:31 ` Jens Axboe 2002-09-05 18:38 ` Linus Torvalds 2002-09-05 18:42 ` Andrew Morton 1 sibling, 1 reply; 30+ messages in thread From: Jens Axboe @ 2002-09-05 18:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Suparna Bhattacharya, linux-kernel On Thu, Sep 05 2002, Linus Torvalds wrote: > > OK. But still, I don't see why we need partial BIO completions. If > > we say that the basic unit of completion is a whole BIO, then readahead > > can then manage latency via the outgoing bio size. > > But that's horrible. The floppy driver can take huge bio's no problem, and > limiting bio sizes to track sizes would be a huge pain in the driver for > no good reason. In fact, it would be pretty much impossible, since the > tracks aren't even page-aligned. > > So limiting bio's fundamentally _cannot_ do the right thing. While adding > two lines of code _can_. I agree that partial completions are the right thing to do here, and in fact this is how the interface was originally remember? However, I don't see how this is a two-liner change. Basically you are changing bi_end_io() from a completion to partial completion invokation, which requires changing (and complicating) all of them. Just adding a sector count to bio_endio() does not enable that to partially complete some pages. What am I missing? Jens ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:31 ` Jens Axboe @ 2002-09-05 18:38 ` Linus Torvalds 2002-09-05 18:38 ` Jens Axboe 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 18:38 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, Suparna Bhattacharya, linux-kernel On Thu, 5 Sep 2002, Jens Axboe wrote: > > However, I don't see how this is a two-liner change. Basically you are > changing bi_end_io() from a completion to partial completion invokation, Right. So we'll add one bio->bi_end_io(bio, nr_sectors) to the partial bio completion case in "finish_that_request_first()". That's one line. The other line is if (bio->bi_size) return; which has to be added to the end-request thing. (yeah, yeah, there are several end-request functions, and we need to fix up the function prototypes too etc, but the point is that the actual _work_ is just two real lines of new code). Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:38 ` Linus Torvalds @ 2002-09-05 18:38 ` Jens Axboe 2002-09-05 19:47 ` Peter Osterlund 0 siblings, 1 reply; 30+ messages in thread From: Jens Axboe @ 2002-09-05 18:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Suparna Bhattacharya, linux-kernel On Thu, Sep 05 2002, Linus Torvalds wrote: > > On Thu, 5 Sep 2002, Jens Axboe wrote: > > > > However, I don't see how this is a two-liner change. Basically you are > > changing bi_end_io() from a completion to partial completion invokation, > > Right. So we'll add one > > bio->bi_end_io(bio, nr_sectors) > > to the partial bio completion case in "finish_that_request_first()". > > That's one line. > > The other line is > > if (bio->bi_size) return; > > which has to be added to the end-request thing. > > (yeah, yeah, there are several end-request functions, and we need to fix > up the function prototypes too etc, but the point is that the actual > _work_ is just two real lines of new code). Ah ok, then we completely agree on what needs doing :-) And yes, this is 100% identical to the earlier bio code (I forget what revisions, and that was prior to BK I think). Jens ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:38 ` Jens Axboe @ 2002-09-05 19:47 ` Peter Osterlund 0 siblings, 0 replies; 30+ messages in thread From: Peter Osterlund @ 2002-09-05 19:47 UTC (permalink / raw) To: Jens Axboe Cc: Linus Torvalds, Andrew Morton, Suparna Bhattacharya, linux-kernel Jens Axboe <axboe@suse.de> writes: > And yes, this is 100% identical to the earlier bio code (I forget what > revisions, and that was prior to BK I think). It changed in 2.5.5-pre1: <axboe@burns.home.kernel.dk> (02/02/11 1.262.3.11) Remove nr_sectors from bio_end_io end I/O callback. It was a relic from when completion was potentially called more than once to indicate partial end I/O. These days bio->bi_end_io is _only_ called when I/O has completed on the entire bio. -- Peter Osterlund - petero2@telia.com http://w1.894.telia.com/~u89404340 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:28 ` Linus Torvalds 2002-09-05 18:31 ` Jens Axboe @ 2002-09-05 18:42 ` Andrew Morton 2002-09-05 19:03 ` Linus Torvalds 2002-09-07 11:18 ` Daniel Phillips 1 sibling, 2 replies; 30+ messages in thread From: Andrew Morton @ 2002-09-05 18:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel Linus Torvalds wrote: > > On Thu, 5 Sep 2002, Andrew Morton wrote: > > > > OK. But still, I don't see why we need partial BIO completions. If > > we say that the basic unit of completion is a whole BIO, then readahead > > can then manage latency via the outgoing bio size. > > But that's horrible. The floppy driver can take huge bio's no problem, and > limiting bio sizes to track sizes would be a huge pain in the driver for > no good reason. In fact, it would be pretty much impossible, since the > tracks aren't even page-aligned. erp. Yes, a BIO can span several tracks. I see the point. > ... > > In the testing I did a few weeks back, everything checked out. An > > application which was reading the raw device at 95% of media bandwidth > > never blocked. An application which was capable of processing data at > > 120% of media bandwidth achieved 100%. > > And an application that only wanted one sector? To notice that the > filesystem is of the wrong type, for example? OK. That's the initial-64k thing. If you read 512 bytes from /dev/fd0, readahead will go and issue a 64k request, and your 512-byte request takes ages. > Throughput is _secondary_ to many latency concerns. And you cannot fix the > latency with full bio's (see the track issue). The `nr_of_sectors_completed' would be tricky to handle - we don't know how to bring the pagecache pages uptodate in response to a 512-byte completion. We'd have to teach the pagecache read functions to permit userspace to read from non-uptodate pages by looking at the buffer_head state. And then look at buffer_req to distinguish "this part of the page hasn't been read yet" from "this part of the page had an IO error". Ick. It would be simpler if it was nr_of_pages_completed. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:42 ` Andrew Morton @ 2002-09-05 19:03 ` Linus Torvalds 2002-09-05 19:35 ` Andrew Morton 2002-09-10 7:25 ` Rogier Wolff 2002-09-07 11:18 ` Daniel Phillips 1 sibling, 2 replies; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 19:03 UTC (permalink / raw) To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel On Thu, 5 Sep 2002, Andrew Morton wrote: > > It would be simpler if it was nr_of_pages_completed. Well.. Maybe. I actually think that you in practice really only have two cases: - we only care about full completion. Easily tested for by just looking at bi_size, and leaving the code as it is right now. - the code really cares about and uses the incremental information. At which point it will already have "handled" any previous incremental stuff, and the only thing it really cares about is the "new increment". So I'd suggest making at least the first cut only have the incremental information, since the global information already exists through bi_size. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 19:03 ` Linus Torvalds @ 2002-09-05 19:35 ` Andrew Morton 2002-09-05 20:27 ` Linus Torvalds 2002-09-05 20:48 ` Linus Torvalds 2002-09-10 7:25 ` Rogier Wolff 1 sibling, 2 replies; 30+ messages in thread From: Andrew Morton @ 2002-09-05 19:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel Linus Torvalds wrote: > > On Thu, 5 Sep 2002, Andrew Morton wrote: > > > > It would be simpler if it was nr_of_pages_completed. > > Well.. Maybe. > > I actually think that you in practice really only have two cases: > > - we only care about full completion. Easily tested for by just looking > at bi_size, and leaving the code as it is right now. OK. > - the code really cares about and uses the incremental information. At > which point it will already have "handled" any previous incremental > stuff, and the only thing it really cares about is the "new increment". So the BIO client would need to keep some state somewhere about "this is the next page to unlock". That would best be in the BIO somewhere. (I assume the block layer handles the case where the hardware signals completion in non-sector-ascending-order? That must have been fun to code and test). > So I'd suggest making at least the first cut only have the incremental > information, since the global information already exists through bi_size. Well we still will have the problem where reading 512 bytes from /dev/fd0 does 64k of IO. That is most sweet for reading a bunch of 32k ext2 files from a hard drive, not so nice for reading fd0's partition table. We can do all sorts of things by dropping parameters, hints and tunables into q->backing_dev_info. We just need to work out what is sensible for this. ummm. I guess backing_dev_info.read_k_per_sec would suffice. Or .i_am_a_floppy ;) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 19:35 ` Andrew Morton @ 2002-09-05 20:27 ` Linus Torvalds 2002-09-05 20:38 ` Linus Torvalds 2002-09-06 6:47 ` Helge Hafting 2002-09-05 20:48 ` Linus Torvalds 1 sibling, 2 replies; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 20:27 UTC (permalink / raw) To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel On Thu, 5 Sep 2002, Andrew Morton wrote: > > > - the code really cares about and uses the incremental information. At > > which point it will already have "handled" any previous incremental > > stuff, and the only thing it really cares about is the "new increment". > > So the BIO client would need to keep some state somewhere about "this is > the next page to unlock". That would best be in the BIO somewhere. Well, the information actually is there already, although I have to admit that it's kind of subtle. Look at the current mpage_end_io_read(), and realize that it already does a traversal in pages from start = bio->bi_io_vec end = bio->bi_io_vec + bio->bi_vcnt - 1 which is actually very close to what it would do with partial results. With partial results, it would need to do only a slightly different traversal: end = bio->bi_io_vec + bio->bi_vcnt*PAGE_SIZE - bio->bi_size start = end - nr_sectors * 512 PAGE_ALIGN(start) PAGE_ALIGN(end) but it's otherwise the exact same code (doing all the edge calculations in bytes, and then only traversing pages that have now been fully done and weren't fully done last time). It _looks_ like it literally needs just a few lines of changes. So I would actually argue against adding new information: we _do_ actually have the information already, and adding more "convenient" forms of it adds more aliasing and coherency issues. The current form isn't _that_ complicated to use, and we might hide it behind a simple macro: #define GET_PAGE_INDEXES(bio, start, end) \ ... set start/end to point into the ... ... proper bio->bi_io_vec subarray ... > Well we still will have the problem where reading 512 bytes from /dev/fd0 > does 64k of IO. That is most sweet for reading a bunch of 32k ext2 files > from a hard drive, not so nice for reading fd0's partition table. I do think we might make the read-ahead window configurable, and make slow devices have slightly smaller windows. On the other hand, I don't think the 64kB IO actually _hurts_ per se, as long as it doesn't delay the stuff we care about. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 20:27 ` Linus Torvalds @ 2002-09-05 20:38 ` Linus Torvalds 2002-09-06 6:47 ` Helge Hafting 1 sibling, 0 replies; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 20:38 UTC (permalink / raw) To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel On Thu, 5 Sep 2002, Linus Torvalds wrote: > > With partial results, it would need to do only a slightly different > traversal: > > end = bio->bi_io_vec + bio->bi_vcnt*PAGE_SIZE - bio->bi_size > > start = end - nr_sectors * 512 > > PAGE_ALIGN(start) > PAGE_ALIGN(end) > > but it's otherwise the exact same code (doing all the edge calculations in > bytes, and then only traversing pages that have now been fully done and > weren't fully done last time). > > It _looks_ like it literally needs just a few lines of changes. Ok, so we now have two "few lines of code" changes. Who wants to actually _do_ these? I'll do it if nobody else wants to, but I'd much rather see somebody else _do_ want to do this and test it out and just send me a tested patch ;) Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 20:27 ` Linus Torvalds 2002-09-05 20:38 ` Linus Torvalds @ 2002-09-06 6:47 ` Helge Hafting 2002-09-06 6:59 ` Linus Torvalds 1 sibling, 1 reply; 30+ messages in thread From: Helge Hafting @ 2002-09-06 6:47 UTC (permalink / raw) To: Linus Torvalds, linux-kernel Linus Torvalds wrote: [...] > I do think we might make the read-ahead window configurable, and make slow > devices have slightly smaller windows. > > On the other hand, I don't think the 64kB IO actually _hurts_ per se, as > long as it doesn't delay the stuff we care about. I can think of one case where large readahead hurts for floppy, even with partial completion: 1. Grab a stack of floppies 2. Try mounting (or mount+ls) one after another, in search of the right one. You'll get the results on screen fast enough (mount succeeded/failed or ls showed the right/wrong files) but when it is the wrong floppy you have to wait for several tracks to read before you may eject and try the next one. Sure, it is only reading so ejecting "by force" won't hurt the fs but then you have to wait on io errors instead. So I think a smaller readahead might make sense for floppies, unless people don't do this sort of search anymore. I don't. Helge Hafting ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-06 6:47 ` Helge Hafting @ 2002-09-06 6:59 ` Linus Torvalds 2002-09-09 14:08 ` Bob_Tracy 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2002-09-06 6:59 UTC (permalink / raw) To: Helge Hafting; +Cc: linux-kernel On Fri, 6 Sep 2002, Helge Hafting wrote: > > I can think of one case where large readahead hurts for floppy, even > with partial completion: > > 1. Grab a stack of floppies > 2. Try mounting (or mount+ls) one after another, > in search of the right one. Note that the delay for motor on/off is _much_ larger than the actual delay for seeking. The seek itself is on the order of a few ms, with the head settle time being in the tens (possibly even a few hundred) ms per track. So assuming you end up reading 4 tracks or so due to readahead, that's still in the range of about one second. In contrast, the motor on/off time is something like 5 seconds if I remember correctly. Of course, you can certainly eject the floppy while the motor is still running, but I'd suggest against it. > So I think a smaller readahead might make sense for floppies, > unless people don't do this sort of search anymore. I don't. I do agree that a 64kB readahead is likely to be excessive on a floppy, I'm just saying that I doubt it will be all that noticeable in most cases. The absolute worst case is when opening, reading a sector, and closing again several times in succession, at which point right now we'll end up serializing. But even at 64kB, that's going to be faster than most people can change floppies if they actually want to even glance at what the contents are. The reason I want the first sectors to be returned early is that I thought it was quite noticeable to do just a simple "ls" on the floppy when I tested. Of course, that may be just me: it's literally been several years since I really used floppies, and maybe they really always were that slow. But I thought the root directory was on track zero, so it _should_ return it first thing. Oh, well. I don't seem to be the only one who doesn't use the dang things any more. The floppy driver has been broken in 2.5.x for half a year or whatever, and there weren't _that_ many people who ever even mentioned it. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-06 6:59 ` Linus Torvalds @ 2002-09-09 14:08 ` Bob_Tracy 0 siblings, 0 replies; 30+ messages in thread From: Bob_Tracy @ 2002-09-09 14:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: Helge Hafting, linux-kernel Linus Torvalds wrote: > Oh, well. I don't seem to be the only one who doesn't use the dang things > any more. The floppy driver has been broken in 2.5.x for half a year or > whatever, and there weren't _that_ many people who ever even mentioned it. Oh, we're out here alright... I didn't speak up (until now) for several reasons: (a) development kernel == things break. (b) when things break, people squawk -- I hardly expected that I'd need to add my voice to the "me too" chorus (bad form, don't you know?). (c) broken things tend to get fixed quickly, which is another way of saying I have faith in the developers' abilities :-). Anyway, floppy support is still reasonably important to me. I've got a few legacy systems that cannot boot from from CD-ROM, and a 3.5" floppy is still handy for small sneaker-net transfers between non-networked machines. Thanks for reading... -- ----------------------------------------------------------------------- Bob Tracy WTO + WIPO = DMCA? http://www.anti-dmca.org rct@frus.com ----------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 19:35 ` Andrew Morton 2002-09-05 20:27 ` Linus Torvalds @ 2002-09-05 20:48 ` Linus Torvalds 1 sibling, 0 replies; 30+ messages in thread From: Linus Torvalds @ 2002-09-05 20:48 UTC (permalink / raw) To: Andrew Morton; +Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel Btw, the update to do partial completion will need a few more fixes: right now the different callers of "bio->bi_end_io(bio)" are not very careful about updating the bio information, since no bi_end_io() function has had any reason to care before. That turns the 2-liner patch into slightly more, since for example the failure cases in __make_request() need to make sure that they pass in the right size/sector count. So the .... end_io: bio->bi_end_io(bio); return 0; would become something like .... end_io: bio->bi_size = 0; bio->bi_end_io(bio, nr_sectors); return 0; if we had this interface. To avoid those kinds of silly bugs and to avoid havind the bi_end_io() function have to look up all the bio information, maybe the end_io calling convention really should be void bio_end_io(struct bio *bio, unsigned int completed, unsigned int left, unsigned int uptodate); and then a failure would just be bio->bi_end_io(bio, nr_sectors, 0, 0); and the end-io function would have all the information it needs to decide how much has been done / is left undone directly in the arguments. One final question would be whether we would want to make all of these byte counts, for some future networked block device where we might be getting the completions back in odd-sized chunks, for example? Right now much of the bio code already is able to handle byte-sized stuff, and it might be nice to not have to maintain byte counts in NBD if the bio layer already does it anyway.. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 19:03 ` Linus Torvalds 2002-09-05 19:35 ` Andrew Morton @ 2002-09-10 7:25 ` Rogier Wolff 2002-09-10 8:01 ` Andrew Morton 1 sibling, 1 reply; 30+ messages in thread From: Rogier Wolff @ 2002-09-10 7:25 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Suparna Bhattacharya, Jens Axboe, linux-kernel On Thu, Sep 05, 2002 at 12:03:30PM -0700, Linus Torvalds wrote: > > On Thu, 5 Sep 2002, Andrew Morton wrote: > > > > It would be simpler if it was nr_of_pages_completed. > > Well.. Maybe. Ehmm. I'm in the data-recovery business, and we seem to have lost the ability to recover the other 3k of a 4k page if one of the blocks is bad. And we're annoyed about the read-ahead trying to read blocks past a bad block without returning to the application. Roger. -- ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 ** *-- BitWizard writes Linux device drivers for any device you may have! --* * The Worlds Ecosystem is a stable system. Stable systems may experience * * excursions from the stable situation. We are currenly in such an * * excursion: The stable situation does not include humans. *************** ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-10 7:25 ` Rogier Wolff @ 2002-09-10 8:01 ` Andrew Morton 0 siblings, 0 replies; 30+ messages in thread From: Andrew Morton @ 2002-09-10 8:01 UTC (permalink / raw) To: Rogier Wolff Cc: Linus Torvalds, Andrew Morton, Suparna Bhattacharya, Jens Axboe, linux-kernel Rogier Wolff wrote: > > ... > > Ehmm. I'm in the data-recovery business, and we seem to have lost > the ability to recover the other 3k of a 4k page if one of the blocks > is bad. > > And we're annoyed about the read-ahead trying to read blocks past > a bad block without returning to the application. > You can use the raw driver, or O_DIRECT against /dev/hdXX. That will give 512-byte granularity and no readahead. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-05 18:42 ` Andrew Morton 2002-09-05 19:03 ` Linus Torvalds @ 2002-09-07 11:18 ` Daniel Phillips 1 sibling, 0 replies; 30+ messages in thread From: Daniel Phillips @ 2002-09-07 11:18 UTC (permalink / raw) To: Andrew Morton, Linus Torvalds Cc: Suparna Bhattacharya, Jens Axboe, linux-kernel On Thursday 05 September 2002 20:42, Andrew Morton wrote: > The `nr_of_sectors_completed' would be tricky to handle - we don't know how > to bring the pagecache pages uptodate in response to a 512-byte completion. > We'd have to teach the pagecache read functions to permit userspace to read > from non-uptodate pages by looking at the buffer_head state. And then > look at buffer_req to distinguish "this part of the page hasn't been read yet" > from "this part of the page had an IO error". Ick. It's yet another reason to have subpages, and see, keeping state at the right granularity really does matter. I'm just adding items to the list of reasons why we need it, so when the time comes to do it, I won't have to waste a lot of energy explaining why. -- Daniel ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-03 18:00 One more bio for for floppy users in 2.5.33 Linus Torvalds 2002-09-03 18:02 ` Jens Axboe @ 2002-09-03 20:57 ` Mikael Pettersson 2002-09-03 21:52 ` Linus Torvalds 1 sibling, 1 reply; 30+ messages in thread From: Mikael Pettersson @ 2002-09-03 20:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List, Jens Axboe Linus Torvalds writes: > > Ok, > I found another major bio-related bug that definitely explains why the > floppy driver generated corruption - any partial request completion would > be totally messed up by the BIO layer (not the floppy driver). > > Any other block device driver that did partial request completion might > also be impacted. > > I'm still looking at the floppy driver itself - some of the request > completion code is so messed up as to be unreadable, and some of that may > actually be due to trying to work around the bio problem (unsuccessfully, > I may add). So this may not actually fix things for you yet, simply > because the floppy driver itself still does some strange things. Confirmed. 2.5.33 + your two patches still corrupts data on a simple dd to then from /dev/fd0 test. /Mikael ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-03 20:57 ` Mikael Pettersson @ 2002-09-03 21:52 ` Linus Torvalds 2002-09-03 22:33 ` Mikael Pettersson 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2002-09-03 21:52 UTC (permalink / raw) To: Mikael Pettersson; +Cc: Kernel Mailing List On Tue, 3 Sep 2002, Mikael Pettersson wrote: > > Confirmed. 2.5.33 + your two patches still corrupts data on a simple > dd to then from /dev/fd0 test. Ok, if you don't have BK, then here's the floppy driver end_request() cleanup as a plain patch. This passes dd tests for me, but they were by no means very exhaustive. Linus --- # The following is the BitKeeper ChangeSet Log # -------------------------------------------- # 02/09/03 torvalds@home.transmeta.com 1.581.4.2 # Fix floppy driver end_request() handling - it used to do insane # contortions instead of just calling "end_that_request_first()" with # the proper sector count. # -------------------------------------------- # diff -Nru a/drivers/block/floppy.c b/drivers/block/floppy.c --- a/drivers/block/floppy.c Tue Sep 3 14:51:09 2002 +++ b/drivers/block/floppy.c Tue Sep 3 14:51:09 2002 @@ -2295,16 +2295,15 @@ { kdev_t dev = req->rq_dev; - if (end_that_request_first(req, uptodate, req->hard_cur_sectors)) + if (end_that_request_first(req, uptodate, current_count_sectors)) return; add_blkdev_randomness(major(dev)); floppy_off(DEVICE_NR(dev)); blkdev_dequeue_request(req); end_that_request_last(req); - /* Get the next request */ - req = elv_next_request(QUEUE); - CURRENT = req; + /* We're done with the request */ + CURRENT = NULL; } @@ -2335,27 +2334,8 @@ /* unlock chained buffers */ spin_lock_irqsave(q->queue_lock, flags); - while (current_count_sectors && CURRENT && - current_count_sectors >= req->current_nr_sectors){ - current_count_sectors -= req->current_nr_sectors; - req->nr_sectors -= req->current_nr_sectors; - req->sector += req->current_nr_sectors; - end_request(req, 1); - } + end_request(req, 1); spin_unlock_irqrestore(q->queue_lock, flags); - - if (current_count_sectors && CURRENT) { - /* "unlock" last subsector */ - req->buffer += current_count_sectors <<9; - req->current_nr_sectors -= current_count_sectors; - req->nr_sectors -= current_count_sectors; - req->sector += current_count_sectors; - return; - } - - if (current_count_sectors && !CURRENT) - DPRINT("request list destroyed in floppy request done\n"); - } else { if (rq_data_dir(req) == WRITE) { /* record write error information */ ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33.. 2002-09-03 21:52 ` Linus Torvalds @ 2002-09-03 22:33 ` Mikael Pettersson 0 siblings, 0 replies; 30+ messages in thread From: Mikael Pettersson @ 2002-09-03 22:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: Mikael Pettersson, Kernel Mailing List Linus Torvalds writes: > > On Tue, 3 Sep 2002, Mikael Pettersson wrote: > > > > Confirmed. 2.5.33 + your two patches still corrupts data on a simple > > dd to then from /dev/fd0 test. > > Ok, if you don't have BK, then here's the floppy driver end_request() > cleanup as a plain patch. > > This passes dd tests for me, but they were by no means very exhaustive. Success! With this patch and your previous two the floppy driver passes all tests I've thrown at it, including raw data access, VFS access with ext2, installing lilo, fsck, and booting the result. Thanks. /Mikael ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: One more bio for for floppy users in 2.5.33..
@ 2002-09-07 14:43 linux
0 siblings, 0 replies; 30+ messages in thread
From: linux @ 2002-09-07 14:43 UTC (permalink / raw)
To: linux-kernel
On Fri, 6 Sep 2002, Linus Torvalds wrote:
> Note that the delay for motor on/off is _much_ larger than the actual
> delay for seeking.
>
> The seek itself is on the order of a few ms, with the head settle time
> being in the tens (possibly even a few hundred) ms per track. So assuming
> you end up reading 4 tracks or so due to readahead, that's still in the
> range of about one second.
>
> In contrast, the motor on/off time is something like 5 seconds if I
> remember correctly. Of course, you can certainly eject the floppy while
> the motor is still running, but I'd suggest against it.
You're forgetting the transfer rate. A 1440K floppy has 160
tracks (80 cylinders * 2 heads), or 9K per track.
It spins at 300 RPM, so it takes at least 200 ms to read that track.
45K/sec.
A 64K read spans 7.11 tracks, which will take 1422 ms to read.
Add 100 ms for initial rotational latency, and assume that subsequent
tracks are optimally arranged for continuous reads.
That's 1.5 secods just to transfer the data.
*Then* you can add all of the seek and motor spin-up/down times
mentioned above.
(Of course, if the floppy *isn't* formatted optimally, add an
extra 100 ms per seek, or 700 ms total, of rotatinal latency.)
^ permalink raw reply [flat|nested] 30+ messages in threadend of thread, other threads:[~2002-09-10 7:41 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-09-03 18:00 One more bio for for floppy users in 2.5.33 Linus Torvalds 2002-09-03 18:02 ` Jens Axboe 2002-09-04 7:25 ` Suparna Bhattacharya 2002-09-04 16:36 ` Linus Torvalds 2002-09-05 7:03 ` Suparna Bhattacharya 2002-09-05 15:06 ` Linus Torvalds 2002-09-05 16:26 ` Andrew Morton 2002-09-05 17:05 ` Linus Torvalds 2002-09-05 18:00 ` Andrew Morton 2002-09-05 18:28 ` Linus Torvalds 2002-09-05 18:31 ` Jens Axboe 2002-09-05 18:38 ` Linus Torvalds 2002-09-05 18:38 ` Jens Axboe 2002-09-05 19:47 ` Peter Osterlund 2002-09-05 18:42 ` Andrew Morton 2002-09-05 19:03 ` Linus Torvalds 2002-09-05 19:35 ` Andrew Morton 2002-09-05 20:27 ` Linus Torvalds 2002-09-05 20:38 ` Linus Torvalds 2002-09-06 6:47 ` Helge Hafting 2002-09-06 6:59 ` Linus Torvalds 2002-09-09 14:08 ` Bob_Tracy 2002-09-05 20:48 ` Linus Torvalds 2002-09-10 7:25 ` Rogier Wolff 2002-09-10 8:01 ` Andrew Morton 2002-09-07 11:18 ` Daniel Phillips 2002-09-03 20:57 ` Mikael Pettersson 2002-09-03 21:52 ` Linus Torvalds 2002-09-03 22:33 ` Mikael Pettersson -- strict thread matches above, loose matches on Subject: below -- 2002-09-07 14:43 linux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox