* struct backing_dev - purpose and life time rules
@ 2010-07-27 9:01 Christoph Hellwig
2010-07-27 9:14 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2010-07-27 9:01 UTC (permalink / raw)
To: jaxboe, peterz, akpm, kay.sievers, viro; +Cc: linux-mm, linux-fsdevel
The struct backing_dev was introduced back in Linux 2.5 days by Andrew's
"[PATCH] pdflush exclusion infrastructure"
at which point it was still fairly simple, having a simple ra_pages
field, and a state with two flags. Lifetime rules at that point where
rather simple, too - we either had one embedded into the block queue,
or a default_backing_dev_info that was statically allocated. At little
later we got congestion information, per-bdi unplug support and the
complicated capabilies scheme we have now, but until 2007 things stay
very simple, and we don't have any interesting life time rules.
In 2007 Peter added per-cpu statistics counters to the BDI, which at
this point required bdi_init / bdi_destroy calls to guard the lifetime
of the BDI. Now we actively need to manage the life time, but it's
still pretty simple.
It starts to get interesting with
"mm: bdi: export BDI attributes in sysfs"
that Peter added in April 2008, which ties the previously simple
structure into the device model / sysfs life time rules. And at
least for the block device it does so rather badly given that it
tries to register the per-queue backing-device object during
add_disk, ignoring that we can have multiple gendisks for a
given request_queue.
Even worse it really mixes up the unregister vs destroy concepts
by simply calling bdi_unregister from bdi_destroy, which means we'll
easily get duplicate unregisters. It more or less protects by
checking bdi->dev (without synchronization) and doesn't do too
stupid mixups between unregister and destroy yet, so life isn't
that bad.
Commit
"bdi: register sysfs bdi device only once per queue"
from December 2008 tries to work around this by simply skipping
out of bdi_register if a device is already registered, which ignores
the fundamental problems with the issue. And also leaves the duplicate
removals in place.
Then last year in commit
"writeback: switch to per-bdi threads for flushing data"
The per-bdi flusher thread preparion and shutdown gets added to
bdi_register/unregister. Now that's where the next big pile of
issues start. If a disk is surprise removed that means that both
the flusher thread disappear and the newly added s_bdi pointer
in the superblock disappear. But we still have a life filesystem
that might reference s->bdi and we don't have any good thing
preventing it from doing it. If it had been done in destroy
we could skip all these efforts.
In the meantime we've also grown two names for the bdi: bdi->name,
which has a generic name, afaik only used for a single debug printk,
and the name for the sysfs device which we use in a couple other
places.
So what do we need to do to sort this mess out?
For one thing the bdi needs it's own life time rules, and if we want
to keep it in sysfs it needs to get a name independent of the block
device dev_t pointing to it. Just generalizing bdi object naming to
bdi->name + sequence number would fix this nicely while also
simplifying the code. The links from the disk device could remain
their current names, which should keep userspace looking at it working.
of the disks.
Second bdi_register/unregister would go away in their current form,
we'd always keep the bdi registered during it's life time, only the
links from a disk get added / removed by add_disk / unlink_gendisk
but that will be handled outside the bdi code (it already is). A bdi
will never go away while a superblock uses it. For the non block
based filesystems that's already true, but for block ones that means
we need to stop unliking it in unlink_gendisk.
One issue with this is that some of the "random" backing devices
are initialized too early to actually create the sysfs representation.
Maybe we should never bother to register bdis like the /dev/zero one
and not allow people to tweak the settings? The only thing that could
be tweaked would be the ra_pages number anyway, which doesn't make
sense for these non-writeable bdis.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: struct backing_dev - purpose and life time rules
2010-07-27 9:01 struct backing_dev - purpose and life time rules Christoph Hellwig
@ 2010-07-27 9:14 ` Christoph Hellwig
2010-07-27 13:39 ` Vivek Goyal
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2010-07-27 9:14 UTC (permalink / raw)
To: jaxboe, peterz, akpm, kay.sievers, viro, vgoyal; +Cc: linux-mm, linux-fsdevel
In addition to these gem's there's an even worse issue in blk cfq,
introduced in commit
"blkio: Export disk time and sectors used by a group to user space"
which parses the name inside the backing_dev sysfs device back into a
major / minor number. Given how obviously stupid this is, and given
the whack a mole blkiocg is I'm tempted to simply break it and see if
anyone cares.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: struct backing_dev - purpose and life time rules
2010-07-27 9:14 ` Christoph Hellwig
@ 2010-07-27 13:39 ` Vivek Goyal
2010-07-27 14:09 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Vivek Goyal @ 2010-07-27 13:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: jaxboe, peterz, akpm, kay.sievers, viro, linux-mm, linux-fsdevel
On Tue, Jul 27, 2010 at 11:14:59AM +0200, Christoph Hellwig wrote:
> In addition to these gem's there's an even worse issue in blk cfq,
> introduced in commit
>
> "blkio: Export disk time and sectors used by a group to user space"
>
> which parses the name inside the backing_dev sysfs device back into a
> major / minor number. Given how obviously stupid this is,
How can I do it better?
I needed a unique identifier with which user can work in terms of
specifying weights to devices and in terms of understanding what stats
mean. Device major/minor number looked like a obivious choice.
I was looking for how to determine what is the major/minor number of disk
request queue is associated with and I could use bdi to do that.
So I was working under the assumption that there is one request queue
associated with one gendisk and I can use major/minor number for that
disk to uniquely identify request queue.
But you seem to be suggesting that there can be multiple gendisk associated
with a single request queue. I am not sure how does that happen but if it
does, that means a single request queue has requests for multiple gendisks
hence for multiple major/minor number pairs?
If yes, then we need to come up with unique naming scheme for request queue
which CFQ can use to export stats to user space through cgroup interface
and also a user can use same name/indentifier to be able to specify per
device/request queue weigths.
> and given
> the whack a mole blkiocg is I'm tempted to simply break it and see if
> anyone cares.
I do care about blkiocg. Why do you think it is a mole? If things are
wrong, guide me how to go about fixing it and I will do that.
Thanks
Vivek
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: struct backing_dev - purpose and life time rules
2010-07-27 13:39 ` Vivek Goyal
@ 2010-07-27 14:09 ` Christoph Hellwig
2010-07-27 23:55 ` FUJITA Tomonori
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2010-07-27 14:09 UTC (permalink / raw)
To: Vivek Goyal
Cc: Christoph Hellwig, jaxboe, peterz, akpm, kay.sievers, viro,
linux-mm, linux-fsdevel
On Tue, Jul 27, 2010 at 09:39:56AM -0400, Vivek Goyal wrote:
> How can I do it better?
>
> I needed a unique identifier with which user can work in terms of
> specifying weights to devices and in terms of understanding what stats
> mean. Device major/minor number looked like a obivious choice.
>
> I was looking for how to determine what is the major/minor number of disk
> request queue is associated with and I could use bdi to do that.
The problem is that a queue can be shared between multiple gendisks,
so dev_t of a gendisk is not a unique identifier. In addition to that
we even have gendisks that do not even have a block device associated
with them (e.g. for scsi tapes) or request queues that do not have
any gendisks attached to it (e.g. scsi devices without an ULD like
various types of scanners or printers).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: struct backing_dev - purpose and life time rules
2010-07-27 14:09 ` Christoph Hellwig
@ 2010-07-27 23:55 ` FUJITA Tomonori
2010-07-28 4:42 ` Kay Sievers
0 siblings, 1 reply; 6+ messages in thread
From: FUJITA Tomonori @ 2010-07-27 23:55 UTC (permalink / raw)
To: hch
Cc: vgoyal, jaxboe, peterz, akpm, kay.sievers, viro, linux-mm,
linux-fsdevel
Not a comment on the original topic,
On Tue, 27 Jul 2010 16:09:47 +0200
Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Jul 27, 2010 at 09:39:56AM -0400, Vivek Goyal wrote:
> > How can I do it better?
> >
> > I needed a unique identifier with which user can work in terms of
> > specifying weights to devices and in terms of understanding what stats
> > mean. Device major/minor number looked like a obivious choice.
> >
> > I was looking for how to determine what is the major/minor number of disk
> > request queue is associated with and I could use bdi to do that.
>
> The problem is that a queue can be shared between multiple gendisks,
Is anyone still doing this?
I thought that everyone agreed that this was wrong. Such users (like
MTD) were fixed.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: struct backing_dev - purpose and life time rules
2010-07-27 23:55 ` FUJITA Tomonori
@ 2010-07-28 4:42 ` Kay Sievers
0 siblings, 0 replies; 6+ messages in thread
From: Kay Sievers @ 2010-07-28 4:42 UTC (permalink / raw)
To: FUJITA Tomonori
Cc: hch, vgoyal, jaxboe, peterz, akpm, viro, linux-mm, linux-fsdevel
On Wed, Jul 28, 2010 at 01:55, FUJITA Tomonori
<fujita.tomonori@lab.ntt.co.jp> wrote:
> Not a comment on the original topic,
>
> On Tue, 27 Jul 2010 16:09:47 +0200
> Christoph Hellwig <hch@lst.de> wrote:
>
>> On Tue, Jul 27, 2010 at 09:39:56AM -0400, Vivek Goyal wrote:
>> > How can I do it better?
>> >
>> > I needed a unique identifier with which user can work in terms of
>> > specifying weights to devices and in terms of understanding what stats
>> > mean. Device major/minor number looked like a obivious choice.
>> >
>> > I was looking for how to determine what is the major/minor number of disk
>> > request queue is associated with and I could use bdi to do that.
>>
>> The problem is that a queue can be shared between multiple gendisks,
>
> Is anyone still doing this?
>
> I thought that everyone agreed that this was wrong. Such users (like
> MTD) were fixed.
I think it was MTD, which is fixed, and floppy, which is still the way it was.
Kay
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-07-28 4:43 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-27 9:01 struct backing_dev - purpose and life time rules Christoph Hellwig
2010-07-27 9:14 ` Christoph Hellwig
2010-07-27 13:39 ` Vivek Goyal
2010-07-27 14:09 ` Christoph Hellwig
2010-07-27 23:55 ` FUJITA Tomonori
2010-07-28 4:42 ` Kay Sievers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).