* Some questions about memory allocation and BKL
@ 2003-10-26 4:42 Ian Kent
2003-10-26 9:44 ` Jan Hudec
2003-10-26 18:37 ` dirty buffer_head, but not marked dirty Mark B
0 siblings, 2 replies; 9+ messages in thread
From: Ian Kent @ 2003-10-26 4:42 UTC (permalink / raw)
To: linux-fsdevel
Hi,
Not sure this is the right place to discuss this but ...
I have been working on changes to the autofs4 kernel module for some time
now, in order to support enhancements to the user space daemon.
The BKL is widely used in both the autofs and autofs4 modules. Both of
these modules call kmalloc under the BKL. Since the BKL is a spinlock and
kmalloc can sleep this is bad. Worse, I have used kmalloc with GFP_ATOMIC
inside the dcache_lock, which I suspect may be failing on after about a
week of heavy activity. This has lead me to the following questions. The
situation here is that the memory is used to communicate requests to the
userspace daemon on behalf of processes.
1) Since the BKL is so widely used is it really worth eliminating it from
code that you work on?
2) How should smallish memory chunks be allocated if under the BKL if at
all? How else should it be done? What about other spinlocks like
dcache_lock?
3) How do I get hold of the vfsmount struct that corresponds to a dentry
from within revalidate and lookup calls?
4) How does one decide when locking is actually needed? For example, what
is the 'usual' locking need for the dentry release operation?
I realize that these are, in many ways, simple minded questions but I have
considered them for some time and every time I work on the autofs4 module I
always find myself saying "does this really need protection". Looking
elsewhere, such as the VFS, autofs and nfs leaves me even more unsure.
Hope that there is some interest in a small discussion on this.
Regards
Ian
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 4:42 Some questions about memory allocation and BKL Ian Kent
@ 2003-10-26 9:44 ` Jan Hudec
2003-10-26 11:27 ` Ian Kent
2003-10-26 18:37 ` dirty buffer_head, but not marked dirty Mark B
1 sibling, 1 reply; 9+ messages in thread
From: Jan Hudec @ 2003-10-26 9:44 UTC (permalink / raw)
To: Ian Kent; +Cc: linux-fsdevel
On Sun, Oct 26, 2003 at 12:42:01 +0800, Ian Kent wrote:
> The BKL is widely used in both the autofs and autofs4 modules. Both of
> these modules call kmalloc under the BKL. Since the BKL is a spinlock and
> kmalloc can sleep this is bad.
BKL is not a (normal) spinlock. It is a magic lock that makes UP from
SMP. You can sleep under BKL -- it is magicaly unlocked when you
schedule() and is locked again before schedule() returns.
However, use of BKL should be avoided.
> Worse, I have used kmalloc with GFP_ATOMIC
> inside the dcache_lock, which I suspect may be failing on after about a
> week of heavy activity.
That's a different story -- atomic kmalloc might fail under heavy
activity. You can allocate the buffer before you take the spinlock and
release it if something fails during work under the lock.
> This has lead me to the following questions. The
> situation here is that the memory is used to communicate requests to the
> userspace daemon on behalf of processes.
If the allocation unit is at least a page, it might be better to use
vmalloc.
> 1) Since the BKL is so widely used is it really worth eliminating it from
> code that you work on?
Eliminating BKL is always good. BKL is a relict from times when kernel
was not SMP.
> 2) How should smallish memory chunks be allocated if under the BKL if at
> all? How else should it be done? What about other spinlocks like
> dcache_lock?
BKL does not matter. It's only efect is that it kills performance on
SMP. You CAN sleep under BKL.
Under spinlock, you need allocate with GFP_ATOMIC flag. That can
unfortunately fail too often, so you need to handle that case.
So you either allocate before you take the spinlock, or take the
spinlock, try to allocate and if it fails, drop it, allocate and restart
the operation.
> 3) How do I get hold of the vfsmount struct that corresponds to a dentry
> from within revalidate and lookup calls?
Perhaps try looking at how .. is followed in lookup (follow_dotdot or
some such).
> 4) How does one decide when locking is actually needed? For example, what
> is the 'usual' locking need for the dentry release operation?
I wonder if the Documentation/filesystems/Locking file is up to date...
Generaly speaking, operations on lists (dentry hash, dentry tree, inode
hash) are protected with respective spinlocks (dcache_lock, inode_lock).
Dentries shouldn't be modified once constructed (at least not directly
-- helper functions take enough care). Reading them is only protected by
properly holding a reference.
When modifying an inode, you need to hold it's semaphore. Renames and
removes have special additional locking (not of interest outside the
two).
To your example, if you mean d_release dentry operation, that one needs
no locking at all. It gets a last reference and the dentry is already
unhashed, so noone else can get it.
Generaly in methods (functions called via *_operations structs), you
already have some locks and usualy they are sufficient.
It is described in Documentation/filesystems/Locking and in .../vfs.txt
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 9:44 ` Jan Hudec
@ 2003-10-26 11:27 ` Ian Kent
2003-10-26 12:43 ` Jan Hudec
0 siblings, 1 reply; 9+ messages in thread
From: Ian Kent @ 2003-10-26 11:27 UTC (permalink / raw)
To: Jan Hudec; +Cc: linux-fsdevel
Thanks for you reply.
On Sun, 26 Oct 2003, Jan Hudec wrote:
> On Sun, Oct 26, 2003 at 12:42:01 +0800, Ian Kent wrote:
> > The BKL is widely used in both the autofs and autofs4 modules. Both of
> > these modules call kmalloc under the BKL. Since the BKL is a spinlock and
> > kmalloc can sleep this is bad.
>
> BKL is not a (normal) spinlock. It is a magic lock that makes UP from
> SMP. You can sleep under BKL -- it is magicaly unlocked when you
> schedule() and is locked again before schedule() returns.
That explains why I can find so many examples of it used in this way.
Excelent. I can kmalloc without concern that the BKL was taken by the
kernel before calling me.
>
> However, use of BKL should be avoided.
Will do. I see that in 2.6 the lock_kernel has mostly been pushed into the
autofs4 module, probably to maintain the status quo, rather than cause
it's needed.
>
> > Worse, I have used kmalloc with GFP_ATOMIC
> > inside the dcache_lock, which I suspect may be failing on after about a
> > week of heavy activity.
>
> That's a different story -- atomic kmalloc might fail under heavy
> activity. You can allocate the buffer before you take the spinlock and
> release it if something fails during work under the lock.
Did that after sending the questions.
>
> > This has lead me to the following questions. The
> > situation here is that the memory is used to communicate requests to the
> > userspace daemon on behalf of processes.
>
> If the allocation unit is at least a page, it might be better to use
> vmalloc.
They are typically <1k.
>
> > 1) Since the BKL is so widely used is it really worth eliminating it from
> > code that you work on?
>
> Eliminating BKL is always good. BKL is a relict from times when kernel
> was not SMP.
>
snip ...
>
> > 3) How do I get hold of the vfsmount struct that corresponds to a dentry
> > from within revalidate and lookup calls?
>
> Perhaps try looking at how .. is followed in lookup (follow_dotdot or
> some such).
Been there, will look again.
I seem to remember that the path lookup has a nameidata struct with this
information in it. That is not available when the filesystem inode lookup
and the dentry revalidate operations are called. I can't think of a way to
get hold of it. Basically, I need to calculate the path from a dentry to
the root of it's mounted filesystem. I currently use for loops and walk up
the d_parent links, but think it would be better to use d_path. Maybe the
existing way is the way to continue to to do it.
>
> > 4) How does one decide when locking is actually needed? For example, what
> > is the 'usual' locking need for the dentry release operation?
>
> I wonder if the Documentation/filesystems/Locking file is up to date...
I've checked the kernel docs but I think I missed that. I'll have a look.
>
> Generaly speaking, operations on lists (dentry hash, dentry tree, inode
> hash) are protected with respective spinlocks (dcache_lock, inode_lock).
I see that. I should know that and I guess I do but the obvious is never
clear to me. I'll just have to keep asking and being told and reading. One
day, soon I hope it will sink in.
>
> Dentries shouldn't be modified once constructed (at least not directly
> -- helper functions take enough care). Reading them is only protected by
> properly holding a reference.
This is a source of confusion for me.
So should I always hold the dcache_lock when testing a dentry, even for
such simple things as d_mountpoint or dentry->d_inode == NULL?
I expect that I need to hold the lock for compound tests to be valid.
>
> When modifying an inode, you need to hold it's semaphore. Renames and
> removes have special additional locking (not of interest outside the
> two).
I see that the inode semaphore is taken by the readdir call.
I have used it to serialise access to an info struct attached to
the corresponding dentry in the supporting open and close
directory operationss. Is this a semsible approach. I realise that
I need to be carefull that I don't trigger a call that also needs it. Or
do you think I should declare my own semaphore within the info struct.
>
> To your example, if you mean d_release dentry operation, that one needs
> no locking at all. It gets a last reference and the dentry is already
> unhashed, so noone else can get it.
Ahh yes, I need to look at the source more closely. Mind, your having said
it will enable me to see it. There is quite a bit of it.
>
> Generaly in methods (functions called via *_operations structs), you
> already have some locks and usualy they are sufficient.
> It is described in Documentation/filesystems/Locking and in .../vfs.txt
Will look for that.
Thanks again for your help.
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 11:27 ` Ian Kent
@ 2003-10-26 12:43 ` Jan Hudec
2003-10-26 13:21 ` Ian Kent
0 siblings, 1 reply; 9+ messages in thread
From: Jan Hudec @ 2003-10-26 12:43 UTC (permalink / raw)
To: Ian Kent; +Cc: linux-fsdevel
On Sun, Oct 26, 2003 at 19:27:12 +0800, Ian Kent wrote:
> On Sun, 26 Oct 2003, Jan Hudec wrote:
> > However, use of BKL should be avoided.
> Will do. I see that in 2.6 the lock_kernel has mostly been pushed into the
> autofs4 module, probably to maintain the status quo, rather than cause
> it's needed.
Yes. It's just because noone has checked that locking inside autofs4 is
correct.
> > > 3) How do I get hold of the vfsmount struct that corresponds to a dentry
> > > from within revalidate and lookup calls?
> >
> > Perhaps try looking at how .. is followed in lookup (follow_dotdot or
> > some such).
>
> Been there, will look again.
>
> I seem to remember that the path lookup has a nameidata struct with this
> information in it. That is not available when the filesystem inode lookup
> and the dentry revalidate operations are called. I can't think of a way to
> get hold of it. Basically, I need to calculate the path from a dentry to
> the root of it's mounted filesystem. I currently use for loops and walk up
> the d_parent links, but think it would be better to use d_path. Maybe the
> existing way is the way to continue to to do it.
In 2.6, nameidata should be available to ->lookup method. Just looked up
on lxr.linux.no -- ->d_revalidate has them available too. So for 2.6 you
can look there; for older kernels a loop over ->d_parent probably has to
suffice.
> >
> > > 4) How does one decide when locking is actually needed? For example, what
> > > is the 'usual' locking need for the dentry release operation?
> >
> > I wonder if the Documentation/filesystems/Locking file is up to date...
>
> I've checked the kernel docs but I think I missed that. I'll have a look.
Documentation/filesystems/Locking and Documentation/filesystems/vfs.txt
are exact paths in kernel tree in 2.4.x. lxr.linux.no says, that they
exist also in 2.6.0-test* with the same name. There is also
Documentation/filesystems/porting in 2.6.0-test*, which sumarises what
changed since 2.4.
> > Generaly speaking, operations on lists (dentry hash, dentry tree, inode
> > hash) are protected with respective spinlocks (dcache_lock, inode_lock).
>
> I see that. I should know that and I guess I do but the obvious is never
> clear to me. I'll just have to keep asking and being told and reading. One
> day, soon I hope it will sink in.
>
> > Dentries shouldn't be modified once constructed (at least not directly
> > -- helper functions take enough care). Reading them is only protected by
> > properly holding a reference.
>
> This is a source of confusion for me.
>
> So should I always hold the dcache_lock when testing a dentry, even for
> such simple things as d_mountpoint or dentry->d_inode == NULL?
No. It is forbiden to turn dentry to negative one unless you are the
only holder (it has refcount 1). See d_delete().
Semanticaly, dentries never change. So once they are filled and
published, you can access them without any locks. However, you must have
a counted reference.
Technicaly, if the dentry has refcount 1, it is reused. But that is just
a shortcut for droping it (it would be immediately freed) and
reallocating it. It can never cross your path as long as you refcount
correctly.
> I expect that I need to hold the lock for compound tests to be valid.
No. You only need to hold a reference.
You only need to hold the lock when you check whether the dentry is
hashed. And you need to use the atomic_* functions to access the
refcount.
> > When modifying an inode, you need to hold it's semaphore. Renames and
> > removes have special additional locking (not of interest outside the
> > two).
>
> I see that the inode semaphore is taken by the readdir call.
Yes. It serializes against creating new entries.
> I have used it to serialise access to an info struct attached to
> the corresponding dentry in the supporting open and close
> directory operationss. Is this a semsible approach. I realise that
> I need to be carefull that I don't trigger a call that also needs it. Or
> do you think I should declare my own semaphore within the info struct.
There is the "Locking" file, that tells you what state do various
methods expect the locks to be. As long as you can satisfy that
expectations, it's a sensible approach.
Note: Make sure you understand locking scheme of d_release and d_iput.
It can get nasty if your info is accessible by other means than from the
dentry.
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 12:43 ` Jan Hudec
@ 2003-10-26 13:21 ` Ian Kent
2003-10-26 14:01 ` Jan Hudec
0 siblings, 1 reply; 9+ messages in thread
From: Ian Kent @ 2003-10-26 13:21 UTC (permalink / raw)
To: Jan Hudec; +Cc: linux-fsdevel
On Sun, 26 Oct 2003, Jan Hudec wrote:
> In 2.6, nameidata should be available to ->lookup method. Just looked up
> on lxr.linux.no -- ->d_revalidate has them available too. So for 2.6 you
> can look there; for older kernels a loop over ->d_parent probably has to
> suffice.
I'll have to check what I get in it. I saw that. There doesn't seem to be
many changes from the point of view of autofs4. I haven't actually ported my
changes yet.
>
> Documentation/filesystems/Locking and Documentation/filesystems/vfs.txt
> are exact paths in kernel tree in 2.4.x. lxr.linux.no says, that they
> exist also in 2.6.0-test* with the same name. There is also
> Documentation/filesystems/porting in 2.6.0-test*, which sumarises what
> changed since 2.4.
Been there now. Had been there long ago but forgot about it. Great.
>
> > > Generaly speaking, operations on lists (dentry hash, dentry tree, inode
> > > hash) are protected with respective spinlocks (dcache_lock, inode_lock).
> >
> > I see that. I should know that and I guess I do but the obvious is never
> > clear to me. I'll just have to keep asking and being told and reading. One
> > day, soon I hope it will sink in.
> >
> > > Dentries shouldn't be modified once constructed (at least not directly
> > > -- helper functions take enough care). Reading them is only protected by
> > > properly holding a reference.
> >
> > This is a source of confusion for me.
> >
> > So should I always hold the dcache_lock when testing a dentry, even for
> > such simple things as d_mountpoint or dentry->d_inode == NULL?
>
> No. It is forbiden to turn dentry to negative one unless you are the
> only holder (it has refcount 1). See d_delete().
Sorry, my bad description. I was thinking about if (...) to check
fields which I think, acording to the discussion, is OK but ...
>
> Semanticaly, dentries never change. So once they are filled and
> published, you can access them without any locks. However, you must have
> a counted reference.
>
> Technicaly, if the dentry has refcount 1, it is reused. But that is just
> a shortcut for droping it (it would be immediately freed) and
> reallocating it. It can never cross your path as long as you refcount
> correctly.
>
> > I expect that I need to hold the lock for compound tests to be valid.
>
> No. You only need to hold a reference.
>
> You only need to hold the lock when you check whether the dentry is
> hashed. And you need to use the atomic_* functions to access the
> refcount.
Does that also include tests on a list in the dentry.
I think that you said that this is what needs the lock, correct?
For example:
if (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs))
...
needs the dcache_lock?
Small detail. Seems so obvious now.
>
> > > When modifying an inode, you need to hold it's semaphore. Renames and
> > > removes have special additional locking (not of interest outside the
> > > two).
> >
> > I see that the inode semaphore is taken by the readdir call.
>
> Yes. It serializes against creating new entries.
>
> > I have used it to serialise access to an info struct attached to
> > the corresponding dentry in the supporting open and close
> > directory operationss. Is this a semsible approach. I realise that
> > I need to be carefull that I don't trigger a call that also needs it. Or
> > do you think I should declare my own semaphore within the info struct.
>
> There is the "Locking" file, that tells you what state do various
> methods expect the locks to be. As long as you can satisfy that
> expectations, it's a sensible approach.
Great. I'll see how it goes.
>
> Note: Make sure you understand locking scheme of d_release and d_iput.
> It can get nasty if your info is accessible by other means than from the
> dentry.
Thanks. I don't think so. I'll think about that and check it further.
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 13:21 ` Ian Kent
@ 2003-10-26 14:01 ` Jan Hudec
2003-10-26 15:59 ` Ian Kent
0 siblings, 1 reply; 9+ messages in thread
From: Jan Hudec @ 2003-10-26 14:01 UTC (permalink / raw)
To: Ian Kent; +Cc: linux-fsdevel
On Sun, Oct 26, 2003 at 21:21:31 +0800, Ian Kent wrote:
> On Sun, 26 Oct 2003, Jan Hudec wrote:
> > No. It is forbiden to turn dentry to negative one unless you are the
> > only holder (it has refcount 1). See d_delete().
>
> Sorry, my bad description. I was thinking about if (...) to check
> fields which I think, acording to the discussion, is OK but ...
inode and name are constant, so don't need lock. all the lists need
dcache_lock (subdirs, hash etc...). Parent should need dcache_lock too
(try to look it up in rename code). I don't know about mountpoint, but
obviously requires some locking.
> > You only need to hold the lock when you check whether the dentry is
> > hashed. And you need to use the atomic_* functions to access the
> > refcount.
>
> Does that also include tests on a list in the dentry.
> I think that you said that this is what needs the lock, correct?
>
> For example:
>
> if (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs))
> ...
>
> needs the dcache_lock?
I assume yes. The tree lists should need dcache_lock.
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 15:59 ` Ian Kent
@ 2003-10-26 15:54 ` Jan Hudec
0 siblings, 0 replies; 9+ messages in thread
From: Jan Hudec @ 2003-10-26 15:54 UTC (permalink / raw)
To: Ian Kent; +Cc: linux-fsdevel
On Sun, Oct 26, 2003 at 23:59:09 +0800, Ian Kent wrote:
>
> Thanks. This is exaclty what I needed to clear things up.
>
> I don't mean to be a pest but I have just one more question.
>
> In 2.6 there is a lock for the vfsmount mount list. I can't see anything
> in 2.4 or the doco. How should this structure be protected during a
> traversal of the vfsmount tree?
>
> On Sun, 26 Oct 2003, Jan Hudec wrote:
>
> > (try to look it up in rename code). I don't know about mountpoint, but
> > obviously requires some locking.
>
> d_mountpoint just returns the value of dentry->d_mounted, an int.
Yes. But to know what locks you need when you read it, you must look
what locks are held when it's changed.
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Some questions about memory allocation and BKL
2003-10-26 14:01 ` Jan Hudec
@ 2003-10-26 15:59 ` Ian Kent
2003-10-26 15:54 ` Jan Hudec
0 siblings, 1 reply; 9+ messages in thread
From: Ian Kent @ 2003-10-26 15:59 UTC (permalink / raw)
To: Jan Hudec; +Cc: linux-fsdevel
Thanks. This is exaclty what I needed to clear things up.
I don't mean to be a pest but I have just one more question.
In 2.6 there is a lock for the vfsmount mount list. I can't see anything
in 2.4 or the doco. How should this structure be protected during a
traversal of the vfsmount tree?
On Sun, 26 Oct 2003, Jan Hudec wrote:
> (try to look it up in rename code). I don't know about mountpoint, but
> obviously requires some locking.
d_mountpoint just returns the value of dentry->d_mounted, an int.
--
,-._|\ Ian Kent
/ \ Perth, Western Australia
*_.--._/ E-mail: raven@themaw.net
v Web: http://themaw.net/
^ permalink raw reply [flat|nested] 9+ messages in thread
* dirty buffer_head, but not marked dirty
2003-10-26 4:42 Some questions about memory allocation and BKL Ian Kent
2003-10-26 9:44 ` Jan Hudec
@ 2003-10-26 18:37 ` Mark B
1 sibling, 0 replies; 9+ messages in thread
From: Mark B @ 2003-10-26 18:37 UTC (permalink / raw)
To: linux-fsdevel
Hi!
One quick question.
Someone takes a buffer_head via sb_bread(),
modifies the data in it (b_data), and then decides that the modifications are
not ok, so we want the old one back...
Is there a function to mark the data trashed or should we simply release it
with brelse()?
I think brelse() is not good to do (someone else (or we) may get the trashed
data with the next call from sb_bread() )
Would something like reiserfs_unmap_buffer() do the job correctly(would it
force to reread the buffer)?
(I know I could easily keep a copy of it or reverse the changes, but the
changes are through more blocks, so it would be much harder to restore the
situation by calling some trash() function on them)
--
Mark Burazin
mark@lemna.hr
---<>---<>---<>---<>---<>---<>---<>---<>---<>
Lemna d.o.o.
http://www.lemna.biz - info@lemna.hr
<>---<>---<>---<>---<>---<>---<>---<>---<>---
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-10-26 18:31 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-26 4:42 Some questions about memory allocation and BKL Ian Kent
2003-10-26 9:44 ` Jan Hudec
2003-10-26 11:27 ` Ian Kent
2003-10-26 12:43 ` Jan Hudec
2003-10-26 13:21 ` Ian Kent
2003-10-26 14:01 ` Jan Hudec
2003-10-26 15:59 ` Ian Kent
2003-10-26 15:54 ` Jan Hudec
2003-10-26 18:37 ` dirty buffer_head, but not marked dirty Mark B
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox