Re: [2.6.18 PATCH]: Filesystem Event Reporter V4

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yi Yang <yang.y.yi@gmail.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
	Matt Helsley <matthltc@us.ibm.com>
Subject: Re: [2.6.18 PATCH]: Filesystem Event Reporter V4
Date: Sun, 08 Oct 2006 22:34:02 +0800	[thread overview]
Message-ID: <45290C5A.1020708@gmail.com> (raw)
In-Reply-To: <4c4443230610080629j33bc3685g8bb22029c390727d@mail.gmail.com>


>     I do not say that they are broken, but you in some places you
>     access per-cpu
>     variuables without turning preemption off. I think some locking or
>     preemption tweaks should be done there to explicitly mark critical
>     regions.
>
You're right, some places have such issues, I just considered how to 
avoid the lock
or atomic operation, Andrew ever mentioned the lock is unacceptible in 
file system
 code path, so I always avoid the lock or atomic operation.

>     > >What prevents from adding another skb into the queue between
>     above loop
>     > >and check for flag?
>     > >
>     > before adding a fsevent to the queue, a process will check
>     exit_flag, if
>     > it is set to 1, that
>     > process won't queue the fsevent and return immediately.
>
>     But you check for exit_flag in fsevent_commit() without any locks.
>
Only rmmod will set exit_flag, other users are readers, so I think the 
lock is unnecessary,
 only one issue is that I  should clear fsevent_queue in the last 
section of fsevent_exit.
>
>     > >
>     > >Above operation seems racy, what prevents from changing
>     missed_refcnt
>     > >after it was read?
>     > >
>     > if the case you said is hit, missed_refcnt must be not equal to
>     > missed_refcnt, because they are for the same cpu, so no problem,
>     it will
>     > be checked
>     > in the next work schedule.
>
>     Since it is called with disabled preemption it is ok, but in that
>     case
>     you do not need missed_refcnt to be atomic.
>
in include/linux/fsevent.h, it is possibly accessed from diffrent cpus, 
so atomic is necessary.
>
>     > >Why are you doing this? It looks wrong, since socket's queue is
>     cleaned
>     > >automatically.
>     > >
>     > When I release fsevent_sock, the kernel always printk a message
>     which
>     > says "sk_rmem_alloc isn't zero",
>     > I don't know why, I doubt there are some packets in recieve and
>     write
>     > queue, so try to free them.
>     > but sk_rmem_alloc is always non-zero, so I must set it to 0, the
>     kernel
>     > doesn't printk.
>
>     That means that you broke socket accounting in some way.
>     sock_release() should do all cleanup for you.
>
>     Each time you add skb into socket queue appropriate socket is
>     charged for
>     value equal to sizeof(skb)+sizeof(skb_shared_info)+aligned size of
>     the data.
>     That number is added to the one of the sk_r/wmem_alloc, depending
>     on the
>     direction of the skb way, skb's destructor is set to the function
>     which
>     will remove appropiate amount of from above variables.
>     When you call sock_release() all skbs are removed and freed, so socket
>     accounting is corrected in kfree_skb(), which (if there are no users)
>     calls destructor and frees skb and data.
>     If you see asserions that above variables are not zero, that means
>     that
>     you either removed skb from the queue and forgot to free it, or
>     freed it
>     several times (although it will be likely a crash in this case),
>     or you
>     overwrote that variables after some memory corruption.
>
maybe that surplus skb_get is the root cause.
>
>     > >This is racy.
>     > >
>     > This doesn't take effect in the normal processing, the work
>     kthread will
>     > do the real
>     > work which will ensure no racy.
>
>     Then just remove it, and actually the whole modularity does not
>     seems a
>     good idea, although it is of course your decision to make design
>     static
>     or not. I would implement such things with dynamic registration of
>     the
>     clients and just make fsevent statically built into the kernel.
>
It is hard a bit for the subsystem using the hook mechanism to be 
implemented as
 a module. In fact, all the newly-added code in this patch is for 
modularity. :)

Really that is a way to build as a static infrastructure, the filesystem 
init code calls
 a fsevent register API to enable it, but unregister is not a trivial, 
the syncronization
issue still exists.  Nevertheless, this is really is a way I can try.
>
>     > >This looks really racy.
>     > >What prevents from rescheduling here?
>     > >
>     > This has disabled the preemption, so it is impossible to reshcedule.
>
>     No, put_fsevent_refcnt() andbles it again.
>     Or is it disabled on higher layer?
>
I think your "reschedule" means process migration, those code just considers
 this issue, missed_refcnt is just for this, start_cpuid is used to 
identify the cpu
before migration, end_cpuid is used to identify the cpu after migration, if
start_cpuid is equal to end_cpuid, we can think there is no migration 
happened,
 otherwise, missed_refcnt[start_cpuid] will increase, because there are 
possibly
 several prcoesses on different cpus to modify this value, so it is 
defined as
atomic.
>
>     > >
>     > >What prevents change for __raise_fsevent in that function?
>     > >
>     > If reference count is not -1, rmmod won't change
>     __raise_fsevent. the
>     > key is two new-added
>     > refrence counters.
>
>     You do it without preemption disabled and any other locks...
>
Only rmmod will change __raise_fsevent and it will set it to 0 just after
all the filesystem code paths nerver call it, if reference count on anuy cpu
 is not -1, rmmod will wait for it until this cpu doesn't call raise_fsevent
 any more, rmmod will set it to 0 just after all the reference count on 
all the
cpu are -1, so only one user -- rmmod -- is accessing it in that time, 
this is
safe.

     prev parent reply	other threads:[~2006-10-08 14:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-30 15:24 [2.6.18 PATCH]: Filesystem Event Reporter V4 Yi Yang
2006-10-03 16:47 ` Evgeniy Polyakov
2006-10-04 15:15   ` Yi Yang
2006-10-05  9:41     ` Evgeniy Polyakov
     [not found]       ` <4c4443230610080629j33bc3685g8bb22029c390727d@mail.gmail.com>
2006-10-08 14:34         ` Yi Yang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45290C5A.1020708@gmail.com \
    --to=yang.y.yi@gmail.com \
    --cc=akpm@osdl.org \
    --cc=johnpol@2ka.mipt.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matthltc@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.