From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755172Ab1JWIWM (ORCPT ); Sun, 23 Oct 2011 04:22:12 -0400 Received: from cantor2.suse.de ([195.135.220.15]:42404 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754978Ab1JWIWB (ORCPT ); Sun, 23 Oct 2011 04:22:01 -0400 Date: Sun, 23 Oct 2011 19:21:47 +1100 From: NeilBrown To: Alan Stern Cc: John Stultz , "Rafael J. Wysocki" , mark gross , Linux PM list , LKML Subject: Re: lsusd - The Linux SUSpend Daemon Message-ID: <20111023192147.49413485@notabene.brown> In-Reply-To: References: <20111022093436.0742176c@notabene.brown> X-Mailer: Claws Mail 3.7.10 (GTK+ 2.22.1; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/r+_.p0unpi80DPjoRQ_sOU_"; protocol="application/pgp-signature" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --Sig_/r+_.p0unpi80DPjoRQ_sOU_ Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable On Fri, 21 Oct 2011 22:00:13 -0400 (EDT) Alan Stern wrote: > On Sat, 22 Oct 2011, NeilBrown wrote: >=20 > > > > It uses files in /var/run/suspend for all communication. > > >=20 > > > I'm not so keen on using files for communication. At best, they are > > > rather awkward for two-way messaging. If you really want to use them, > > > then at least put them on a non-backed filesystem, like something und= er > > > /dev. > >=20 > > Isn't /var/run a tmpfs filesystem? It should be. > > Surely /run is, so in the new world order the files should probably go > > there. But that is just a detail. >=20 > On my Fedora-14 systems there is no /run, and /var/run is a regular=20 > directory in a regular filesystem. >=20 > > I like files... I particularly like 'flock' to block suspend. The > > rest.... whatever.. > > With files, you only need a context switch when there is real communica= tion. > > With sockets, every message sent must be read so there will be a context > > switch. > >=20 > > Maybe we could do something with futexes... >=20 > Not easily -- as far as I can tell, futexes enjoy relatively little=20 > support. In any case, they provide the same service as a mutex, which=20 > means you'd have to build a shared lock on top of them. >=20 > > > > lsusd does not try to be event-loop based because: > > > > - /sys/power/wakeup_count is not pollable. This could probab= ly be > > > > 'fixed' but I want code to work with today's kernel. It wi= ll probably > > >=20 > > > Why does this matter? > >=20 > > In my mind an event based program should never block. Every action sho= uld be > > non-blocking and only taken when 'poll' says it can. > > Reading /sys/power/wakeup_count can be read non-blocking, but you canno= t find > > out when it is sensible to try to read it again. So it doesn't fit. >=20 > There shouldn't be any trouble about making wakeup_count pollable. It > also would need to respect nonblocking reads, which it currently does=20 > not do. Hmm.. you are correct. I wonder why I thought it did support non-blocking reads... I guess it was the code for handling an interrupted system call. I feel a bit uncomfortable with the idea of sysfs files that block but I don't think I can convincingly argue against it. A non-blocking flag could be passed in, but it would be a very messy change= - lots of function call signatures changing needlessly: we would need a flag to the 'show' method ... or add a 'show_nonblock' method which would also be ugly. But I think there is a need to block - if there is an in-progress event then it must be possible to wait for it to complete as it may not be visible to userspace until then. We could easily enable 'poll' for wakeup_count and then make it always non-blocking, but I'm not really sure I want to require programs to use pol= l, only to allow them. And without using poll there is no way to wait. As wakeup_count really has to be single-access we could possibly fudge something by remembering the last value read (like we remember the last val= ue written). - if the current count is different from the last value read, then return it even if there are in-progress events. - if the current count is the same as the last value read, then block until there are no in-progress events and return the new value. - enable sysfs_poll on wakeup_count by calling sysfs_notify_dirent at the end of wakeup_source_deactivated .... or calling something in kernel/power/main.c which calls that. However we would need to make sysfs_notify_dirent a lot lighter weight first. Maybe I should do that. Then a process that uses 'poll' could avoid reading wakeup_count except when it has changed, and then it won't block. And a process that doesn't use po= ll can block by simply reading twice - either explicitly or by going around a= =20 read then write and it fails loop a second time. I'm not sure I'm completely comfortable with that, but it is the best I cou= ld come up with. >=20 > At the worst, you could always have a separate thread to read=20 > wakeup_count. Maybe. I'm tempted simply not to worry about the short delay after all, but the moment I do that, someone will use a wakeup_source to disable suspend for a longer period of time (just like I suggested) and suddenly it won't be a short delay after all. >=20 > > > > - I cannot get an event notification when a lock is removed f= rom a > > > > file. :-( And I think locks are an excellent light-weight > > > > mechanism for blocking suspend. > > >=20 > > > Except for this one drawback. Socket connections are superior in tha= t=20 > > > regard. > >=20 > > I'm very happy for someone else write an all-socket based daemon. >=20 > Hmmm... Maybe I'll take you up on that. >=20 >=20 > > > > lsused will send a 'S' message to the client and await an 'R'= reply > > > > (S =3D=3D suspend, R =3D=3D ready). When all replies are in,= lsused will > > > > allow the suspend to complete. When it does (or aborts), it = will send > > > > 'A' (awake) to those clients to which it sent 'S'. > > >=20 > > > But not to the client which failed to send an 'R'? > >=20 > > Every client must send an R before suspend can continue. >=20 > I was referring to the case where you abort before receiving an 'R'. =20 > The current suspend attempt will fail, but then what happens during the > next attempt? There is no situation in which I abort before receiving an 'R'. I wait until all 'R's are in until I even check if an abort was requested. NeilBrown Sample untested patch to allow non-blocking reads and poll on wakeup_count. diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index 84f7c7d..54543ba 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -44,6 +44,8 @@ static void split_counters(unsigned int *cnt, unsigned in= t *inpr) =20 /* A preserved old value of the events counter. */ static unsigned int saved_count; +/* Record last value that was read from events counter. */ +static unsigned int last_read_count; =20 static DEFINE_SPINLOCK(events_lock); =20 @@ -410,6 +412,7 @@ static void wakeup_source_deactivate(struct wakeup_sour= ce *ws) { ktime_t duration; ktime_t now; + unsigned int comb; =20 ws->relax_count++; /* @@ -440,7 +443,10 @@ static void wakeup_source_deactivate(struct wakeup_sou= rce *ws) * Increment the counter of registered wakeup events and decrement the * couter of wakeup events in progress simultaneously. */ - atomic_add(MAX_IN_PROGRESS, &combined_event_count); + comb =3D atomic_add_return(MAX_IN_PROGRESS, &combined_event_count); + + if ((comb >> IN_PROGRESS_BITS) =3D=3D last_read_count + 1) + wakeup_count_changed(); } =20 /** @@ -624,15 +630,24 @@ bool pm_get_wakeup_count(unsigned int *count) =20 for (;;) { split_counters(&cnt, &inpr); - if (inpr =3D=3D 0 || signal_pending(current)) + if (inpr =3D=3D 0 || cnt !=3D last_read_count) break; + if (signal_pending(current)) + return false; pm_wakeup_update_hit_counts(); schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT)); } =20 + last_read_count =3D cnt; + /* If cnt has just changed, then last_read_count will be a bit + * old, so we won't block on the next read, only the one after. + * However this ensures wakeup_source_deactivate doesn't + * miss out on calling wakeup_count_changed() on a change. + */ + smp_wmb(); split_counters(&cnt, &inpr); *count =3D cnt; - return !inpr; + return true; } =20 /** diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index 1ad8c93..41361ba 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -444,18 +444,19 @@ static unsigned int sysfs_poll(struct file *filp, pol= l_table *wait) =20 void sysfs_notify_dirent(struct sysfs_dirent *sd) { - struct sysfs_open_dirent *od; - unsigned long flags; + if (sd->s_attr.open) { + struct sysfs_open_dirent *od; + unsigned long flags; + spin_lock_irqsave(&sysfs_open_dirent_lock, flags); =20 - spin_lock_irqsave(&sysfs_open_dirent_lock, flags); + od =3D sd->s_attr.open; + if (od) { + atomic_inc(&od->event); + wake_up_interruptible(&od->poll); + } =20 - od =3D sd->s_attr.open; - if (od) { - atomic_inc(&od->event); - wake_up_interruptible(&od->poll); + spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags); } - - spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags); } EXPORT_SYMBOL_GPL(sysfs_notify_dirent); =20 diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 6bbcef2..e42b7f9 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -296,6 +296,7 @@ extern bool events_check_enabled; extern bool pm_wakeup_pending(void); extern bool pm_get_wakeup_count(unsigned int *count); extern bool pm_save_wakeup_count(unsigned int count); +extern void wakeup_count_changed(void); #else /* !CONFIG_PM_SLEEP */ =20 static inline int register_pm_notifier(struct notifier_block *nb) diff --git a/kernel/power/main.c b/kernel/power/main.c index 6c601f8..6b3cd80 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -231,6 +231,8 @@ power_attr(state); * are any wakeup events detected after 'wakeup_count' was written to. */ =20 +struct sysfs_dirent *wakeup_count_dirent; + static ssize_t wakeup_count_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @@ -253,6 +255,12 @@ static ssize_t wakeup_count_store(struct kobject *kobj, return -EINVAL; } =20 +void wakeup_count_change(void) +{ + if (wakeup_count_dirent) + sysfs_notify_dirent(wakeup_count_dirent); +} + power_attr(wakeup_count); #endif /* CONFIG_PM_SLEEP */ =20 @@ -342,7 +350,11 @@ static int __init pm_init(void) power_kobj =3D kobject_create_and_add("power", NULL); if (!power_kobj) return -ENOMEM; - return sysfs_create_group(power_kobj, &attr_group); + error =3D sysfs_create_group(power_kobj, &attr_group); +#ifdef CONFIG_PM_SLEEP + wakeup_count_dirent =3D sysfs_get_dirent(power_kobj->sd, NULL, "wakeup_co= unt"); +#endif + return error; } =20 core_initcall(pm_init); --Sig_/r+_.p0unpi80DPjoRQ_sOU_ Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIUAwUBTqPOpTnsnt1WYoG5AQL4lg/4i2+ezogzDgzvKmEfynmN6htHx3CH418h RtLgOj/INAefldEqsqX0R59zPe/X3tlErF7oCPjvbA4aerocQxUMzZwnt+NWbn5l ps1aSRdoXKUfcn42ZC/7G9aTcnkcmyAoqLPR+80kKd2hVX41VSLvJ8M8L+Mto6Lb 9mL3uk5WVf1FKlo0Og7Ep58dDFXcbA1SEttOIpxTkf/l1DW6vcPzBm0OGW0Bne9d rSBLKXIHZ6PLZmydVO8YtSnAS4y7PvjA/7wz7ly4/YqrBU8n7Zx5Kzv8UdBqRd+y XQ4teS9Ny5kW+GQt+U8TubQIMgjqRUxRaG1CCCs+K4pUOcfCPtN20IsOnxreVkg1 MY4pRT+LSg+T2ELv5h0FwFMScXmTfo90E4NPrwZ0NFGtc6btZz0onf3r8Q9Ru1Mj hux9bfjHyjCLxTL++kZLlKg3FJQPUCQBHfJ9kar3LHrNWwIAdt4zm6e3e81dS5+O mkJgzUBfIhcCW4mOga20uEdMer1+wzFF0N9M01XRDGR8Y26ApP6Nt4VS2bTVe3rF EbdeorG9A/nmDPtrjAKY7UoU3BeZxhxIlM6JGapCNLbFAQdeTYVqId60ABJEriM4 UM/T17S6KfikaavErJPug8vxAbAr5GIlTnjeIUO7DQb4eQJryz8o8ODcYss2p65A /JPPOK5/Xg== =JEtr -----END PGP SIGNATURE----- --Sig_/r+_.p0unpi80DPjoRQ_sOU_--