A Home Automation System for Linux in Ruby
Conceived by: Hal Fulton
Domo is a piece of pure vaporware at the moment. It is intended to be a distributed, full-featured home automation software system, written in Ruby and running on Linux. Since we don't all agree on what constitutes "home automation," I'll go into detail on this later. See below.
Some questions naturally arise. (This is not exactly a FAQ, since these haven't been asked that much yet.)
Why the name "Domo"? Well, it reminds me of the Japanese phrase domo arigato, meaning "thank you very much." And by association, it even reminds me of the old Styx song that says, "Domo arigato, Mr. Roboto!" And yes, I did think of calling it "Mr. Roboto" or even "Mr. Ruboto"; but in the end, I chose Domo for a totally unrelated reason: It's the Esperanto word for house, chosen for its recognizable kinship with words like domicile and domestic. So now you know.
Why Ruby? The answer is that Ruby is simply the most beautiful, most maintainable language I know of. No flames, please; I don't know LISP or SmallTalk or your favorite language. If you don't know about Ruby (how did you get here?) you can go to the main Ruby site and read more. At any rate, I'm adamant that Domo will be scriptable in Ruby, so writing the bulk of the code in Ruby seems natural to me. (I'm certainly not opposed to exposing an API that can be called from Perl, Python, Java, whatever. A well-documented socket-based API might be good for that.)
Why Linux? I'm tired of Windows and I'm switching to a Linux-only home within the year. Of course, I can think of no reason that it might not run on FreeBSD or various UNIX variants; but I can't target everything, and I'm specifically not targeting Windows.
Why not cross-platform? For one thing, it increases the development effort. For another thing, not all the "pieces" I am thinking of interfacing with are themselves cross-platform. And thirdly, there are so many pieces of "neat" software that are only available on Windows. I'm in favor of tipping that balance a little. Let's create some neat software that just won't run on Windows.
Why not use MisterHouse? Well, MH is fairly mature and has some neat features. But it's Perl-based. I'm not opposed to Perl, and I even thought about a Ruby interface for MH that would at least allow scripting in Ruby. But I feel that sometimes it's better just to start over.
So are you completely reinventing the wheel? Not at all. Some components of the system will be more or less "black boxes"; they'll be given a Rubyesque API and left alone. Many of these need to be written in C for speed, or they are too complex to develop from scratch. Examples are voice recognition and speech synthesis. (Many don't see these as fitting into home automation at all. I think they're important. See below.)
What do you use now? On Windows, I use HomeSeer (www.homeseer.com). Out of the four or five packages I've looked at, it's by far the best. The interface is powerful and flexible, the hardware support is great, the software is very stable, and the online support is excellent (both from the developers and the user community). The library of existing scripts is wonderful, and the API is rich and flexible. The only three negatives: 1) It's Microsoft only. 2) It's not open-source. 3) I can't get it to cooperate with ActiveScriptRuby (though people are successfully scripting it in Perl and Python).
So do you want to "clone" this other piece of software? No, definitely not. For all its good qualities, I think we can do better. For one thing, it is not distributed. You can't, for example, have a client on each computer in your house.
Will this be open source? Yes, definitely. I lean toward the "Ruby license"; but this may be problematic as I may interface with packages licensed differently.
Are you writing this all yourself? Absolutely not. But I have had trouble generating interest in it. If I can in fact generate interest, I'll start a project on Sourceforge or the equivalent.
Some will think this definition is too broad. And I will narrow it later.
However, it is narrow enough already that it excludes the household that has
some X10 hardware and a few remote controls, but no computer controlling it
all. To me as a hacker, the computer is essential. I don't want just to control
my house; I want to program my house.
When I discuss some of the features I like in an HA system, many people say
they don't really consider those features to be home automation. The prime
examples are voice recognition and speech synthesis.
But many people consider these to be very run-of-the-mill features. Go read
the comp.home.automation newsgroup, or go there and ask how many of
them use voice recognition and text-to-speech features. (Read this group
anyway. It's great.)
Maybe you don't think these are part of HA. But I (and many others) do. I like
being able to sit on my couch and control things just by talking to the mike.
I don't even have to grab the wireless keyboard that goes with the downstairs
box. I can say, "too cold in here"; and my computer will respond by bumping
up the thermostat and acknowledging it by saying "temperature up."
I distribute the audio from the computer through my whole house with wireless
speakers. The computer wakes me up and tells me the time, day of the week,
and date. Then it tells me the weather forecast (which it retrieves from the
web). Then it reminds me of the things I have on my to-do list that day. That's
all before I even get up. I also have the "CNN breaking news" script; it checks
the website every 3 minutes and when there's a breaking news item, it plays a
WAV file to alert me, and then reads the item.
You're free not to like voice and speech features. But don't tell me they're
never useful. They're useful to me.
The de facto standard (or lowest common denominator) for HA is X10
technology. It has the disadvantages of being old and clunky and somewhat
unreliable; it has the advantages of being cheap and ubiquitous. It's
essential to support X10.
I consider the Slink-e an essential piece of hardware also (see
www.nirvis.com). It understands the Sony
S-link protocol, but can also act as an IR router even if you don't have Sony
equipment at all. For example, you can have the computer control all of your
AV equipment by talking (serially) to the Slink-e, which then talks (via IR)
to your DVD or TV or whatever. I use mine to control my Sony 300-CD changer.
The freeware CDJ (also at nirvis.com) is perhaps the best piece of Windows
freeware I've ever seen.
There are other hardware options also. Many of these I'm completely ignorant
of. First things first.
Events might be triggered any number of ways:
Thinking in terms of "output" rather than just events, there are various
ways the system might present output to the user. I'm being redundant here.
Let's use DRb (distributed Ruby) wherever appropriate. This is good for
several reasons:
Obviously we have to support X10. It may be old and clunky, but it's essential.
I've been told there's an X10 daemon in FreeBSD; I'm not opposed to using a
similar existing tool and "wrapping" it for Ruby if it helps. Pure Ruby would
be fast enough for the X10 protocol, however, so speed is not an issue. The
computer typically talks to an interface called a CM11A which then talks X10
over the powerline. (There are some alternatives to the CM11A.)
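To show how little work the computer has to do per command, here is a minimal sketch of building the bytes sent to a CM11A. It assumes the house/unit code table and the header bytes (0x04 for an address, 0x06 for a function) as given in the commonly circulated CM11A protocol text; those values should be checked against real hardware. The module and method names are hypothetical, and nothing here touches a serial port.

```ruby
# Sketch only: builds the byte pairs a PC would send to a CM11A.
# Code table and header values are assumptions taken from the published
# CM11A protocol description, not from any Domo code.
module X10Frames
  # House codes A..P and unit codes 1..16 share one 4-bit encoding.
  CODES = [0x6, 0xE, 0x2, 0xA, 0x1, 0x9, 0x5, 0xD,
           0x7, 0xF, 0x3, 0xB, 0x0, 0x8, 0x4, 0xC].freeze

  # Only the simple functions; dim/bright need a dim level in the header.
  FUNCTIONS = { all_units_off: 0x0, all_lights_on: 0x1,
                on: 0x2, off: 0x3 }.freeze

  ADDRESS_HEADER  = 0x04
  FUNCTION_HEADER = 0x06

  def self.house_code(letter)
    index = letter.upcase.ord - 'A'.ord
    raise ArgumentError, "bad house code #{letter}" unless (0..15).cover?(index)
    CODES[index]
  end

  def self.unit_code(number)
    raise ArgumentError, "bad unit #{number}" unless (1..16).cover?(number)
    CODES[number - 1]
  end

  # Two bytes that select a device, e.g. house A, unit 1.
  def self.address(house, unit)
    [ADDRESS_HEADER, (house_code(house) << 4) | unit_code(unit)]
  end

  # Two bytes that tell the selected device(s) what to do.
  def self.function(house, name)
    [FUNCTION_HEADER, (house_code(house) << 4) | FUNCTIONS.fetch(name)]
  end

  # The interface is expected to echo this sum back before the PC confirms.
  def self.checksum(bytes)
    bytes.sum & 0xFF
  end
end

frame = X10Frames.address('A', 1)
puts frame.map { |b| format('0x%02x', b) }.join(' ')
```

A real driver would write each pair to the serial port, compare the interface's reply with `checksum`, and only then acknowledge; that handshake is left out here.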
I like the Slink-e. It talks to Sony hardware and it's compatible with Xantech
IR accessories. There's a Perl module out there if someone wants to port it
(I think it's part of the MisterHouse code). It should also be possible to SWIG
the C++ code; I think that's supplied on the nirvis.com site.
For text-to-speech, I hear Festival is pretty good. I've never tried it. I
definitely think we need to interface to some existing engine rather than try
to create our own.
The same is even more true for voice recognition. We couldn't reasonably do
that kind of thing ourselves (in my opinion), especially in pure Ruby.
I think that someone at RubyConf 2002 (which I wasn't able to attend) gave a
presentation on something called Ruby/Snack. I believe this had implications
both for speech synthesis and voice recognition in Ruby. It might be worth
looking into.
As I said, I think that it's good to let some servers be duplicated as needed.
For example, if more than one computer has a sound card, we could address each
of them independently for sound effects and text-to-speech.
Likewise there could be more than one voice recognition server (or more than
one instance of a "middleman" process which takes sound from the mike and
sends it to the single VR server to be recognized). Basically any input or
output point should be "clonable"; we should be able to run a graphical client
on any computer in the house. The exceptions that come to mind are these: There
can reasonably only be one X10 interface and we need only one webserver.
As I said, I like HomeSeer a lot. I have harvested ideas from it and will
continue to do so. Their script library also is a good source of ideas.
But where HomeSeer is scripted primarily in VBScript, we will obviously
be scripting in Ruby. Whatever API we settle on, I'd like to see a variant of
it that functions just by sending text messages to sockets. Then a server could
be started up which would interface equally well with any language for which
someone bothered to write an interface. If it's socket-oriented, it could
even be scripted in VBScript from a Windows machine. But don't say I said so.
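A text-over-sockets API of this kind could be sketched as follows. This is only an illustration under assumptions: the `SET`/`GET` verbs, the reply format, and the class name `TextApiServer` are all made up, and the "devices" are just entries in a hash.

```ruby
require 'socket'

# Hypothetical line protocol: "<VERB> <device> [state]" in, one reply line out.
class TextApiServer
  def initialize(port = 0)
    @server  = TCPServer.new('127.0.0.1', port) # port 0: let the OS choose
    @devices = {}                               # device name => "on" / "off"
    @lock    = Mutex.new
  end

  def port
    @server.addr[1]
  end

  # Turn one request line into one reply line.
  def handle(line)
    verb, device, state = line.strip.split
    case verb&.upcase
    when 'SET'
      return 'ERR usage: SET <device> <on|off>' unless %w[on off].include?(state)
      @lock.synchronize { @devices[device] = state }
      "OK #{device} #{state}"
    when 'GET'
      return 'ERR usage: GET <device>' if device.nil?
      "OK #{device} #{@lock.synchronize { @devices.fetch(device, 'unknown') }}"
    else
      'ERR unknown command'
    end
  end

  # Accept clients in the background; each client gets its own thread.
  def start
    Thread.new do
      loop do
        Thread.new(@server.accept) do |client|
          while (line = client.gets)
            client.puts handle(line)
          end
          client.close
        end
      end
    end
  end
end

server = TextApiServer.new
server.start
socket = TCPSocket.new('127.0.0.1', server.port)
socket.puts 'SET lamp on'
puts socket.gets
socket.close
```

Because the exchange is plain lines of text, a client in any language only needs to open a socket, write a line, and read a line back.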
I remember reading something a few weeks ago about the xAP protocol,
which is an HA-oriented protocol designed to be simple and universal. It's not
XML-based, but I think of it as the XML of the HA field. It doesn't seem to
be mature yet, but it does seem to be a kind of standard, and one we should
perhaps think of supporting. I'm not sure how this would impact our socket-
based API, if at all.
Now: The issue of "other hardware" arises. There are many wild and wonderful
things out there that I've heard of but really know nothing about. There are
Elk magic modules and Ocelots and Audreys and the StarGate and who knows what.
Since I don't know about these things, I can't assess the need to support them.
We'll have to start with the core and proceed.
Here I've tried to at least provide a "covering" of the functionality. I think
that the list I present here justifies every hardware and software component
in my current design.
The "invisible computer" principle. I'm not stating this as a user
interface issue or anything like that. I'm just stating the idea that, whatever
the computer may do, it should not prevent the functioning of devices
that do not require the computer. Two specific examples are the X10 interface
and the Slink-e device. Even though the computer may receive every X10 command,
there is no reason it should interfere in the functioning of devices that also
hear the same commands and respond automatically. As for the Slink-e and its
handling of infrared, there are IR emitters that allow a standard IR signal
from a remote to "pass through" and go directly to the AV equipment. Whether
the computer receives these signals also (through the receiver on the Slink-e)
is irrelevant.
Remote controls as triggers. The system can receive X10 commands via
the CM11A (or equivalent) and act on them. It can also receive IR commands via
the Slink-e and act on them. Thus X10 remotes and ordinary IR remotes can be
used to trigger macros and scripts.
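One way to picture remotes as triggers is a lookup table from received commands to Ruby blocks. The sketch below is hypothetical throughout: the class name, the `:x10`/`:ir` source tags, and the command strings such as "A1 ON" are invented for illustration, and a real system would feed `dispatch` from the CM11A and Slink-e drivers.

```ruby
# Sketch: map incoming remote-control commands to Ruby scripts (blocks).
class TriggerTable
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # source is :x10 or :ir; command is whatever the receiver reports.
  def on(source, command, &script)
    @handlers[[source, command]] << script
    self
  end

  # Run every script registered for this command; return how many ran.
  def dispatch(source, command)
    scripts = @handlers.fetch([source, command], [])
    scripts.each(&:call)
    scripts.size
  end
end

triggers = TriggerTable.new
triggers.on(:x10, 'A1 ON') { puts 'running the movie-night macro' }
triggers.dispatch(:x10, 'A1 ON')
```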
The scheduler. The system can trigger events automatically via a
scheduler. The scheduler should be sophisticated enough to know such things as
sunrise and sunset for the current locale. It should be smart enough to allow
exceptions to its rules (such as weekends, days off, holidays, etc.). It should
be sensitive to modes such as "at home" and "away from home."
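As a rough sketch of what one scheduler rule might look like, the class below checks the time of day, the day of the week, a list of exception dates, and the current mode. The class name, keyword arguments, and mode symbols are assumptions made for this example; sunrise and sunset calculation is deliberately left out.

```ruby
require 'date'

# Sketch of a single rule: "fire at HH:MM on these days, in these modes,
# except on these dates". Names and defaults are illustrative only.
class ScheduleRule
  WEEKDAYS = [1, 2, 3, 4, 5].freeze # Time#wday numbering: 0 is Sunday

  def initialize(hour:, minute:, days: WEEKDAYS, modes: [:home], except_dates: [])
    @hour = hour
    @minute = minute
    @days = days
    @modes = modes
    @except_dates = except_dates
  end

  # True if the rule should fire at the given Time while in the given mode.
  def due?(now, mode)
    return false unless @modes.include?(mode)
    return false unless @days.include?(now.wday)
    return false if @except_dates.include?(now.to_date)
    now.hour == @hour && now.min == @minute
  end
end

# Wake-up call at 6:30 on weekdays while at home, skipping one day off.
wake_up = ScheduleRule.new(hour: 6, minute: 30,
                           except_dates: [Date.new(2003, 3, 18)])
puts wake_up.due?(Time.new(2003, 3, 17, 6, 30), :home)
```

A scheduler loop would evaluate every rule once a minute and run the scripts attached to the rules that are due.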
Control of (and by) the phone. The system has the capability (with a
suitable modem) of interacting with the phone line. This opens up many
possibilities:
Internet access. The system will have direct Internet and web access
for purposes of retrieving information such as news, weather, stock quotes,
streaming audio, and so on. It will be able to receive email and optionally
to act on it. It will be able to send email as system alerts, forward incoming
mail, and so on (much as for the telephone).
Web server. The system will have an integrated web server or an
interface to a standard server. All system status (devices and sensors) will be
visible on the web page, and all features will be controllable via this page.
There should perhaps be multiple levels of security, at the least a "guest" or
read-only mode.
Redundant servers. Where it makes sense to allow multiple copies of a
server, it should be possible to do so. I'm assuming the servers will be on
different machines for the purpose of interacting with different pieces of
hardware. I am now thinking that perhaps there should be a "microphone server"
that would serialize requests to a single "voice recognition" server running
on a fast machine.
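The "microphone server" idea amounts to funnelling requests from several rooms through one queue. Here is a minimal sketch using Ruby's thread-safe Queue; the recognizer is a stand-in block, and the class and method names are hypothetical. No real audio or recognition engine is involved.

```ruby
# Sketch: many microphone clients, one recognizer, one request at a time.
class RecognitionQueue
  Request = Struct.new(:audio, :reply)

  # The block stands in for the real (slow) voice recognition engine.
  def initialize(&recognizer)
    @queue = Queue.new
    @worker = Thread.new do
      while (request = @queue.pop) # a nil request stops the worker
        text = begin
          recognizer.call(request.audio)
        rescue StandardError
          nil # a failed recognition yields nil instead of hanging the caller
        end
        request.reply << text
      end
    end
  end

  # Blocks the caller until the single recognizer has handled its clip.
  def recognize(audio)
    reply = Queue.new
    @queue << Request.new(audio, reply)
    reply.pop
  end

  def shutdown
    @queue << nil
    @worker.join
  end
end

recognizer = RecognitionQueue.new { |audio| "heard: #{audio}" }
puts recognizer.recognize('too cold in here')
recognizer.shutdown
```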
Audio-video control. This is an avenue I haven't explored much. At the
very least, it should be possible to program the computer to record TV shows
rather than programming the cable box and VCR separately. There are open-source
TiVo clones, so I hear, but I don't know about them. Maybe that would be
something to include. Also we need the capability to handle scenarios like
a voice command of "play CD, Billy Joel, Storm Front."
Palm client. This is a side issue, but there should be a Palm and/or
PocketPC client so that PDA users can give IR commands to the Slink-e and
trigger arbitrary scripts on the system.
And so on. There's more, but some of it I haven't even thought of yet.
For now, I am using Guillaume Pierronet's serial port code (see the RAA
or go directly here for
more information).
For those interested in details of the X10 protocol, I have captured some
text information and HTMLized it: X10 Protocol. Parts of it are confusing. I'd appreciate
any assistance in deciphering it.
Once that works, I'll be interested in interfacing to Festival (TTS) or
Sphinx (VR). For that, I assume I'll be learning SWIG.
All three of these pieces will be wrapped as druby servers. At least, that's
my first thought.
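Wrapping a piece as a druby server takes very little code. The sketch below assumes a hypothetical `SpeechService` class standing in for a wrapped TTS engine; it does not call Festival. Server and client run in one process here only so the example is self-contained. On recent Ruby versions `drb` ships as a bundled gem rather than as part of the core standard library.

```ruby
require 'drb/drb'

# Stand-in for a wrapped engine; a real one would hand the text to a TTS tool.
class SpeechService
  attr_reader :spoken

  def initialize
    @spoken = []
  end

  def say(text)
    @spoken << text
    "said: #{text}"
  end
end

# Server side: publish the object. Port 0 asks the OS for any free port.
DRb.start_service('druby://127.0.0.1:0', SpeechService.new)
uri = DRb.uri

# Client side (normally a different process or machine on the network):
speech = DRbObject.new_with_uri(uri)
puts speech.say('temperature up')
```

A client elsewhere in the house would only need the URI to call `say` as if the object were local.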
All comments (or help) welcome. If I work on this alone, I'll never finish it.
1. What is home automation?
This is only my definition: The use of the computer as a household
assistant.
2. Some features of HA
This is my own list:
3. Events and triggers
An "event" is any operation the computer performs that directly affects its
environment — an "output," if you will. Some examples are:
4. Technologies and ideas
I have a few sketchy implementation ideas. See also "Usage Scenarios" below.
5. A Picture is Worth 1024 Bytes
Here is my crude drawing of the architecture of the system. Please don't
critique my artwork.
6. Usage Scenarios
Consider this to be a "user story" section, if you will. Some of these differ
in granularity, i.e., some are higher-level and some lower-level. And there are
doubtless many things I haven't thought of. The whole point is to design a
system that is flexible and programmable.
7. Project Plan
At present (March 2003), there is no formal project plan. What I've been
doing so far in the way of coding is to work on a simple native-Ruby
driver for the CM11A. Here is the latest code, such as it is.