A Home Automation System
for Linux
in Ruby

Conceived by: Hal Fulton

What is Domo?
What is HA?
Features of HA
Events/triggers
Technologies/ideas
Diagram
Usage Scenarios
Project Plan

0. Introduction: What is Domo?

Domo is a piece of pure vaporware at the moment. It is intended to be a distributed, full-featured home automation software system, written in Ruby and running on Linux. Since we don't all agree on what constitutes "home automation," I'll go into detail on this later. See below.

Some questions naturally arise. (This is not exactly a FAQ, since these haven't been asked that much yet.)

Why the name "Domo"? Well, it reminds me of the Japanese phrase domo arigato, meaning "thank you very much." And by association, it even reminds me of the old Styx song that says, "Domo arigato, Mr. Roboto!" And yes, I did think of calling it "Mr. Roboto" or even "Mr. Ruboto"; but in the end, I chose Domo for a totally unrelated reason: It's the Esperanto word for house, chosen for its recognizable kinship with words like domicile and domestic. So now you know.

Why Ruby? The answer is that Ruby is simply the most beautiful, most maintainable language I know of. No flames, please; I don't know LISP or SmallTalk or your favorite language. If you don't know about Ruby (how did you get here?) you can go to the main Ruby site and read more. At any rate, I'm adamant that Domo will be scriptable in Ruby, so writing the bulk of the code in Ruby seems natural to me. (I'm certainly not opposed to exposing an API that can be called from Perl, Python, Java, whatever. A well-documented socket-based API might be good for that.)

Why Linux? I'm tired of Windows and I'm switching to a Linux-only home within the year. Of course, I can think of no reason that it might not run on FreeBSD or various UNIX variants; but I can't target everything, and I'm specifically not targeting Windows.

Why not cross-platform? For one thing, it increases the development effort. For another thing, not all the "pieces" I am thinking of interfacing with are themselves cross-platform. And thirdly, there are so many pieces of "neat" software that are only available on Windows. I'm in favor of tipping that balance a little. Let's create some neat software that just won't run on Windows.

Why not use MisterHouse? Well, MH is fairly mature and has some neat features. But it's Perl-based. I'm not opposed to Perl, and I even thought about a Ruby interface for MH that would at least allow scripting in Ruby. But I feel that sometimes it's better just to start over.

So are you completely reinventing the wheel? Not at all. Some components of the system will be more or less "black boxes"; they'll be given a Rubyesque API and left alone. Many of these need to be written in C for speed, or they are too complex to develop from scratch. Examples are voice recognition and speech synthesis. (Many don't see these as fitting into home automation at all. I think they're important. See below.)

What do you use now? On Windows, I use HomeSeer (www.homeseer.com). Out of the four or five packages I've looked at, it's by far the best. The interface is powerful and flexible, the hardware support is great, the software is very stable, and the online support is excellent (both from the developers and the user community). The library of existing scripts is wonderful, and the API is rich and flexible. The only three negatives: 1) It's Microsoft only. 2) It's not open-source. 3) I can't get it to cooperate with ActiveScriptRuby (though people are successfully scripting it in Perl and Python).

So do you want to "clone" this other piece of software? No, definitely not. For all its good qualities, I think we can do better. For one thing, it is not distributed. You can't, for example, have a client on each computer in your house.

Will this be open source? Yes, definitely. I lean toward the "Ruby license"; but this may be problematic as I may interface with packages licensed differently.

Are you writing this all yourself? Absolutely not. But I have had trouble generating interest in it. If I can in fact generate interest, I'll start a project on Sourceforge or the equivalent.

1. What is home automation?

This is only my definition: The use of the computer as a household assistant.

Some will think this definition is too broad. And I will narrow it later. However, it is narrow enough already that it excludes the household that has some X10 hardware and a few remote controls, but no computer controlling it all. To me as a hacker, the computer is essential. I don't want just to control my house; I want to program my house.

When I discuss some of the features I like in an HA system, many people say they don't really consider those features to be home automation. The prime examples are voice recognition and speech synthesis.

But many people consider these to be very run-of-the-mill features. Go read the comp.home.automation newsgroup, or go there and ask how many of them use voice recognition and text-to-speech features. (Read this group anyway. It's great.)

Maybe you don't think these are part of HA. But I (and many others) do. I like being able to sit on my couch and control things just by talking to the mike. I don't even have to grab the wireless keyboard that goes with the downstairs box. I can say, "too cold in here"; and my computer will respond by bumping up the thermostat and acknowledging it by saying "temperature up."

I distribute the audio from the computer through my whole house with wireless speakers. The computer wakes me up and tells me the time, day of the week, and date. Then it tells me the weather forecast (which it retrieves from the web). Then it reminds me of the things I have on my to-do list that day. That's all before I even get up. I also have the "CNN breaking news" script; it checks the website every 3 minutes and when there's a breaking news item, it plays a WAV file to alert me, and then reads the item.

You're free not to like voice and speech features. But don't tell me they're never useful. They're useful to me.

2. Some features of HA

This is my own list:

control of lights, appliances, etc.
control of thermostat
control of audio-video equipment
answer phone (and place calls) via modem
control by phone
ability to send/receive email
access to web/Internet
info gathering via sensors (motion, temp, humidity, ...)
integrated webserver
control by voice
control by macro, script, or schedule
output via text-to-speech

The de facto standard (or lowest common denominator) for HA is X10 technology. It has the disadvantages of being old and clunky and somewhat unreliable; it has the advantages of being cheap and ubiquitous. It's essential to support X10.

I consider the Slink-e an essential piece of hardware also (see www.nirvis.com). It understands the Sony S-link protocol, but can also act as an IR router even if you don't have Sony equipment at all. For example, you can have the computer control all of your AV equipment by talking (serially) to the Slink-e, which then talks (via IR) to your DVD or TV or whatever. I use mine to control my Sony 300-CD changer. The freeware CDJ (also at nirvis.com) is perhaps the best piece of Windows freeware I've ever seen.

There are other hardware options also. Many of these I'm completely ignorant of. First things first.

3. Events and triggers

An "event" is any operation the computer performs that directly affects its environment — an "output," if you will. Some examples are:

Lights on, off, or dim
Appliances on/off
Thermostat up/down, on/off, etc.
TV and stereo equipment on/off, mute, channel up/down...
text-to-speech announcement
run a macro or script
send an email
make a phone call
display information on the monitor

Events might be triggered any number of ways:

scheduler (one-time, recurring, custom, sunrise/set...)
sensor (motion detectors, etc.)
X10 command received
RF remote (also X10)
IR remote
script control
web control
incoming email
incoming phone call
voice command
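
The relationship between triggers and events can be sketched as a small dispatcher: a trigger is a name, and an event is a block registered against it. This is only an illustration under assumed names (Dispatcher, on, and fire are hypothetical, not part of any Domo code), and the "events" here just append to a log instead of touching hardware.

```ruby
# Hypothetical sketch: none of these class or method names come from Domo itself.
class Dispatcher
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register an event (a block) to run when the named trigger fires.
  def on(trigger, &event)
    @handlers[trigger] << event
    self
  end

  # Fire a trigger; returns the results of every event that ran.
  def fire(trigger, payload = {})
    @handlers[trigger].map { |event| event.call(payload) }
  end
end

log = []
dispatcher = Dispatcher.new

# One trigger may cause several events.
dispatcher.on(:voice_command) { |payload| log << "thermostat up (heard: #{payload[:phrase]})" }
dispatcher.on(:voice_command) { |_payload| log << "say: temperature up" }
dispatcher.on(:x10_received)  { |payload| log << "run macro for #{payload[:address]}" }

dispatcher.fire(:voice_command, { phrase: "too cold in here" })
dispatcher.fire(:x10_received, { address: "A1" })
puts log
```

Keeping triggers as plain names means a scheduler, a sensor, or an incoming email could all call fire in the same way.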

Thinking in terms of "output" rather than just events, there are various ways the system might present output to the user. I'm being redundant here.

graphically onscreen
web page update
phone message
email message
sound effect (WAV)
text-to-speech (TTS)
X10 devices

4. Technologies and ideas

I have a few sketchy implementation ideas. See also "Usage Scenarios" below.

Let's use DRb (distributed Ruby) wherever appropriate. This is good for several reasons:

We can have multiple controlling clients if we want.
We can share the load among multiple computers.
We can run multiple copies of some servers (for example, text-to-speech)
Servers not wanted need not be started (e.g., if you're not doing voice recognition, don't run the server).
We get good separation of concerns, both statically and dynamically.
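
As a rough illustration of the idea, here is a minimal DRb sketch with a made-up text-to-speech server. SpeechServer and its say method are assumptions for the example, and the server only records the text rather than speaking it. Both halves run in one process here so the sketch is self-contained; in practice the client would be a separate process, possibly on another machine.

```ruby
require 'drb/drb'

# Hypothetical text-to-speech server; a real one would hand the text to an
# engine such as Festival instead of just recording it.
class SpeechServer
  attr_reader :spoken

  def initialize(room)
    @room = room
    @spoken = []
  end

  def say(text)
    @spoken << text
    "#{@room}: #{text}"
  end
end

# Server side: publish one object. Port 0 asks the OS for any free port.
front = SpeechServer.new("living room")
DRb.start_service("druby://127.0.0.1:0", front)
uri = DRb.uri

# Client side: all a client needs is the URI of the server.
speech = DRbObject.new_with_uri(uri)
reply = speech.say("temperature up")
puts reply

DRb.stop_service
```

Running a second copy of the server on another machine with a sound card would only mean a second URI for clients to choose from.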

Obviously we have to support X10. It may be old and clunky, but it's essential. I've been told there's an X10 daemon in FreeBSD; I'm not opposed to using a similar existing tool and "wrapping" it for Ruby if it helps. Pure Ruby would be fast enough for the X10 protocol, however, so speed is not an issue. The computer typically talks to an interface called a CM11A which then talks X10 over the powerline. (There are some alternatives to the CM11A.)
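
To give a feel for how little work the computer side involves, the sketch below builds the two-byte packets a PC sends to a CM11A to turn a unit on. The code tables and header bytes follow the commonly circulated CM11A programming protocol description and should be checked against real hardware. The method names are hypothetical, and no serial I/O is done here; the handshake (checksum echo, acknowledge, ready byte) is left out.

```ruby
# X10 house and unit codes share one 4-bit encoding (A/1 first, P/16 last).
X10_CODES = [0x6, 0xE, 0x2, 0xA, 0x1, 0x9, 0x5, 0xD,
             0x7, 0xF, 0x3, 0xB, 0x0, 0x8, 0x4, 0xC].freeze

# A few of the 4-bit function codes.
X10_FUNCTIONS = { all_units_off: 0x0, all_lights_on: 0x1, on: 0x2, off: 0x3 }.freeze

HEADER_ADDRESS  = 0x04   # next byte is a house/unit address
HEADER_FUNCTION = 0x06   # next byte is a house/function code

def house_nibble(house)
  index = house.upcase.ord - 'A'.ord
  raise ArgumentError, "house must be A-P" unless (0..15).cover?(index)
  X10_CODES[index]
end

# Packet that selects a unit, e.g. A1.
def address_packet(house, unit)
  raise ArgumentError, "unit must be 1-16" unless (1..16).cover?(unit)
  [HEADER_ADDRESS, (house_nibble(house) << 4) | X10_CODES[unit - 1]]
end

# Packet that tells the selected unit(s) what to do.
def function_packet(house, function)
  [HEADER_FUNCTION, (house_nibble(house) << 4) | X10_FUNCTIONS.fetch(function)]
end

# The interface echoes this value back so the PC can confirm the transfer.
def checksum(packet)
  packet.sum & 0xFF
end

packets = [address_packet('A', 1), function_packet('A', :on)]
packets.each do |packet|
  puts format("send %02x %02x, expect checksum %02x", *packet, checksum(packet))
end
```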

I like the Slink-e. It talks to Sony hardware and it's compatible with Xantech IR accessories. There's a Perl module out there if someone wants to port it (I think it's part of the MisterHouse code). It should also be possible to SWIG the C++ code; I think that's supplied on the nirvis.com site.

For text-to-speech, I hear Festival is pretty good. I've never tried it. I definitely think we need to interface to some existing engine rather than try to create our own.

The same is even more true for voice recognition. We couldn't reasonably do that kind of thing ourselves (in my opinion), especially in pure Ruby.

I think that someone at RubyConf 2002 (which I wasn't able to attend) gave a presentation on something called Ruby/Snack. I believe this had implications both for speech synthesis and voice recognition in Ruby. It might be worth looking into.

As I said, I think that it's good to let some servers be duplicated as needed. For example, if more than one computer has a sound card, we could address each of them independently for sound effects and text-to-speech.

Likewise there could be more than one voice recognition server (or more than one instance of a "middleman" process which takes sound from the mike and sends it to the single VR server to be recognized). Basically any input or output point should be "clonable"; we should be able to run a graphical client on any computer in the house. The exceptions that come to mind are these: There can reasonably only be one X10 interface and we need only one webserver.

As I said, I like HomeSeer a lot. I have harvested ideas from it and will continue to do so. Their script library also is a good source of ideas.

But where HomeSeer is scripted primarily in VBScript, we will obviously be scripting in Ruby. Whatever API we settle on, I'd like to see a variant of it that functions just by sending text messages to sockets. Then a server could be started up which would interface equally well with any language for which someone bothered to write an interface. If it's socket-oriented, it could even be scripted in VBScript from a Windows machine. But don't say I said so.
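
One way such a text-over-sockets variant might look is sketched below. The wire format (one line per command, a verb followed by a device name, replies starting with OK or ERR) is invented purely for illustration, as is the handle_command method; nothing here is a settled Domo API. The server handles a single client and then exits so that the example terminates.

```ruby
require 'socket'

# Hypothetical one-line text protocol: "<VERB> <DEVICE>\n", e.g. "ON A1".
# Neither the verbs nor the replies are part of any existing Domo API.
def handle_command(line, devices)
  verb, device = line.strip.split(' ', 2)
  verb = verb.to_s.upcase
  return "ERR unknown command" unless %w[ON OFF STATUS].include?(verb)
  return "ERR missing device" if device.nil?

  devices[device] = verb unless verb == 'STATUS'
  "OK #{device} #{devices.fetch(device, 'UNKNOWN')}"
end

devices = {}
server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
port = server.addr[1]

# Serve a single client, one command per line, until it disconnects.
worker = Thread.new do
  client = server.accept
  while (line = client.gets)
    client.puts handle_command(line, devices)
  end
  client.close
end

# Any language that can open a socket could play this role.
socket = TCPSocket.new('127.0.0.1', port)
socket.puts 'ON A1'
first_reply = socket.gets.chomp
socket.puts 'STATUS B2'
second_reply = socket.gets.chomp
socket.close

worker.join
server.close
puts first_reply, second_reply
```

Because the protocol is plain text, it can be exercised by hand with telnet while a client library for another language is being written.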

I remember reading something a few weeks ago about the xAP protocol, which is an HA-oriented protocol designed to be simple and universal. It's not XML-based, but I think of it as the XML of the HA field. It doesn't seem to be mature yet, but it does seem to be a kind of standard, and one we should perhaps think of supporting. I'm not sure how this would impact our socket-based API, if at all.

Now: The issue of "other hardware" arises. There are many wild and wonderful things out there that I've heard of but really know nothing about. There are Elk magic modules and Ocelots and Audreys and the StarGate and who knows what. Since I don't know about these things, I can't assess the need to support them. We'll have to start with the core and proceed.

5. A Picture is Worth 1024 Bytes

Here is my crude drawing of the architecture of the system. Please don't critique my artwork.

6. Usage Scenarios

Consider this to be a "user story" section, if you will. Some of these differ in granularity, i.e., some are higher-level and some lower-level. And there are doubtless many things I haven't thought of. The whole point is to design a system that is flexible and programmable.

Here I've tried to at least provide a "covering" of the functionality. I think that the list I present here justifies every hardware and software component in my current design.

The "invisible computer" principle. I'm not stating this as a user interface issue or anything like that. I'm just stating the idea that, whatever the computer may do, it should not prevent the functioning of devices that do not require the computer. Two specific examples are the X10 interface and the Slink-e device. Even though the computer may receive every X10 command, there is no reason it should interfere in the functioning of devices that also hear the same commands and respond automatically. As for the Slink-e and its handling of infrared, there are IR emitters that allow a standard IR signal from a remote to "pass through" and go directly to the AV equipment. Whether the computer receives these signals also (through the receiver on the Slink-e) is irrelevant.

Remote controls as triggers. The system can receive X10 commands via the CM11A (or equivalent) and act on them. It can also receive IR commands via the Slink-e and act on them. Thus X10 remotes and ordinary IR remotes can be used to trigger macros and scripts.

The scheduler. The system can trigger events automatically via a scheduler. The scheduler should be sophisticated enough to know such things as sunrise and sunset for the current locale. It should be smart enough to allow exceptions to its rules (such as weekends, days off, holidays, etc.). It should be sensitive to modes such as "at home" and "away from home."
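
The exception and mode handling described above could be modeled roughly as follows. ScheduleRule and its fields are hypothetical names chosen for this sketch, and sunrise/sunset calculation is deliberately left out; the rule only checks a fixed clock time, a weekend exception, a list of skipped dates, and the current mode.

```ruby
require 'date'

# Hypothetical rule object; field names are illustrative, not Domo's design.
ScheduleRule = Struct.new(:name, :hour, :minute, :skip_weekends, :modes, :skip_dates) do
  # True when the rule should fire at this time in this mode.
  def due?(time, mode)
    return false unless time.hour == hour && time.min == minute
    return false unless modes.include?(mode)
    return false if skip_weekends && (time.saturday? || time.sunday?)
    return false if skip_dates.include?(time.to_date)

    true
  end
end

wake_up = ScheduleRule.new(
  "wake-up announcement",
  6, 30,                    # fire at 06:30
  true,                     # but not on weekends
  [:home],                  # and only in "at home" mode
  [Date.new(2003, 3, 17)]   # and not on this day off
)

monday   = Time.new(2003, 3, 3, 6, 30)
saturday = Time.new(2003, 3, 8, 6, 30)

puts wake_up.due?(monday, :home)     # a normal weekday at home
puts wake_up.due?(saturday, :home)   # weekend exception applies
puts wake_up.due?(monday, :away)     # wrong mode
```

A scheduler loop would call due? once a minute for each rule and fire the matching trigger.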

Control of (and by) the phone. The system has the capability (with a suitable modem) of interacting with the phone line. This opens up many possibilities:

Announce Caller ID information aloud and/or display onscreen
Act as an answering machine with infinite storage and logging capability
Deliver different outgoing messages based on Caller ID information
Dial a number and deliver a message
Dial a numeric or alphanumeric pager and send a message
Forward a recorded incoming message to user at another number
Accept incoming call and allow control by voice or touchtone (with proper security)

Internet access. The system will have direct Internet and web access for purposes of retrieving information such as news, weather, stock quotes, streaming audio, and so on. It will be able to receive email and optionally to act on it. It will be able to send email as system alerts, forward incoming mail, and so on (much as for the telephone).

Web server. The system will have an integrated web server or an interface to a standard server. All system status (devices and sensors) will be visible on the web page, and all features will be controllable via this page. There should perhaps be multiple levels of security, at the least a "guest" or read-only mode.

Redundant servers. Where it makes sense to allow multiple copies of a server, it should be possible to do so. I'm assuming the servers will be on different machines for the purpose of interacting with different pieces of hardware. I am now thinking that perhaps there should be a "microphone server" that would serialize requests to a single "voice recognition" server running on a fast machine.
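
The "microphone server" idea amounts to funneling requests from several sources through one queue so that a single recognizer handles them one at a time. The sketch below shows that shape using threads in one process; FakeRecognizer is a stand-in invented for the example (it merely normalizes text), and a real setup would put the microphone servers on separate machines.

```ruby
# Stand-in for the single voice recognition server. A real one would wrap an
# engine such as Sphinx; this one only normalizes text so the sketch can run.
class FakeRecognizer
  def recognize(audio)
    audio.downcase.strip
  end
end

requests   = Queue.new
results    = Queue.new
recognizer = FakeRecognizer.new

# One worker owns the recognizer, so requests are handled strictly one at a time.
worker = Thread.new do
  while (job = requests.pop)      # a nil job tells the worker to stop
    room, audio = job
    results << [room, recognizer.recognize(audio)]
  end
end

# Each "microphone server" is simulated by a thread that submits one utterance.
utterances = { "den" => "  Too Cold In Here ", "kitchen" => "LIGHTS ON" }
mics = utterances.map do |room, audio|
  Thread.new { requests << [room, audio] }
end
mics.each(&:join)

requests << nil                   # no more work
worker.join

recognized = {}
recognized.store(*results.pop) until results.empty?
recognized.each { |room, text| puts "#{room}: #{text}" }
```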

Audio-video control. This is an avenue I haven't explored much. At the very least, it should be possible to program the computer to record TV shows rather than programming the cable box and VCR separately. There are open-source TiVo clones, so I hear, but I don't know about them. Maybe that would be something to include. Also we need the capability to handle scenarios like a voice command of "play CD, Billy Joel, Storm Front."

Palm client. This is a side issue, but there should be a Palm and/or PocketPC client so that PDA users can give IR commands to the Slink-e and trigger arbitrary scripts on the system.

And so on. There's more, but some of it I haven't even thought of yet.

7. Project Plan

At present (March 2003), there is no formal project plan. What I've been doing so far in the way of coding is to work on a simple native-Ruby driver for the CM11A. Here is the latest code, such as it is.

For now, I am using Guillaume Pierronnet's serial port code (see the RAA or go directly here for more information).

For those interested in details of the X10 protocol, I have captured some text information and HTMLized it: X10 Protocol. Parts of it are confusing. I'd appreciate any assistance in deciphering it.

Once that works, I'll be interested in interfacing to Festival (TTS) or Sphinx (VR). For that, I assume I'll be learning SWIG.

All three of these pieces will be wrapped as DRb servers. At least, that's my first thought.

All comments (or help) welcome. If I work on this alone, I'll never finish it.
