Getting inside Cloud Foundry for debug (and profit?)

I’ve recently started to play with some more of the internals of Cloud Foundry than I’ve been used to. This has been made much easier by the advent of bosh-lite, a system for deploying all of Cloud Foundry’s components using the bosh continuous deployment and configuration tool, into a single virtual machine. bosh-lite achieves this by using containers (Cloud Foundry’s own Warden container technology) to “emulate” the individual VMs where jobs would run in a full distributed topology.

bosh-lite has actually been around for a number of months now, but I’ve not had much of a chance to play with it until recently. This is partly due to other activities, and partly because my earlier attempts to get an environment up-and-running were hampered by lack of memory. It should be possible to run bosh-lite with a Cloud Foundry deployment in 8GB of RAM, but given my laptop’s configuration and the amount of other stuff I’m usually running, that was never comfortable – now I’m rocking 16GB in a MacBook Pro, things are running more smoothly.

I don’t intend to spend this post documenting how to install bosh-lite and get a running single-node Cloud Foundry system. I followed the instructions in the README and things went well on this occasion. One suggestion I’d make, assuming like me you’re on OS X, is to use VMware Fusion and the Vagrant provider for Fusion if you can, as the combination seems quite a lot better than VirtualBox. If you do, don’t forget to pass the --provider=vmware_fusion flag when you bring your Vagrant image up (that’s something I do usually forget). One other little thing to mention is that after I started the bosh deployment, the bosh CLI gem timed out and returned a REST error – but the deployment process itself continued without any issues, and I was able to use bosh tasks to check in on the progress. If you are interested, I used cf-release-157 this time around.

Once I had my minty-fresh Cloud Foundry running, I deployed Matt Stine’s handy, simple, Ruby scale demo app and pushed up the number of instances.

So what’s the point of this post? I want to mention two things…

Note: this is not about debugging applications on Cloud Foundry in general – a PaaS is an opinionated system and you generally shouldn’t need to poke around inside it like this. This is for debugging the Cloud Foundry runtime itself, or aspects that might run inside a container. Oh, and I’m sorry about the formatting of some of the shell output examples below!

Peeking at NATS traffic

NATS is the internal, lightweight message bus that Cloud Foundry components use to talk to one another. I’d read blog posts from Cornelia and from Dr Nic about digging into this before.

First of all, I used bosh ssh to access the NATS host:

$ bosh ssh
1. ha_proxy_z1/0
2. nats_z1/0
3. postgres_z1/0
4. uaa_z1/0
5. login_z1/0
6. api_z1/0
7. clock_global/0
8. api_worker_z1/0
9. etcd_leader_z1/0
10. hm9000_z1/0
11. runner_z1/0
12. loggregator_z1/0
13. loggregator_trafficcontroller_z1/0
14. router_z1/0
Choose an instance: 2
Enter password (use it to sudo on remote host): ***
Target deployment is `cf-warden'

Setting up ssh artifacts

Director task 9

Task 9 done
Starting interactive shell on job nats_z1/0

So now I’m on the NATS host – now what? Well, strictly speaking I didn’t need to log in to that host / container since, as a messaging system, NATS accepts connections from the other hosts anyway. The reason I wanted to log in was to find out how NATS was configured.

$ ps -ef | grep nats
root 1470 1 0 12:09 ? 00:00:12 /var/vcap/packages/gnatsd/bin/gnatsd -V -D -c /var/vcap/jobs/nats/config/nats.conf

$ more /var/vcap/jobs/nats/config/nats.conf

net: ""
port: 4222

pid_file: "/var/vcap/sys/run/nats/nats.pid"
log_file: "/var/vcap/sys/log/nats/nats.log"

authorization {
  user: "nats"
  password: "nats"
  timeout: 15
}

cluster {
  host: ""
  port: 4223

  authorization {
    user: "nats"
    password: "nats"
    timeout: 15
  }

  routes = [
  ]
}


From this, I can see that NATS is listening on the host’s IP address, port 4222 (the NATS default), and that it is configured for username/password authentication. Handy to know!
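For illustration only, here is a minimal Python sketch of how those settings translate into the nats://user:password@host:port URL form that NATS clients accept. It is not the script used in this post. The helper names first_value and nats_url are made up for the example, the config text is a trimmed copy of the listing above, and the host name nats.example.internal is a hypothetical placeholder for your own NATS address.

```python
import re

# Trimmed copy of the settings shown above; the listen address is left out
# because it is specific to each deployment.
NATS_CONF = '''
port: 4222

authorization {
  user: "nats"
  password: "nats"
  timeout: 15
}
'''


def first_value(conf, key):
    """Return the first value assigned to `key` in the config text, or None."""
    pattern = r'^[ \t]*' + re.escape(key) + r'[ \t]*[:=][ \t]*"?([^"\n]*)"?[ \t]*$'
    match = re.search(pattern, conf, re.MULTILINE)
    return match.group(1).strip() if match else None


def nats_url(conf, host):
    """Build a nats:// URL from the config's credentials and a caller-supplied host."""
    user = first_value(conf, "user")
    password = first_value(conf, "password")
    port = first_value(conf, "port") or "4222"  # 4222 is the NATS default port
    if user and password:
        return f"nats://{user}:{password}@{host}:{port}"
    return f"nats://{host}:{port}"


if __name__ == "__main__":
    # "nats.example.internal" is a hypothetical host name, not taken from this deployment.
    print(nats_url(NATS_CONF, "nats.example.internal"))
```

The regular expression is deliberately naive: it takes the first match for each key, which is good enough here because the top-level port and authorization block come before the cluster section.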

I borrowed a little script from Dr Nic, but needed to modify it slightly to talk to authenticated NATS (his original script assumed there was no auth in place):

[update – Dr Nic has provided a more convenient method to do this, in the comments below – check out nats-sub – but this works, as well]

$ ./nats-all.sh
Msg received on [router.register] : '{"host":"","port":8080,"uris":["login."],"tags":{"component":"login"},"index":0,"private_instance_id":"e6194fe8-4910-4cb1-9f7c-d5ee7ff3f36b"}'
Msg received on [router.register] : '{"host":"","port":8080,"uris":["uaa."],"tags":{"component":"uaa"},"index":0,"private_instance_id":"7713dd5b-3613-41a6-9c67-c48f22a769b4"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp."],"host":"","port":61021,"tags":{"component":"dea-0"},"private_instance_id":"b52dfd91d68144cabb14b6c7bae77daae8b493acf1354c99941d49772a1f61fb"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp."],"host":"","port":61025,"tags":{"component":"dea-0"},"private_instance_id":"090f5c5aeee94fdfb4a4e0f0afde2553480dcd97c018431db37b4dffdc80fde4"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp."],"host":"","port":61028,"tags":{"component":"dea-0"},"private_instance_id":"92e10af77b274836a3f54373c9b7feee025c5b72f41a4c4982bde97d241ebd5b"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp."],"host":"","port":61039,"tags":{"component":"dea-0"},"private_instance_id":"86edf0c0a7f84f04b52693b489ad93b7f857f77271b84d568d8f5600b34f7054"}'
Msg received on [router.register] : '{"host":"","port":34567,"uris":["8b24c0a7d28f4e03aa028a3dc89fb8c3."],"tags":{"component":"directory-server-0"}}'
Msg received on [dea.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296,"available_disk":22528,"app_id_to_count":{"b8550851-37a0-4bd5-bdce-1d787b087887":10},"placement_properties":{"zone":"default"}}'
Msg received on [staging.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296}'
Msg received on [dea.heartbeat] : '{"droplets":[{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"d92d3c0c43ce4b6981e443e5c2064580","index":0,"state":"RUNNING","state_timestamp":1392639135.9526377},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"898e632697e246de9cf6b7330444227c","index":1,"state":"RUNNING","state_timestamp":1392639136.3117783},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"56d023e374aa49d88720daabac58e862","index":2,"state":"RUNNING","state_timestamp":1392639135.2225387},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"f11d86a7f4ad47f1ad554ae1b087d5f6","index":3,"state":"RUNNING","state_timestamp":1392639136.1042},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"c9e6de77f0484e6cae47f73ad6ca778a","index":4,"state":"RUNNING","state_timestamp":1392639135.9426212},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"924c387fc33444289b2db2762eefac42","index":5,"state":"RUNNING","state_timestamp":1392639135.940636},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"69866b260b1a49a09c03e178c4add2c5","index":6,"state":"RUNNING","state_timestamp":1392639135.944143},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"94bc605505d94dc1832e55bf2f671a99","index":7,"state":"RUNNING","state_timestamp":1392639135.4456258},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"8420df9bbe64456385dfa91285641ba4","index":8,"state":"RUNNING","state_timestamp":1392639135.9456131},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"ed9ad14f6599494c96f90296c59e6041","index":9,"state":"RUNNING","state_timestamp":1392639135.938359}],"dea":"0-1ba3459ea4cd406db833c1d188a78c02"}'
Msg received on [router.register] : '{"host":"","port":8080,"uris":["loggregator."]}'
Msg received on [router.register] : '{"host":"","port":9022,"uris":["api."],"tags":{"component":"CloudController"},"index":0,"private_instance_id":null}'
Msg received on [router.register] : '{"host":"","port":8080,"uris":["login."],"tags":{"component":"login"},"index":0,"private_instance_id":"e6194fe8-4910-4cb1-9f7c-d5ee7ff3f36b"}'
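That output gets hard to read quickly, so a small post-processing step can help. The following is a rough Python sketch, not part of the original script, that assumes each line has the shape shown above (Msg received on [subject] : '<JSON payload>'). It counts messages per subject and lists the ports that app instances register with the router. The function names are invented for the example, and the sample lines are shortened copies of the output above.

```python
import json
import re
from collections import Counter

# Three lines shortened from the output above (some JSON fields trimmed for brevity).
SAMPLE_OUTPUT = '''
Msg received on [router.register] : '{"host":"","port":8080,"uris":["login."],"tags":{"component":"login"},"index":0}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp."],"host":"","port":61021,"tags":{"component":"dea-0"}}'
Msg received on [staging.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296}'
'''

LINE_RE = re.compile(r"^Msg received on \[(?P<subject>[^\]]+)\] : '(?P<payload>.*)'$")


def parse_line(line):
    """Split one output line into (subject, decoded JSON payload); None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("subject"), json.loads(match.group("payload"))


def summarise(text):
    """Count messages per subject and collect the ports registered for app instances."""
    subjects = Counter()
    app_ports = []
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        subject, payload = parsed
        subjects[subject] += 1
        # In the output above, only DEA-hosted app instances carry an "app" field.
        if subject == "router.register" and "app" in payload:
            app_ports.append(payload["port"])
    return subjects, app_ports


if __name__ == "__main__":
    subjects, app_ports = summarise(SAMPLE_OUTPUT)
    print(dict(subjects))
    print(app_ports)
```

Piping the subscriber’s output into something like this makes it easier to see, for example, that each app instance registers its own port with the router.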

Warden containers and shells

Cloud Foundry’s native container technology is called Warden. When an application is deployed, Cloud Foundry starts up a Warden container based on the limits assigned in terms of memory etc, and the applications run inside that. How can you get “inside” the container to see what is going on?

Well, there are a couple of techniques. Cloud Foundry Loggregator provides streaming access to the standard application logs (stdout/stderr) via the cf logs command. Another option is James Bayer’s cool websocket-based method for getting access to the container. Yet another option is Warden’s own shell, wsh. This does assume you can access the DEA machine with ssh, however.

wsh doesn’t seem to be very well documented, although I knew Cornelia had played around with it – see her excellent blog post on troubleshooting CF and applications, including a great flowchart / graphic suggesting different techniques.

Here’s the secret sauce:

1. Log in to the DEA VM (called “runner_z1/0” in the list provided by bosh ssh).

2. Identify your Warden container… there are a lot showing below, but I happen to know that these are several instances of the same app. The important part is the instance-17ij46hadt2 – the second part of that value maps to the location of the container’s private space on disk.

$ ps -ef | grep warden
root        49    42  1 11:41 ?        00:00:41 /var/vcap/bosh/bin/ruby /var/vcap/bosh/bin/bosh_agent -c -I warden -P ubuntu
root      5390 32634  0 12:12 ?        00:00:00 /var/vcap/data/packages/warden/38.1/warden/src/oom/oom /tmp/warden/cgroup/memory/instance-17ij46hadss
root      5503 32634  0 12:12 ?        00:00:00 /var/vcap/data/packages/warden/38.1/warden/src/oom/oom /tmp/warden/cgroup/memory/instance-17ij46hadsu
root      5697 32634  0 12:12 ?        00:00:00 /var/vcap/data/packages/warden/38.1/warden/src/oom/oom /tmp/warden/cgroup/memory/instance-17ij46hadt3
root      6779 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/iomux-spawn /var/vcap/data/warden/depot/17ij46hadsu/jobs/58 /var/vcap/data/warden/depot/17ij46hadsu/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadsu/run/wshd.sock --user vcap /bin/bash
root      6780  6779  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadsu/run/wshd.sock --user vcap /bin/bash
root      6784 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/iomux-link -w /var/vcap/data/warden/depot/17ij46hadsu/jobs/58/cursors /var/vcap/data/warden/depot/17ij46hadsu/jobs/58
root      6930 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/iomux-spawn /var/vcap/data/warden/depot/17ij46hadss/jobs/59 /var/vcap/data/warden/depot/17ij46hadss/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadss/run/wshd.sock --user vcap /bin/bash
root      6931  6930  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadss/run/wshd.sock --user vcap /bin/bash
root      6934 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/iomux-link -w /var/vcap/data/warden/depot/17ij46hadss/jobs/59/cursors /var/vcap/data/warden/depot/17ij46hadss/jobs/59
root      6950 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/iomux-spawn /var/vcap/data/warden/depot/17ij46hadt3/jobs/60 /var/vcap/data/warden/depot/17ij46hadt3/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadt3/run/wshd.sock --user vcap /bin/bash
root      6955  6950  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadt3/run/wshd.sock --user vcap /bin/bash
root      6960 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/iomux-link -w /var/vcap/data/warden/depot/17ij46hadt3/jobs/60/cursors /var/vcap/data/warden/depot/17ij46hadt3/jobs/60
vcap     23713 16807  0 12:26 pts/0    00:00:00 grep --color=auto warden
root     32634     1  0 11:52 ?        00:00:09 ruby /var/vcap/data/packages/warden/38.1/warden/vendor/bundle/ruby/1.9.1/bin/rake warden:start[/var/vcap/jobs/dea_next/config/warden.yml]

3. Head over to the directory for your chosen Warden instance:

$ cd /var/vcap/data/warden/depot/17ij46hadt2

4. Notice that the Warden containers are running as root. If you run wsh now as an unprivileged user, you’ll get a connect: Permission denied error. Time to switch to root, and then run wsh specifying the command to run inside the shell, as a parameter:

$ sudo su -
# cd /var/vcap/data/warden/depot/17ij46hadt2
# bin/wsh /bin/bash

5. At this point, we’re inside the Warden container with a bash shell, and all commands are scoped inside it. So, let’s take a look at what is running:

root@17ij46hadt2:~# ps -ef
root         1     0  0 12:12 ?        00:00:00 wshd: 17ij46hadt2
vcap        29     1  0 12:12 ?        00:00:00 /bin/bash
vcap        31    29  0 12:12 ?        00:00:00 ruby /home/vcap/app/vendor/bundle/ruby/1.9.1/bin/rackup config.ru -p 61031
vcap        32    31  0 12:12 ?        00:00:00 /bin/bash
vcap        33    31  0 12:12 ?        00:00:00 /bin/bash
vcap        34    32  0 12:12 ?        00:00:00 tee /home/vcap/logs/stdout.log
vcap        35    33  0 12:12 ?        00:00:00 tee /home/vcap/logs/stderr.log
root        39     1  0 12:27 pts/0    00:00:00 /bin/bash
root        52    39  0 12:27 pts/0    00:00:00 ps -ef

This is our Ruby app, running on port 61031, and we can see the logs being written as well.
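Step 2 above relies on spotting the container handles by eye in the ps listing. As a hedged sketch, assuming the depot path layout seen in that listing (/var/vcap/data/warden/depot/<handle>/...), a few lines of Python can pull out the distinct handles and build the path to each container’s wsh binary. The function names are made up for this example, and the sample lines are shortened from the listing above.

```python
import re

# Lines shortened from the ps listing above.
SAMPLE_PS = '''
root      6780  6779  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadsu/run/wshd.sock --user vcap /bin/bash
root      6931  6930  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadss/run/wshd.sock --user vcap /bin/bash
root      6955  6950  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadt3/run/wshd.sock --user vcap /bin/bash
vcap     23713 16807  0 12:26 pts/0    00:00:00 grep --color=auto warden
'''

# Depot location as it appears in the listing; other deployments may differ.
DEPOT = "/var/vcap/data/warden/depot"
HANDLE_RE = re.compile(re.escape(DEPOT) + r"/([0-9a-z]+)/")


def container_handles(ps_output):
    """Return the distinct Warden container handles mentioned in ps output, sorted."""
    return sorted(set(HANDLE_RE.findall(ps_output)))


def wsh_path(handle):
    """Path to the wsh binary inside a container's depot directory."""
    return f"{DEPOT}/{handle}/bin/wsh"


if __name__ == "__main__":
    for handle in container_handles(SAMPLE_PS):
        print(handle, wsh_path(handle))
```

Running wsh itself still needs root on the DEA, as described in step 4; this only saves squinting at the process list.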

Hopefully this is useful information for folks wanting to dig around inside bosh-lite and a running Cloud Foundry system!


WebSphere MQ 7.1 is out – here’s why it is cool…

I’ve been fairly quiet about the latest software from the Hursley lab here on my blog – although, over the past few weeks since the announcements back at the start of October during the European WebSphere Technical Conference, I’ve definitely been speaking about WebSphere MQ v7.1 and WebSphere Message Broker v8.0 – two exciting product releases.

I’m going to spend this post talking about WMQ 7.1, which became available in electronic download form for the distributed platforms last Friday (z/OS will follow shortly). I’ll return to talk about all the (über)-coolness in Message Broker a little closer to the release date for that product.

So what is the big deal in this release?

It brings parallel / multi-version install

From version 7.1 onwards, there is now the capability to install more than one copy of WMQ on a system, for Windows and UNIX platforms. This includes installing alongside WMQ v7.0.1.6 (fixpack 6 on v7.0.1, the minimum level for multi-version install to work) – you can have one copy of v7.0.1.6, and multiple copies of 7.1, for example – and future versions will also be able to be installed in parallel, should the need arise. This should make migration and testing simpler. Applications can now point to their “own” install of WMQ if required. The GSKit installation, which provides some of the security functions for the queue manager, now gets installed “inside” the main installation as well, to make the whole thing more self-contained, and potentially easier to embed into other solutions if needed.

Here’s a teaser image from a Windows system that my colleague “mqjeff” sent me earlier today 🙂 He has an earlier version and 7.1 on the same machine.

It’s (even more) secure

WebSphere MQ has always had a number of strong security capabilities, including SSL for channel authentication and encryption, and fine-grained access control of queue manager objects via the Object Authority Manager. It has also been possible to add transparent, per-message / per-queue / per-policy on-disk encryption and signing of message data via the Advanced Message Security feature. In v7.1, a renewed focus on end-to-end security adds the ability to authorise on a per-IP/user connection basis, as well as adding more crypto algorithms and additional authorisation options, and making much more of that security function available via the MQSC administration tool. T-Rob has a much more complete post about these changes so I won’t go into any more detail here.

It runs better, on bigger systems

Bigger systems… like the z196 mainframes? Well, that’s one example, yes, but WMQ v7.1 has been more optimised for big and multicore systems in general. On the mainframe, there are a bunch of great enhancements such as increased resilience in dealing with shared queues in a coupling facility, and the introduction of Shared Message Data Sets (SMDS) to significantly improve performance there as well. Let’s just say that the performance numbers for z/OS are looking really, really good… which brings me on to…

It continues to push the performance envelope

A major focus on performance in the v7.1 cycle has produced some fantastic results, and when the performance reports appear (as SupportPacs, within the next few weeks), you’ll see the “fastest WMQ ever”. This theme runs throughout everything: not just the base runtime messaging, but also things like making the WMQ Explorer tooling significantly snappier to operate as well (oh, and that’s now 60% smaller, and more sleek!)

There is also a new option for publish/subscribe applications – the ability to publish on a topic via multicast. This re-uses some of the technology from the WebSphere MQ Low Latency product so that it can run very fast. After the initial application startup, it means that applications can also operate when the queue manager is not available.

It adds Telemetry to the base install

No surprise that I’d highlight this one (it is also an important part of the overall story, per the next heading!) – for the past couple of years I’ve been talking about the IBM implementation of MQTT, the open protocol which is being standardised and which, it has just been announced, will be part of the Eclipse Paho M2M project.

In WMQ v7.1, there is no longer a separate installation to run in order to add this support. On the platforms where the Telemetry feature is supported – Windows, Linux IA64, and (new in v7.1) AIX – this is now an optional part of the base installation. That means it is very easy to try out. Oh, and as well as being integrated with WMQ Explorer, the full range of Telemetry objects can now also be administered via the MQSC command line.

It brings the family together

This is a big one, in my opinion. I’ve mentioned that WMQ “base” can now interoperate with WMQLLM via the multicast publish-and-subscribe support; and the WMQ Telemetry functionality is “in the box” as part of the installer on the relevant platforms.

Why do these things matter? Well, as I mentioned in my recent MQTT FAQ, something that IBM has observed over a number of years of building and delivering production-ready messaging middleware is that one size does not fit all. There’s the fundamental transactional messaging backbone (WMQ base) which needs to be solid, reliable, and easy to administer through comprehensive scripted and graphical tools… but beyond that, there are some additional qualities of service that need to be considered. There’s the very high speed, low latency use case which may be very specialised (WMQLLM), and there’s the need to deal with small and constrained devices and less-reliable networks (WMQ Telemetry / MQTT). Of course, you may also want to perform file transfer over that infrastructure (WMQ File Transfer Edition), secure your messaging (WMQ AMS), or route and transform your data and connect with “foreign” systems via different protocols (WebSphere Message Broker). I’ve been talking about this as part of IBM’s Messaging Vision for a number of years and it is really showing through in this release of WebSphere MQ. It’s a complete story.

It addresses many “papercuts”

On top of all of that… the team has really tried to address many of the common papercut issues, by which I mean the gotchas, annoyances, and the “wouldn’t it be so much better if….”s. Things like, gosh, I wish I knew what version of WMQ that client is using to connect to me? (yep, you can find out now). How about “bind on group” for messages in a cluster? The ability to backup / dump and restore the configuration of a queue manager without needing to use a SupportPac? There’s a real sense of “fit and finish”, and I believe that shows that the development team have been listening to feedback and making the tweaks that users have been asking for where possible.

So – all-in-all, there’s a lot in this release that makes it worth a look, either from the perspective of users who are looking at an upgrade to gain performance, security and usability benefits; or for those looking for a solid, dependable messaging platform which can support modern applications. There’s a lot of excitement and innovation going on in the “traditional Message Oriented Middleware” space at the moment and WMQ and the related protocols like MQTT are right at the heart of those trends.

To learn more about the features I’ve talked about, and some that I haven’t, check out the online Infocenter. You can also check out the “What’s New in WMQ v7.1” presentation from the WebSphere Technical Conference, via T-Rob’s blog.

MQTT goes free – a personal Q&A

There has been a lot of coverage over the past couple of days of some exciting announcements that I’ve been involved with at work. I’ve spent the past three days at EclipseCon Europe 2011, which doubled as the 10th birthday celebration for the Eclipse initiative. It was a funny feeling, because Eclipse started just a few weeks after I first joined IBM, and although I’ve used it and watched it “grow up”, I’ve never done EclipseCon before. The reason I’ve been out there for three days this time (as a WebSphere Messaging guy rather than a Rational tooling or build person, for example) was to get involved with activities around these announcements.

It’s all about machine-to-machine (or M2M) communications, Smarter Planet, and the Internet of Things.

Before I dive in to this, a few clarifications. First, I’m being described in a couple of news stories as “an IBM distinguished engineer”, and whilst I wish that was true, I’ve yet to ascend to those heights! Also, there are various numbers being quoted – note that the figures in the press release were not invented by IBM, the headline number of an expected 50 billion connected devices by 2020 comes from a recent study conducted by Ericsson AB. Oh, and this isn’t about a “new” protocol – MQTT has been in use since 1999.

The other clarification is that some articles seem to suggest that IBM is out to create some kind of new, alternative, Web – that’s not what has been announced, and I’m certainly not aware of any such plan! It’s about connecting “things” – sensors, mobile devices, embedded systems, even small appliances or medical devices for example – to the Web and the associated platform and ecosystem of technologies, not about reinventing or recreating them. I’m personally a huge fan of the Web as a platform 🙂

Oh, and of course, the obligatory “all opinions expressed are my own” – this is my understanding of where things are going, although of course I’m talking about events I’m directly involved in!

So what is this all about?

Two things.

1. On Nov 2, IBM, Eurotech, Sierra Wireless and Eclipse formed a new M2M Industry Working Group at Eclipse. Sierra had already started the “Koneki” project at Eclipse to work on M2M tools, and the Working Group will look at a range of topics together, such as M2M tooling, software components, open communication and messaging protocols, data formats, and APIs.

2. On Nov 3, IBM and Eurotech announced the donation of their C and Java clients for MQTT to a new Eclipse project called “Paho” which is under proposal in the incubator – with code expected to hit the repository within the next couple of months. MQTT is being given to Eclipse to live within the M2M ecosystem that is emerging there, and to provide an avenue for adoption of the protocol as a more pervasive standard for connected devices.

How is that news? Isn’t MQTT already open / free?

Technically… kinda, sorta 🙂

The MQTT specification has been published under a royalty-free license for some time, and that has led to a fantastic community contributing a range of different projects. IBM and Eurotech took this approach from early on, because it wouldn’t have been possible to compile and support code on every embedded platform that might come along – far simpler to set the protocol free.

Initially the specification was hidden away in the WebSphere Message Broker documentation, but last year it was republished, moved to a new home on developerWorks, and the license was clarified.

In August, IBM and Eurotech announced their intention to take MQTT to a standards organisation. The specific organisation has not yet been finalised, but this is also an important step in ensuring that MQTT is not “just” an IBM protocol, but something of general use which the community can feel comfortable with. If you’d like to join that discussion then there’s a Get Involved page on the mqtt.org community site.

The missing piece was code – a reference implementation, if you like. That’s one reason why the Eclipse Paho announcement is significant.

Why else is this significant?

Well, here are some of my musings on that one:

  • it shows IBM is serious, by committing code and open sourcing it (as with the original Eclipse donation in 2001);
  • the M2M Industry Working Group exists to foster the discussion in this space;
  • it makes high-quality reference Java and C client implementations freely available in source form, with a good Java implementation something that has been particularly lacking;
  • it creates an opportunity for Eclipse projects to use MQTT, and to develop tools on top of it.

The press release and Paho project proposals aren’t clear (to me) – what exactly is being donated?

IBM is seeding Eclipse Paho with C and Java client implementations of MQTT. Eurotech is donating a framework and sample applications which device and client developers can use when integrating and testing messaging components.

Why C and Java clients (aren’t they “dying” languages?) Where’s my Perl and Ruby code?!

IBM had previously made some C and Java code available in some SupportPacs, but those are outdated and the license for reuse was never clear.

It’s important to realise that this stuff came from the embedded world of 10 (and more) years ago, and continues to be applied in that industrial space. That category of device typically runs some kind of realtime Java-based OS, or a Linux-based or other runtime with a GCC toolchain for the CPU in question. C and Java are genuinely the most useful implementations to get out there. Oh, and on that “those old languages” thing – I think you’ll find they are very widely used (Android, iOS etc run variants of sorts, most non-web app development is likely to be in one or the other).

We’re very fortunate that client libraries for a wide range of languages already exist thanks to the MQTT community – see the list at mqtt.org!

Hold on… don’t we need a broker / server / gateway?

Yes. But, one step at a time! 🙂

There are brokers available for free today, either as precompiled binaries or as full Open Source implementations, so this is not a dead end from day one.

The Paho project scope outlines the intention to add a broker to the project in the future, and to host an M2M sandbox for developers as well. That is where we are today, and this position will evolve over time.

Why Eclipse?

The Eclipse Foundation has been a fantastic success story (oh, and, Happy 10th Birthday, Eclipse!). As the scope of their mission has broadened beyond an IDE to the web, build environments, and all kinds of other tools, it was a good place for Sierra Wireless to kick off the Eclipse Koneki M2M tools project, and is now a natural place for this primarily M2M protocol to be hosted under Paho. As James Governor notes in his write-up of the news:

… the Eclipse Public License is designed to support derivative works and embedding, while the Eclipse Foundation can provide the stewardship of same. One of the main reasons Eclipse has been so successful is that rather than separate software from specification it brings them together – in freely available open source code – while still allowing for proprietary extensions which vendors can sell.

How quickly will the code donation happen?

The Paho proposal tentatively includes dates in November and December 2011 – there will need to be various approvals as code is accepted into Eclipse, so that may “flex” a little, but it is all in the pipeline.

OK… Why MQTT? Why not HTTP/XMPP/AMQP/PubSubHubbub/WebSockets/etcetcetc?

To answer this one adequately I’d probably end up addressing each individual difference between protocols in turn, and if you’ve heard me speak about MQTT I’ve covered some of this before – so I’ll keep this answer relatively brief. I will admit that I’ve been asked about all of these by journalists in the past couple of days.

There is space for a range of protocols to coexist, because they address different areas. In the messaging space, we’ve found over time that whilst efforts have been made to create a single protocol, each has often ended up focused around a particular set of qualities of service, and not optimised to cover the whole range of them.

For example, if we look at IBM’s own messaging protocols – there are several. There’s WebSphere MQ which is all about reliable, transactional, solid, clusterable, enterprise, JMS and other APIs, etc. WMQ itself isn’t ideal for very high-speed in-memory or multicast scenarios, so there is also WMQ Low Latency (interoperable with the new multicast feature in WMQ 7.1, but a separate protocol). Neither WMQ LLM nor WMQ scales down to unreliable device networks and embedded systems, so there is WMQ Telemetry (aka MQTT), which was specifically designed for constrained devices and networks, and that can interoperate with the main queue manager, too. Oh, and sometimes you want to deal with files (WMQ File Transfer Edition), or access message data via HTTP (WMQ HTTP Bridge). You need to address a range of requirements in a messaging story.

So why not those others? In this case, IBM believes that MQTT is ideally-suited to the Smarter Planet Instrumented->Interconnected layer – it’s tiny, not synchronous and brittle, isn’t specific to the web as it is all about data rather than documents, XML etc etc. In these scenarios, REST principles may add an overhead. Oh, and it has been around for over 10 years, and has been proven across a range of industries and in a range of extreme conditions. IBM’s commercial implementation is known to scale to hundreds of thousands of connected devices, and we know that is the direction that this space is heading.
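To put a rough number on “tiny”, here is a small Python sketch that builds an MQTT PUBLISH packet at QoS 0 by hand, following the wire format in the published MQTT v3.1 specification: a one-byte fixed header, a variable-length “remaining length” field, a length-prefixed topic name, then the raw payload. It is an illustration only, not code from the IBM or Eurotech clients, and the topic and payload are hypothetical.

```python
def encode_remaining_length(length):
    """Encode the MQTT 'remaining length' field: 7 bits per byte, high bit means 'more follows'."""
    if not 0 <= length <= 268_435_455:  # largest value that fits in four bytes
        raise ValueError("remaining length out of range")
    encoded = bytearray()
    while True:
        digit = length % 128
        length //= 128
        if length > 0:
            digit |= 0x80  # continuation bit
        encoded.append(digit)
        if length == 0:
            return bytes(encoded)


def encode_publish_qos0(topic, payload):
    """Build a PUBLISH packet with QoS 0, no DUP and no RETAIN flag."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte big-endian topic length, then the topic itself.
    # QoS 0 messages carry no message identifier.
    body = len(topic_bytes).to_bytes(2, "big") + topic_bytes + payload
    fixed_header = bytes([0x30])  # message type 3 (PUBLISH) in the top four bits
    return fixed_header + encode_remaining_length(len(body)) + body


if __name__ == "__main__":
    # "a/b" and b"hi" are made-up values for the example.
    packet = encode_publish_qos0("a/b", b"hi")
    print(len(packet), packet.hex())
```

In this example a two-byte reading published to a three-character topic takes nine bytes on the wire, of which only two are fixed header.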

Congratulations! / Thank you!

Thanks, but don’t congratulate or thank me! I’m familiar with this stuff, I’ve coded with this stuff, but I didn’t invent it and I didn’t write it. There are some amazing folks at both IBM and Eurotech (and some who have moved on) who started this all off in 1999, and who have helped to implement solutions using this protocol since then, and who have of course developed it. Several of them are on Twitter if you want to say hi! And huge thanks again to the community of folks that formed around mqtt.org and contributed client and server implementations – that absolutely helped to move things forward to this point.


That hopefully helps to clarify a few things and answer some of the questions I’ve seen via Twitter, forums, and mailing lists over the past few days. It has been something of a blur, to be honest, but a lot of fun. I’m looking forward to the next stage – working with the community more, working with our friends at Eurotech, Sierra Wireless and elsewhere, and making the M2M space much more real.

For more, here are a bunch of stories I’ve seen in the past couple of days… no particular order, just my cut-and-paste list!

European WebSphere Technical Conference 2011

Although I realise that it seems as though I do little other than spin around “the conference circuit” at the moment, what with the various events I’ve blogged about lately, that isn’t entirely true! However, it is just about time for another European WebSphere Technical Conference – something like a cut-down IMPACT run in Europe, a combination of the popular WebSphere and Transaction & Messaging conferences we used to run – with plenty of technical content on the latest technologies.

I’ll be in Berlin next week, 10th-14th October, participating in at least one panel, speaking about MQTT, and also covering the latest on IBM MQ messaging technologies as they relate to cloud and web. There’s a Lanyrd event page where I’ll try to collate information relating to the individual talks.

I have a feeling that by this time next week there could be quite a lot to talk about… 🙂

What a week for MQTT!

Part of my role as WebSphere Messaging Community Lead involves IBM’s MQ Telemetry Transport protocol. I spend a chunk of my time talking about how MQTT relates to building a Smarter Planet, and explaining how it can be used to build some very cool new applications and solutions.

Folks from IBM and Eurotech may have jointly authored MQTT, but it has been published online with terms enabling royalty-free use and implementation of the protocol. The next stage is to put it forward for standardisation. Last Friday, the call for participation in a standards discussion was published on mqtt.org. It’s open to anyone to join, and given the excitement I’ve personally seen in the developer community, I’m hopeful that we’ll see plenty of interest.

Friday saw even more big news, from an entirely unexpected source. As I stood chatting to people arriving at the OggCamp party that evening, my Twitter alerts and email went crazy with MQTT chatter… Facebook announced that their new Facebook Messenger application (a result of their acquisition of the Beluga team earlier in the year) uses MQTT! I’d been aware of different mobile app developers using MQTT for a while now – in fact we recently highlighted what a great match the protocol is for Android applications, on the mqtt.org blog – but had not known about Facebook’s interest or usage. In their post talking about how Facebook Messenger works, they call out the characteristics that make it a strong protocol for a mobile group messaging application – low bandwidth, low overheads, low power cost… all of the things that have made MQTT successful in sensor networks and solutions make it ideal for these kinds of applications as well.

Well… as I said, a big week, with some exciting news. So it seemed only right that I should give a talk about MQTT and all of these latest developments at OggCamp this past weekend – the event which, three years ago, resulted in Roger Light creating his mosquitto broker.

You may recognise the slides as a remix of the talk I gave at LinuxConf in January, but I’ve updated them to highlight the OggCamp dimension and to talk about the recent news. There will be more to come during the coming weeks, so join the chat in channel #mqtt on Freenode IRC, and keep an eye on mqtt.org!