Getting inside Cloud Foundry for debug (and profit?)

I’ve recently started to play with more of the internals of Cloud Foundry than I’ve been used to. This has been made much easier by the advent of bosh-lite, a system for deploying all of Cloud Foundry’s components into a single virtual machine using the bosh continuous deployment and configuration tool. bosh-lite achieves this by using containers (Cloud Foundry’s own Warden container technology) to “emulate” the individual VMs where jobs would run in a full distributed topology.

bosh-lite has actually been around for a number of months now, but I’ve not had much of a chance to play with it until recently. That’s partly due to other activities, and partly because my earlier attempts to get an environment up and running were hampered by a lack of memory. It should be possible to run bosh-lite with a Cloud Foundry deployment in 8GB of RAM, but given my laptop’s configuration and the amount of other stuff I’m usually running, that was never comfortable. Now that I’m rocking 16GB in a MacBook Pro, things are running more smoothly.

I don’t intend to spend this post documenting how to install bosh-lite and get a running single-node Cloud Foundry system; I followed the instructions in the README and things went well on this occasion. One suggestion I would make is to use VMware Fusion (assuming, like me, you’re on OS X) and the Vagrant provider for Fusion if you can, as it seems quite a lot better than VirtualBox. If you do, don’t forget to pass the --provider=vmware_fusion flag when you bring your Vagrant image up (that’s something I usually do forget). One other little thing to mention: after I started the bosh deployment, the bosh CLI gem timed out and returned a REST error, but the deployment process itself continued without any issues, and I was able to use bosh tasks to check in on the progress. If you are interested, I used cf-release-157 this time around.
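For the record, the interesting parts of the setup looked roughly like this. Treat it as a sketch only: the bosh-lite README covers the full sequence of uploading a stemcell, the cf-release and a deployment manifest, and the exact paths, versions and director address may differ in your environment.

$ cd bosh-lite
$ vagrant up --provider=vmware_fusion   # omit the flag to use the default VirtualBox provider
$ bosh target 192.168.50.4              # bosh-lite's default director address (check the README if yours differs)
$ bosh deploy                           # kick off the Cloud Foundry deployment
$ bosh tasks                            # check progress, handy if the CLI times out on you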

Once I had my minty-fresh Cloud Foundry running, I deployed Matt Stine’s handy, simple Ruby scale demo app and pushed up the number of instances.
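If you want to follow along, that amounted to something like the commands below. The app name is just what I happened to use (it shows up in the NATS traffic later), the directory name is hypothetical, and the exact scale flags depend on your cf CLI version.

$ cd scale-demo            # wherever you've cloned the demo app
$ cf push andyp            # the app name appears later in the router.register messages
$ cf scale andyp -i 10     # run ten instances so there's plenty to look at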

So what’s the point of this post? I want to mention two things…

Note: this is not about debugging applications on Cloud Foundry in general – a PaaS is an opinionated system and you generally shouldn’t need to poke around inside it like this. This is for debugging the Cloud Foundry runtime itself, or aspects that might run inside a container. Oh, and I’m sorry about the formatting of some of the shell output examples below!

Peeking at NATS traffic

NATS is the internal, lightweight message bus that Cloud Foundry components use to talk to one another. I’d read blog posts from Cornelia and from Dr Nic about digging into this before.

First of all, I used bosh ssh to access the NATS host:

$ bosh ssh
1. ha_proxy_z1/0
2. nats_z1/0
3. postgres_z1/0
4. uaa_z1/0
5. login_z1/0
6. api_z1/0
7. clock_global/0
8. api_worker_z1/0
9. etcd_leader_z1/0
10. hm9000_z1/0
11. runner_z1/0
12. loggregator_z1/0
13. loggregator_trafficcontroller_z1/0
14. router_z1/0
Choose an instance: 2
Enter password (use it to sudo on remote host): ***
Target deployment is `cf-warden'

Setting up ssh artifacts

Director task 9

Task 9 done
Starting interactive shell on job nats_z1/0

So now I’m on the NATS host. Now what? Well, strictly speaking I didn’t need to log in to that host/container at all: it’s a messaging system, so the other hosts can connect to it over the network anyway. The reason I wanted to log in was to find out how NATS was configured.

$ ps -ef | grep nats
root 1470 1 0 12:09 ? 00:00:12 /var/vcap/packages/gnatsd/bin/gnatsd -V -D -c /var/vcap/jobs/nats/config/nats.conf

$ more /var/vcap/jobs/nats/config/nats.conf

net: "10.244.0.6"
port: 4222

pid_file: "/var/vcap/sys/run/nats/nats.pid"
log_file: "/var/vcap/sys/log/nats/nats.log"

authorization {
user: "nats"
password: "nats"
timeout: 15
}

cluster {
host: "10.244.0.6"
port: 4223

authorization {
user: "nats"
password: "nats"
timeout: 15
}

routes = [

]
}

From this, I can see that NATS is listening on IP 10.244.0.6, port 4222 (the NATS default), and that it is configured for username/password authentication. Handy to know!
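As a quick sanity check that the listener is reachable from elsewhere in the deployment, you can open a raw connection to it; NATS greets new connections with an INFO banner. This assumes nc (or telnet) is available on the VM you’re connecting from.

$ nc 10.244.0.6 4222       # NATS should respond immediately with an INFO line; Ctrl-C to quit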

I borrowed a little script from Dr Nic, but needed to modify it slightly to talk to authenticated NATS (his original script assumed there was no auth in place):


#!/usr/bin/env ruby
require "nats/client"

NATS.start(:uri => "nats://nats:nats@10.244.0.6:4222") do
  NATS.subscribe('>') { |msg, reply, sub| puts "Msg received on [#{sub}] : '#{msg}'" }
end


[Update: Dr Nic has provided a more convenient method to do this in the comments below (check out nats-sub), but this works as well.]

$ ./nats-all.sh
Msg received on [router.register] : '{"host":"10.244.0.134","port":8080,"uris":["login.10.244.0.34.xip.io"],"tags":{"component":"login"},"index":0,"private_instance_id":"e6194fe8-4910-4cb1-9f7c-d5ee7ff3f36b"}'
Msg received on [router.register] : '{"host":"10.244.0.130","port":8080,"uris":["uaa.10.244.0.34.xip.io"],"tags":{"component":"uaa"},"index":0,"private_instance_id":"7713dd5b-3613-41a6-9c67-c48f22a769b4"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61021,"tags":{"component":"dea-0"},"private_instance_id":"b52dfd91d68144cabb14b6c7bae77daae8b493acf1354c99941d49772a1f61fb"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61025,"tags":{"component":"dea-0"},"private_instance_id":"090f5c5aeee94fdfb4a4e0f0afde2553480dcd97c018431db37b4dffdc80fde4"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61028,"tags":{"component":"dea-0"},"private_instance_id":"92e10af77b274836a3f54373c9b7feee025c5b72f41a4c4982bde97d241ebd5b"}'
Msg received on [router.register] : '{"dea":"0-1ba3459ea4cd406db833c1d188a78c02","app":"b8550851-37a0-4bd5-bdce-1d787b087887","uris":["andyp.10.244.0.34.xip.io"],"host":"10.244.0.26","port":61039,"tags":{"component":"dea-0"},"private_instance_id":"86edf0c0a7f84f04b52693b489ad93b7f857f77271b84d568d8f5600b34f7054"}'
Msg received on [router.register] : '{"host":"10.244.0.26","port":34567,"uris":["8b24c0a7d28f4e03aa028a3dc89fb8c3.10.244.0.34.xip.io"],"tags":{"component":"directory-server-0"}}'
Msg received on [dea.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296,"available_disk":22528,"app_id_to_count":{"b8550851-37a0-4bd5-bdce-1d787b087887":10},"placement_properties":{"zone":"default"}}'
Msg received on [staging.advertise] : '{"id":"0-1ba3459ea4cd406db833c1d188a78c02","stacks":["lucid64"],"available_memory":23296}'
Msg received on [dea.heartbeat] : '{"droplets":[{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"d92d3c0c43ce4b6981e443e5c2064580","index":0,"state":"RUNNING","state_timestamp":1392639135.9526377},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"898e632697e246de9cf6b7330444227c","index":1,"state":"RUNNING","state_timestamp":1392639136.3117783},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"56d023e374aa49d88720daabac58e862","index":2,"state":"RUNNING","state_timestamp":1392639135.2225387},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"f11d86a7f4ad47f1ad554ae1b087d5f6","index":3,"state":"RUNNING","state_timestamp":1392639136.1042},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"c9e6de77f0484e6cae47f73ad6ca778a","index":4,"state":"RUNNING","state_timestamp":1392639135.9426212},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"924c387fc33444289b2db2762eefac42","index":5,"state":"RUNNING","state_timestamp":1392639135.940636},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"69866b260b1a49a09c03e178c4add2c5","index":6,"state":"RUNNING","state_timestamp":1392639135.944143},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"94bc605505d94dc1832e55bf2f671a99","index":7,"state":"RUNNING","state_timestamp":1392639135.4456258},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"8420df9bbe64456385dfa91285641ba4","index":8,"state":"RUNNING","state_timestamp":1392639135.9456131},{"cc_partition":"default","droplet":"b8550851-37a0-4bd5-bdce-1d787b087887","version":"a420d371-0816-4baf-9649-4e21255a66a4","instance":"ed9ad14f6599494c96f90296c59e6041","index":9,"state":"RUNNING","state_timestamp":1392639135.938359}],"dea":"0-1ba3459ea4cd406db833c1d188a78c02"}'
Msg received on [router.register] : '{"host":"10.244.0.10","port":8080,"uris":["loggregator.10.244.0.34.xip.io"]}'
Msg received on [router.register] : '{"host":"10.244.0.138","port":9022,"uris":["api.10.244.0.34.xip.io"],"tags":{"component":"CloudController"},"index":0,"private_instance_id":null}'
Msg received on [router.register] : '{"host":"10.244.0.134","port":8080,"uris":["login.10.244.0.34.xip.io"],"tags":{"component":"login"},"index":0,"private_instance_id":"e6194fe8-4910-4cb1-9f7c-d5ee7ff3f36b"}'
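The ‘>’ wildcard subscribes to every subject, which gets noisy very quickly even on a small deployment. To narrow things down to a single subject, the nats-sub script that ships inside the nats package (mentioned in the update above and in the comments below) takes a subject pattern directly. For example, to watch just the DEA heartbeats, something like this, run as root on the NATS VM (the package version segment in the path may differ on your deployment):

$ sudo /var/vcap/data/packages/nats/10.1/nats/vendor/bundle/ruby/1.9.1/bin/nats-sub 'dea.heartbeat' -s nats://nats:nats@10.244.0.6:4222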

Warden containers and shells

Cloud Foundry’s native container technology is called Warden. When an application is deployed, Cloud Foundry starts up a Warden container for each instance, sized according to the limits assigned (memory and so on), and the application runs inside it. How can you get “inside” the container to see what is going on?

Well, there are a few techniques. Cloud Foundry’s Loggregator provides streaming access to the standard application logs (stdout/stderr) via the cf logs command. Another option is James Bayer’s cool websocket-based method for getting access to the container. Yet another option is Warden’s own shell, wsh; this does assume you can access the DEA machine with ssh, however.
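The Loggregator route needs no ssh access at all; tailing the app pushed earlier looks something like this (the --recent flag, where your CLI version supports it, dumps the recent buffered output and exits rather than streaming):

$ cf logs andyp            # streams stdout/stderr from all running instances
$ cf logs andyp --recent   # dumps recent buffered logs and exits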

wsh doesn’t seem to be very well documented, although I knew Cornelia had played around with it – see her excellent blog post on troubleshooting CF and applications, including a great flowchart / graphic suggesting different techniques.

Here’s the secret sauce:

1. Login to the DEA VM (called “runner_z1/0” in the list provided by bosh ssh).

2. Identify your Warden container… there are a lot showing below, but I happen to know that these are several instances of the same app. The important part is the instance-17ij46hadt2; the second part of that value maps to the location of the container’s private space on disk. (If you don’t already know which handle belongs to which app, see the instances.json tip after this walkthrough.)

$ ps -ef | grep warden
root        49    42  1 11:41 ?        00:00:41 /var/vcap/bosh/bin/ruby /var/vcap/bosh/bin/bosh_agent -c -I warden -P ubuntu
root      5390 32634  0 12:12 ?        00:00:00 /var/vcap/data/packages/warden/38.1/warden/src/oom/oom /tmp/warden/cgroup/memory/instance-17ij46hadss
root      5503 32634  0 12:12 ?        00:00:00 /var/vcap/data/packages/warden/38.1/warden/src/oom/oom /tmp/warden/cgroup/memory/instance-17ij46hadsu
root      5697 32634  0 12:12 ?        00:00:00 /var/vcap/data/packages/warden/38.1/warden/src/oom/oom /tmp/warden/cgroup/memory/instance-17ij46hadt3
root      6779 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/iomux-spawn /var/vcap/data/warden/depot/17ij46hadsu/jobs/58 /var/vcap/data/warden/depot/17ij46hadsu/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadsu/run/wshd.sock --user vcap /bin/bash
root      6780  6779  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadsu/run/wshd.sock --user vcap /bin/bash
root      6784 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadsu/bin/iomux-link -w /var/vcap/data/warden/depot/17ij46hadsu/jobs/58/cursors /var/vcap/data/warden/depot/17ij46hadsu/jobs/58
root      6930 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/iomux-spawn /var/vcap/data/warden/depot/17ij46hadss/jobs/59 /var/vcap/data/warden/depot/17ij46hadss/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadss/run/wshd.sock --user vcap /bin/bash
root      6931  6930  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadss/run/wshd.sock --user vcap /bin/bash
root      6934 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadss/bin/iomux-link -w /var/vcap/data/warden/depot/17ij46hadss/jobs/59/cursors /var/vcap/data/warden/depot/17ij46hadss/jobs/59
root      6950 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/iomux-spawn /var/vcap/data/warden/depot/17ij46hadt3/jobs/60 /var/vcap/data/warden/depot/17ij46hadt3/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadt3/run/wshd.sock --user vcap /bin/bash
root      6955  6950  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/wsh --socket /var/vcap/data/warden/depot/17ij46hadt3/run/wshd.sock --user vcap /bin/bash
root      6960 32634  0 12:12 ?        00:00:00 /var/vcap/data/warden/depot/17ij46hadt3/bin/iomux-link -w /var/vcap/data/warden/depot/17ij46hadt3/jobs/60/cursors /var/vcap/data/warden/depot/17ij46hadt3/jobs/60
vcap     23713 16807  0 12:26 pts/0    00:00:00 grep --color=auto warden
root     32634     1  0 11:52 ?        00:00:09 ruby /var/vcap/data/packages/warden/38.1/warden/vendor/bundle/ruby/1.9.1/bin/rake warden:start[/var/vcap/jobs/dea_next/config/warden.yml]

3. Head over to the directory for your chosen Warden instance:

$ cd /var/vcap/data/warden/depot/17ij46hadt2

4. Notice that the Warden containers are running as root. If you run wsh now as an unprivileged user, you’ll get a connect: Permission denied error. Time to switch to root and then run wsh, passing the command to run inside the container as a parameter:

$ sudo su -
# cd /var/vcap/data/warden/depot/17ij46hadt2
# bin/wsh /bin/bash

5. At this point, we’re inside the Warden container with a bash shell, and all commands are scoped inside it. So, let’s take a look at what is running:

root@17ij46hadt2:~# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 12:12 ?        00:00:00 wshd: 17ij46hadt2
vcap        29     1  0 12:12 ?        00:00:00 /bin/bash
vcap        31    29  0 12:12 ?        00:00:00 ruby /home/vcap/app/vendor/bundle/ruby/1.9.1/bin/rackup config.ru -p 61031
vcap        32    31  0 12:12 ?        00:00:00 /bin/bash
vcap        33    31  0 12:12 ?        00:00:00 /bin/bash
vcap        34    32  0 12:12 ?        00:00:00 tee /home/vcap/logs/stdout.log
vcap        35    33  0 12:12 ?        00:00:00 tee /home/vcap/logs/stderr.log
root        39     1  0 12:27 pts/0    00:00:00 /bin/bash
root        52    39  0 12:27 pts/0    00:00:00 ps -ef

This is our Ruby app, running on port 61031, and we can see the logs being written as well.
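Incidentally, since wsh takes the command to run as a parameter, you don’t have to start an interactive shell at all; a one-off command works just as well. For example, to peek at the tail of the application’s stdout log (run as root from the container’s depot directory, as above; the --user flag matches what the DEA itself passes when it spawns wsh, as seen in the ps output earlier):

# bin/wsh --user vcap tail -n 20 /home/vcap/logs/stdout.log

Finally, the instances.json tip promised earlier: if you’re not sure which container handle belongs to which app, the DEA keeps a mapping in /var/vcap/data/dea_next/db/instances.json, as the comments at the bottom of this post describe. A quick-and-dirty way to eyeball it, run as root on the runner VM (assuming python is available on the stemcell for pretty-printing; plain grep works too), is something like:

# python -m json.tool /var/vcap/data/dea_next/db/instances.json | grep -E '"application_id"|"instance_index"|"warden_container_path"'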

Hopefully this is useful information for folks wanting to dig around inside bosh-lite and a running Cloud Foundry system!

8 thoughts on “Getting inside Cloud Foundry for debug (and profit?)”

  1. nice write-up andy! the instances.json file on the DEA should help you identify which warden container handle belongs to which app easily. you should be able to find that easily once on the dea by using something like “find /var/vcap -name instances.json”.

  2. Yup – that’s (as root)

    /var/vcap/data/packages/nats/10.1/nats/vendor/bundle/ruby/1.9.1/bin/nats-sub '>' -s nats://nats:nats@10.244.0.6:4222

    (and no smart quotes, correct those if WordPress messes them up)

    1. Hi, I just have a question: can vmc-IronFoundry only be installed on Windows? I have made Micro Cloud Foundry and Micro Iron Foundry run on a physical machine with Ubuntu, and I tried to push ASP.NET applications with vmc-IronFoundry installed on Ubuntu, but it didn’t work.

  3. There is a way to identify the exact warden container for a given app instance.

    First get the GUID for the app, which can be seen by running “CF_TRACE=true cf app <app-name>”. One of the requests should be something like “GET /v2/apps/<app-guid>/summary”, where “<app-guid>” is the GUID for the app.

    Once you have the app’s GUID, you can look in the file “/var/vcap/data/dea_next/db/instances.json” in the “runner” VM. This file contains the mapping of app instances to Warden containers. Look for the “application_id” and “instance_index” fields that match the app and instance you care about, then use the “warden_container_path” value in the same JSON hash.
