Azure IoT Hub and MQTT – Capture to CosmosDB

Azure IoT Hub is not a proper MQTT broker, but it has sufficient MQTT support to be used with MQTT devices, and it can easily be coupled with Azure Cosmos DB to record the message payloads. Both of these Azure services are available with Free Tier pricing, subject to usage limits (a daily quota of 8000 Hub messages), which would be quite acceptable for domestic/hobbyist/DIY IoT use*. In contrast to the open public free MQTT brokers, the Azure setup offers privacy and security.

[* – On the whole, I favour keeping what I can on the LAN and using Mosquitto on a Raspberry Pi, but my particular use case is devices using the “mobile carriers'” LTE CAT-NB (NB-IoT) or LTE CAT-M1 networks with modules based on the SimCom Y7080]

Having struggled to extract the necessary information from Microsoft documentation, here are some notes. They presume an Azure subscription and use of MQTT Explorer, but can surely be applied to other MQTT clients and physical devices.

Preparing Azure Cosmos DB

There are three layers to a Cosmos DB setup: the “DB account”, the “database”, and the “container”. Only one “DB account” can be created within the Free Tier, and the throughput, which is limited to 1000 RU/s, can only be split between databases in units of 400 RU/s. Consequently, it probably makes sense to run one database in one account and to work at the container level. As far as I can see, the free RU allocation is more than adequate for the kinds of use outlined above. Create a “NoSQL” account.

Once you have created a DB account, use Azure Portal to access the Data Explorer for that account. I think the defaults for “New Container” are probably fine but be sure to set automatic indexing and note that the partition key name will be required for the next step (using a key name of “/partition_key” is as good as anything, noting the initial slash).

Data Explorer also allows you to browse and query the records in the container (“Items”) and to alter settings (“Settings”). A little trick for deleting records while experimenting (there is no equivalent to SQL DELETE!) is to use Settings to temporarily set the “Time to Live” to “On” and give a time of a few seconds (as far as I know the value has to be a positive number, and note you do need to hit “Save”). Remember to set it back afterwards, or new records will also expire.

Preparing Azure IoT Hub

Creating an IoT Hub in Azure Portal deserves little comment, except to note that you should leave the default Networking > Connectivity configuration = “Public Access” (and not change to the “recommended” option). In what follows, I will use {hubName} as a placeholder for the name assigned in this step.

It is essential that a “device” is created in Azure, to allow a real device or MQTT client application to interact with the hub. In Azure Portal, refer to Device management > Devices. Add a device with the default settings (this is not an “IoT Edge Device”, authentication should be “Symmetric key” (it will use “Shared Access Signature”) with auto-generated keys). I will use {deviceId} for the ID assigned in this step.

Note that the Hub is not a store; its role is to receive and route messages. In this case, I wish to route messages to CosmosDB. This is achieved in Azure Portal by using Hub settings > Message routing. A useful routing setup actually comprises three parts: the route, an endpoint, and an optional enrichment (which is referred to later).

First add a “Custom endpoint” for Cosmos DB, entering the Cosmos DB account, database, and container. Use the same partition key name as assigned previously and leave the key value template as it is. Then add a route (the name is arbitrary) to use that endpoint and select “Device Telemetry Messages” as the Data source. To route all messages to Cosmos DB, leave the “Routing query” as the default value of “true”.

At this point, you should see a notification that messages will not now flow to the built-in endpoint. That is OK. Since the routing query was left as the default, the “fallback route” feature should be disabled.

Using MQTT Explorer

This is the hardest bit to get right, and where the important details are rather lost in the MS documentation.

Before any device/client can connect, it will need a Shared Access Signature Token (SAS Token). Weirdly, Azure Portal does not let you create a SAS token for IoT Hub devices and the keys it does show are NOT what you need. SAS Tokens can be created with the Azure CLI utilities, but since I use VSCode for development (both for Python and for microcontrollers using PlatformIO), I opted to use the “Azure IoT Hub” extension (ignore the note recommending use of the Azure IoT Tools extension pack; that is now discontinued!). This will add an “AZURE IOT HUB” entry to the VSCode Explorer panel, from which a right-click context menu allows you to generate a SAS Token for any device.
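For anyone who would rather script this, the token can also be derived from the device's symmetric key (the key is an input to the calculation, not the password itself). The following is a minimal sketch of the HMAC-SHA256 signing scheme as I understand it from the Microsoft documentation; the function name, parameter names, and the example key are my own inventions, and I have not compared its output against the VSCode extension.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote_plus, urlencode


def generate_sas_token(resource_uri, device_key, ttl_seconds=3600, now=None):
    """Build a SAS token string for an IoT Hub device.

    resource_uri: "{hubName}.azure-devices.net/devices/{deviceId}"
    device_key:   the device's base64-encoded symmetric key
    now:          optional Unix time, only here to make testing repeatable
    """
    now = time.time() if now is None else now
    expiry = int(now + ttl_seconds)
    # The string to sign is the URL-encoded resource URI, a newline, then the expiry.
    to_sign = f"{quote_plus(resource_uri)}\n{expiry}".encode("utf-8")
    digest = hmac.new(base64.b64decode(device_key), to_sign, hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # urlencode percent-encodes the URI and the signature.
    fields = {"sr": resource_uri, "sig": signature, "se": str(expiry)}
    return "SharedAccessSignature " + urlencode(fields)


# Example with a made-up key (NOT a real credential) and placeholder names.
fake_key = base64.b64encode(b"not-a-real-device-key").decode("ascii")
print(generate_sas_token("myhub.azure-devices.net/devices/mydevice", fake_key))
```

The whole returned string, including the “SharedAccessSignature ” prefix, is what goes in the password field. A token is only valid until its expiry time, so a long-running device needs a way to obtain a fresh one.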

A new MQTT Explorer connection should have the following settings ({…} denotes placeholders mentioned above):

  • Turn “encryption” on (the MQTT interaction must use Transport Layer Security, TLS, which is still often referred to by the name of its predecessor, Secure Sockets Layer, SSL)
  • Protocol: mqtt://
  • Host: {hubName}.azure-devices.net (this is the current domain used – check Azure Portal)
  • Port: 8883
  • Username: {hubName}/{deviceId}
  • Password: {SAS token}
  • Advanced > MQTT Client ID: {deviceId}

MS documentation refers to “api-version” in Host and Username but I found this to be redundant and the value doesn’t appear in Azure Portal AFAIK.

The IoT Hub is not a proper MQTT broker, and the topic is prescribed; it must be: devices/{deviceId}/messages/events/
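The same settings carry over to a scripted client. Here is a rough sketch in Python, assuming the third-party paho-mqtt package is installed; the function names are mine, the username follows the short form listed above rather than the longer form in the MS documentation, and I have not run this against a live hub.

```python
import threading


def connection_settings(hub_name, device_id, sas_token):
    """Collect the connection values listed above for one device."""
    return {
        "host": f"{hub_name}.azure-devices.net",
        "port": 8883,
        "client_id": device_id,
        "username": f"{hub_name}/{device_id}",
        "password": sas_token,
        "topic": f"devices/{device_id}/messages/events/",
    }


def publish_once(settings, payload, timeout=10.0):
    """Connect over TLS, publish one message with QoS 1, then disconnect."""
    import paho.mqtt.client as mqtt  # third-party: pip install paho-mqtt

    kwargs = {"client_id": settings["client_id"], "protocol": mqtt.MQTTv311}
    if hasattr(mqtt, "CallbackAPIVersion"):
        # paho-mqtt 2.x wants a callback API version as the first argument.
        client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION1, **kwargs)
    else:
        client = mqtt.Client(**kwargs)

    connected = threading.Event()
    result = {}

    def on_connect(client, userdata, flags, rc):
        result["rc"] = rc  # 0 means the hub accepted the connection
        connected.set()

    client.on_connect = on_connect
    client.username_pw_set(settings["username"], settings["password"])
    client.tls_set()  # use the system's default CA certificates
    client.connect(settings["host"], settings["port"])
    client.loop_start()
    try:
        if not connected.wait(timeout) or result["rc"] != 0:
            raise ConnectionError(f"MQTT connect failed: {result.get('rc', 'timeout')}")
        info = client.publish(settings["topic"], payload, qos=1)
        info.wait_for_publish()
    finally:
        client.disconnect()
        client.loop_stop()
```

Usage would be along the lines of publish_once(connection_settings("myhub", "mydevice", token), '{"temp": 14.2}'), where the hub name, device ID, and token are placeholders for your own values.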

It should now be possible to make a connection and send a message without MQTT Explorer indicating “disconnected” errors. Use the Cosmos DB Data Explorer to check for new records (note there is a “refresh” icon). Look in the record JSON and find the element “Body”. This contains the message but in base64 encoded form. The original message can be found by decoding using an online tool such as base64decode.
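If you would rather not paste payloads into a website, the decoding is a couple of lines of Python. This sketch assumes only what is described above, namely that the record has a “Body” element holding base64 text; the sample record is a stand-in I built by hand, not a real Cosmos DB document.

```python
import base64


def decode_body(document):
    """Return the original message text from a record's base64 'Body'."""
    raw = base64.b64decode(document["Body"])
    return raw.decode("utf-8")


# Stand-in record: encode a message the way it would appear in "Body".
sample = {"Body": base64.b64encode(b'{"temp": 14.2}').decode("ascii")}
print(decode_body(sample))
```

If the device sends binary data rather than text, skip the final decode step and work with the bytes directly.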

Improving the Message

It is possible to avoid having to deal with base64 encoded Body values by declaring the content type with the message (base64 is used by default because it allows arbitrary binary data to be expressed as a text string). This can be done by decorating the MQTT topic. To tell IoT Hub that the message payload is JSON, set the topic to: devices/{deviceId}/messages/events/$.ct=application%2Fjson&$.ce=utf-8 . A message sent with the new topic should appear with JSON in the Body of the Cosmos DB record, which should also now contain new elements declaring the content type and encoding. Note that the MQTT Explorer selection of raw/XML/JSON does not work as expected.

An alternative to working with the message Body is to put the data in the topic, leaving the message empty and so not needing the .ct and .ce decorations. For example, using a topic such as devices/{deviceId}/messages/events/rh=80.2&temp=14.2 gives you JSON in CosmosDB containing elements “rh” and “temp” inside a “Properties” container element.
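Both kinds of decorated topic are easy to get wrong by hand because of the URL encoding, so a small helper may be worthwhile. This is only a sketch: the function and parameter names are my own, and it covers just the $.ct and $.ce decorations and plain name=value properties described above.

```python
from urllib.parse import quote


def telemetry_topic(device_id, content_type=None, encoding=None, properties=None):
    """Build the prescribed telemetry topic, with optional decorations."""
    parts = []
    if content_type is not None:
        # "/" must be percent-encoded, e.g. application/json -> application%2Fjson
        parts.append("$.ct=" + quote(content_type, safe=""))
    if encoding is not None:
        parts.append("$.ce=" + quote(encoding, safe=""))
    for name, value in (properties or {}).items():
        parts.append(quote(str(name), safe="") + "=" + quote(str(value), safe=""))
    return f"devices/{device_id}/messages/events/" + "&".join(parts)


# JSON payload in the message body:
print(telemetry_topic("mydevice", content_type="application/json", encoding="utf-8"))
# Data carried in the topic, with an empty message:
print(telemetry_topic("mydevice", properties={"rh": 80.2, "temp": 14.2}))
```

With no optional arguments the helper returns the plain topic, including its trailing slash.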

Finally, the message enrichment feature of IoT Hub can be used to inject information into the Cosmos DB records, although this is quite limited and probably not particularly useful. The process relies on the Azure concept of the “device twin”, which is the metadata counterpart to the real device. The following process allows you to associate values with a device in Azure Portal and see them come through to Cosmos DB:

  • Add a tag to the device, e.g. add a tag named “foo” with value “bar”.
  • In the Enrich messages tab add an entry with a Value of “$twin.tags.foo” and choose a Name (e.g. “Foo”) which you wish to appear in the Cosmos DB document.
  • Choose the Cosmos DB endpoint.

Cosmos DB records will now contain an element called “Foo” inside the “Properties” container, with a value of “bar”.

Postscript

Since writing this, I have discovered that Cosmos DB has some irritating limitations on aggregation functions for cross-partition queries. The notes above will fragment data across partitions (usually a good idea for scaling) but will require cross-partition queries. Since you get 20GB of data per partition, the loss of querying power by partitioning is really not justified for the scenario above.