With the adoption of containers continuing to accelerate, we are seeing deployments of the Docker engine extend beyond the server. Containers are making an appearance in everything from storage devices and Ethernet switches through to IoT monitoring and control equipment, and even in edge compute devices such as digital signage, smart POS consoles, and self-service kiosks.
These new use-cases have dramatically changed how Docker needs to be managed. You can potentially have many hundreds (if not many thousands) of Docker-enabled systems in your environment. Most of these devices would either run without any form of clustering, or would run a lean orchestrator such as Docker Swarm.
Additionally, these devices may be distributed across many sites and many different networks; some may even be attached solely to the public internet using mobile data or satellite technologies.
We need a way to centrally manage these distributed instances...
With the current versions of Portainer, we expect the Portainer server to be able to initiate and maintain network connections to agents. Agents are expected to be on the same LAN/WAN as Portainer, or accessible via port forwarding behind a static public IP. This has worked well for more traditional deployments, but for these new use cases, the remote devices may not have fixed public IP addresses, and most likely cannot have any port forwarding enabled.
For the last 3 months, Portainer and Intel Corporation have been working together to create a new type of agent, which we have called our "Edge Agent", and this agent works quite differently from any other.
For a start, the Edge Agent now initiates and maintains the connection to Portainer (not the other way around). With this new communication model, the Portainer container needs to be reachable from the Edge Agents. This means that if your agents are at remote sites (isolated networks with internet access), then Portainer must be internet-facing (via port forwarding, or by placing Portainer in a DMZ).
Secondly, in order to handle the scale required for these edge use cases, Portainer does not require a constant connection with the Edge Agent. Instead, the agent only establishes an active connection with Portainer when a management session is requested. The agent "checks in" with Portainer every 5 seconds, asking "do you need me?" If the answer is "no", the agent simply sleeps.
However, if an administrator has logged into Portainer and wants to manage a remote endpoint, then the next time the Edge Agent checks in (every 5 seconds), it receives a response of "yes, you are required". The Edge Agent then establishes a secure connection to Portainer, and the administrator can complete their tasks.
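The check-in behaviour described above can be sketched as a simple polling loop. This is only an illustration, not the Edge Agent's actual code: the names `run_check_in_loop`, `is_required`, and `open_tunnel` are hypothetical, and the only value taken from the text is the 5-second interval.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 5  # check-in interval stated in the text


def run_check_in_loop(
    is_required: Callable[[], bool],   # hypothetical: asks Portainer "do you need me?"
    open_tunnel: Callable[[], None],   # hypothetical: establishes the secure connection
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,   # only here so the sketch can be bounded
) -> int:
    """Check in until the server answers "yes", then open the connection.

    Returns the number of check-ins that were performed.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if is_required():
            # "yes, you are required": bring up the secure connection
            open_tunnel()
            return polls
        # "no": sleep until the next check-in
        sleep(POLL_INTERVAL_SECONDS)
    return polls
```

Passing the check-in and connection steps as callables keeps the sketch independent of any real network code; in a real agent these would be HTTP requests and tunnel setup.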
While the administrator retains an active session with the remote endpoint, the Edge Agent will ensure the secure connection remains up (and will reconnect it if it is dropped). After the administrator has finished their work (by switching to another endpoint or closing their browser session), the agent starts a countdown timer and, after a further period of inactivity, will disconnect the secure connection and re-enter standby mode.
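One way to picture the countdown is as a small piece of state that is checked periodically. The sketch below is an assumption-laden illustration: the class name, method names, and the `idle_timeout` value are all hypothetical, since the length of the inactivity period is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelSession:
    idle_timeout: float                 # hypothetical: the text gives no value
    connected: bool = True              # secure connection is currently up
    idle_since: Optional[float] = None  # when the admin session ended, if it has

    def admin_activity(self) -> None:
        """An administrator is (again) working on this endpoint: cancel the countdown."""
        self.idle_since = None

    def admin_left(self, now: float) -> None:
        """The administrator switched endpoint or closed the browser: start the countdown."""
        if self.idle_since is None:
            self.idle_since = now

    def tick(self, now: float) -> bool:
        """Disconnect once the countdown has expired; return whether still connected."""
        if (
            self.connected
            and self.idle_since is not None
            and now - self.idle_since >= self.idle_timeout
        ):
            self.connected = False  # back to standby; check-ins resume
        return self.connected
```

Timestamps are passed in explicitly so the logic can be exercised without waiting for real time to pass.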
In order to ensure that connections between Portainer and the Edge Agents are secure, we have elected to deploy mutual/two-way authentication. When the edge endpoint is initially defined in Portainer, we generate a unique agent "join" token. This hashed token includes information on how the Edge Agent can reach Portainer (FQDN:Port) and is used by the agent to lock itself so it can only communicate with that specific Portainer instance.
In addition, on first start, the agent self-generates a UUID, and on first connection to Portainer, it provides this UUID to Portainer. From that point on, Portainer will only accept communications for that endpoint from an agent that presents that UUID.
If another agent is started elsewhere and an existing join token is reused, Portainer will refuse to allow the unknown agent to connect (even though it has a valid join token). For additional security, the agent can be assigned a UUID by the Portainer administrator, so that even the very first communication from the agent to Portainer is already pre-validated.
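The server-side half of this check can be sketched as follows. This is a simplified model under stated assumptions, not Portainer's implementation: `EdgeEndpoint` and `accept` are hypothetical names, and tokens and UUIDs are treated as plain ASCII strings.

```python
import hmac
from typing import Optional


class EdgeEndpoint:
    """Hypothetical model of one edge endpoint as seen by the server."""

    def __init__(self, join_token: str, preassigned_uuid: Optional[str] = None):
        self.join_token = join_token
        # None means "not yet bound": the first agent to connect is remembered.
        self.agent_uuid = preassigned_uuid

    def accept(self, token: str, agent_uuid: str) -> bool:
        """Return True only for the agent this endpoint is bound to."""
        # Constant-time comparison avoids leaking how much of the token matched.
        if not hmac.compare_digest(token, self.join_token):
            return False
        if self.agent_uuid is None:
            # First connection: remember this agent's UUID from now on.
            self.agent_uuid = agent_uuid
            return True
        # A valid join token is not enough; the UUID must match as well.
        return hmac.compare_digest(agent_uuid, self.agent_uuid)
```

With `preassigned_uuid` set by the administrator, there is no "first come" window at all: even the very first connection has to present the expected UUID.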
Through scale testing conducted with Intel, we have validated that a single Portainer instance can manage 25,000 unique endpoints, be that 25,000 single-node endpoints, or 25,000 sites, with each site being a cluster containing 1 or more nodes.
Through the scale testing, we determined that Portainer server resource consumption increases at a relatively linear rate of 5 MB of RAM, 0.5 MHz of CPU, and 121 B/s of network traffic per remote node added under management. We will be writing up deployment guides on how to deploy Portainer to support various remote deployment sizes.
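As a rough sizing aid, the per-node figures above can be turned into a small calculator. This sketch assumes the growth is exactly linear and ignores any baseline consumption of the Portainer server itself, which is not stated here; the function name is hypothetical.

```python
# Per-node increments quoted in the text.
RAM_MB_PER_NODE = 5
CPU_MHZ_PER_NODE = 0.5
NET_BYTES_PER_SEC_PER_NODE = 121


def estimate_resources(nodes: int) -> dict:
    """Estimate the additional server load for a number of managed remote nodes."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return {
        "ram_mb": nodes * RAM_MB_PER_NODE,
        "cpu_mhz": nodes * CPU_MHZ_PER_NODE,
        "net_bytes_per_sec": nodes * NET_BYTES_PER_SEC_PER_NODE,
    }
```

For example, `estimate_resources(25_000)` gives 125,000 MB of RAM, 12,500 MHz of CPU, and 3,025,000 B/s (roughly 3 MB/s) of network traffic on top of whatever the server uses at idle.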
In order to achieve this scale, several UX enhancements have been made to the Portainer core - these include back-end pagination on pages where the list of endpoints needs to be displayed (home page, endpoints page, stack migration page, host job scheduler page, endpoint groups page), and disabling the scheduled snapshot function on these Edge Agents (instead, a snapshot is taken whenever an admin session with an edge endpoint terminates).
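Back-end pagination simply means the server returns one slice of the endpoint list at a time, together with the total count, rather than sending all endpoints to the browser. A minimal sketch, with hypothetical `start` and `limit` parameters that are not taken from Portainer's API:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], start: int, limit: int) -> Tuple[List[T], int]:
    """Return one page of items plus the total count, so the UI can render page controls."""
    if start < 0 or limit <= 0:
        raise ValueError("start must be >= 0 and limit must be > 0")
    page = list(items[start:start + limit])
    return page, len(items)
```

With 25,000 endpoints and a page size of 10, each request then carries 10 entries instead of the full list.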
The new Edge Agent will be officially released (as open source) on the 26th of July, 2019.
We hope you take the opportunity to explore this new Edge Agent, and we look forward to hearing how you put it to work in your edge compute use-cases.
A huge thank you to Intel for assisting us with this change; it is greatly appreciated!