AI meets privacy: ngrok's secure bridge to customer data
Previously on AI-meets-ngrok: You built a proof-of-concept deployment that uses ngrok as a secure tunnel to external GPU compute, which you’re using to experiment with AI development using a self-hosted, open source LLM.
Your story doesn’t end there. After validating what you could do with AI, external GPU, and some example data, your CTO has asked you to take this whole AI experiment one (big) step further.
Now, you’re going to deploy a containerized service to your customer’s virtual private cloud (VPC), which runs AI workloads against their customer data and makes results available via an API. You’ll use ngrok to securely tunnel those API endpoints to your SaaS platform, which your customer accesses to view results and insights.
Your first reaction is right: The fundamentals of what you learned with your external GPU experiment will be helpful here, but this is a major jump in complexity. Luckily, with that knowledge and ngrok’s secure tunneling still at your side, you’re already closer to a successful deployment than you might think.
Why run AI workloads on external customer data
The first question you might ask is why your customer can’t just upload their data to your service and run AI workloads there. Wouldn’t that make the topology of this infrastructure a lot simpler? The most obvious answer is that your customer will refuse, citing their obligation to protect customer data and the integrity of their cloud infrastructure.
Egressing data outside their infrastructure is almost always a no-go—they’ll have obligations to customers and regulators. Beyond issues of privacy and governance, you’re also dealing with the sheer volume of data your customers want to analyze with your AI service. The costs of transferring that data between clouds would quickly outweigh the value your service could provide.
The opposite solution, in which the customer runs and manages your entire service on-premises, is often a hard sell as well. They’ll balk at broad changes to their existing VPC infrastructure, and they often don’t want the additional IT/DevOps burden.
These challenges explain the rapid rise in traction for Bring Your Own Cloud (BYOC) architecture. With a BYOC architecture, you deploy a service into a customer’s data plane to process and analyze their data using APIs. Your SaaS platform is the control plane, operating as the central hub that tells your data plane services what to analyze and when, while also giving your customers a centralized place to view results.
Databricks is a great example of an AI platform that operates with a BYOC architecture. Once a Databricks customer sets up their workspace (a secure integration between the Databricks platform and their cloud provider), Databricks deploys new clusters and resources in the customer’s cloud to process and store data.
In your case, you get some knock-on benefits, too:
- Most of the compute, particularly for AI workloads, happens on your customers’ infrastructure, lessening your burden to build and maintain it yourself.
- Because your customer’s data never leaves their network, your path to a successful deployment requires much less technical complexity and scrutiny around data security.
- You maintain strong control of your service running in customer networks.
Architect your integration with your customers’ network
Normally, accessing data and running workloads in external networks using a BYOC architecture comes with numerous complexities:
- Bureaucratic approvals from NetOps, DevOps, and SecOps, which often drag on for months.
- Configuration hiccups around network/application firewalls, load balancers, VPC routers, and more.
- Security and compliance issues around encryption, secure data transfer, and strict access control.
- Complexity in the deployment process, which leads to delays and crushes the time-to-value of your AI service.
- Falling behind on changes in the AI landscape—like the release of a new fine-tuned LLM that would serve your customers better—because of the complexity around the integration.
With ngrok operating as a secure tunneling middleware between your control plane and your customer’s data plane, you’ll bypass all that headache.
In reality, you’ll find that every customer’s infrastructure is different, which is why talented solutions architects are so valuable. In this example, however, your customer is using a public cloud provider and already runs Kubernetes clusters.
The process of setting up this BYOC architecture is roughly:
- Deploy a new Kubernetes cluster in the customer’s virtual private cloud (VPC).
- On said cluster, install the ngrok Kubernetes Ingress Controller with Helm.
- Create the services/resources to run your AI service.
- Configure access to the customer’s data volumes from the pods/services in the AI deployment.
- Set up routes for the services your control plane needs to access.
- Double-check that ngrok has automatically set up HTTPS Edges for those routes, which publish an API.
- Layer in security as needed.
- Start consuming the results of your AI service via an API!
In the end, your customers will access your SaaS platform at console.YOUR-AMAZING-AI-PLATFORM.com. Speaking of which, we haven’t yet talked about what service you’re running on the customer’s data plane. Because you already have some experience with Ollama, you'll continue along that path.
First, let’s clarify: this is an example illustrating what’s possible, not a real-world solution. You’re never going to deploy Ollama to a customer’s cloud as a production AI workload, as it’s not designed to access, process, and analyze data in the ways a real AI service would require. The example might be far-fetched, but the process covered below, particularly around setting up a secure ngrok tunnel for your customer, is one you can adapt to your actual deployment process.
Create your containerized, Kubernetes-ready AI service
We won’t go into all the nuances of containerizing an application or creating a Kubernetes manifest for it—Docker has published documentation covering the fundamentals of creating a Dockerfile and using <code>docker build</code>. The Kubernetes docs also contain extensive resources on managing resources and deploying containers using <code>kubectl</code>.
Fortunately, the folks behind Ollama have already containerized the service and created a Kubernetes manifest.
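Here’s a minimal sketch of what that manifest looks like, trimmed for brevity—the upstream example on GitHub may differ in details like namespaces and resource requests. It assumes the official <code>ollama/ollama</code> image and exposes the API through a ClusterIP Service on port <code>80</code>:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434  # Ollama's default API port
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  type: ClusterIP
  selector:
    app: ollama
  ports:
    - port: 80           # cluster-internal port referenced by the Ingress later
      targetPort: 11434  # forwards to the Ollama container
```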
This Kubernetes Service tells you how to set up your ingress: ngrok should direct incoming traffic on the appropriate domain name (which you’ll set up alongside your customer) through a Kubernetes Ingress to the <code>ollama</code> service running on port <code>80</code> in your cluster’s internal network.
If you were building a real AI service, you would also need additional deployments, using Kubernetes secrets, for accessing databases to leverage the customer data that required this BYOC infrastructure in the first place.
Deploy your AI service on your customer’s data plane
Now, we’ll focus on installing the services you’ll need to process data using AI and handle ingress. Based on the relationship with your customer, you may run these commands yourself or provide a subset of them to your customer in a document to help with onboarding.
Install your AI service
Using Ollama as an example, you can copy-paste the Kubernetes manifest above or grab it from GitHub, then apply the deployment to your customer’s cluster.
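Assuming you’ve saved the manifest as <code>ollama.yaml</code> (the filename is just illustrative), applying it looks roughly like:

```bash
# Apply the Ollama Deployment and Service to the customer's cluster
kubectl apply -f ollama.yaml

# Confirm the pods are running and the Service is reachable in-cluster
kubectl get pods -l app=ollama
kubectl get service ollama
```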
Remove ngrok branding from the Agent Ingress
To create a secure tunnel to the ngrok Cloud Edge, every ngrok Agent first authenticates using your account credentials at the default Ingress Address, which is <code>connect.ngrok-agent.com:443</code>. This Ingress Address is then used to tunnel traffic to and from your application service.
For the clearest and easiest to manage configuration, you can configure all ngrok Agents to connect to a specific domain, <code>tunnel.CUSTOMER_DOMAIN.com</code>, instead of the default hostname, which is <code>connect.ngrok-agent.com</code>. You can also extend the brand-free Agent Ingress experience with more features, like dedicated IPs for your account, for more reliability and security assurances for your customers.
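As a sketch, creating that custom Agent Ingress through the ngrok API might look like the following; it assumes your ngrok API key is already exported as <code>NGROK_API_KEY</code> (covered in the next section), and you should confirm the endpoint, fields, and the DNS delegation ngrok asks you to set up against the current API reference:

```bash
# Sketch: reserve a branded Agent Ingress so agents connect through
# tunnel.CUSTOMER_DOMAIN.com instead of the default connect.ngrok-agent.com
curl -X POST https://api.ngrok.com/agent_ingresses \
  -H "Authorization: Bearer $NGROK_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Ngrok-Version: 2" \
  -d '{"description": "Customer 0001 unbranded ingress", "domain": "tunnel.CUSTOMER_DOMAIN.com"}'
```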
Not every ngrok account can create Agent Ingresses—if you see ERR_NGROK_6707, you can reach out to ngrok support to learn more about activating the feature.
Add a custom wildcard domain
You’ll start by creating an ngrok API key for this customer, which gives them privileges to configure their tunnels, domains, and Edges. It’s helpful to export the API key for easy access later.
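For example (the key itself comes from the ngrok dashboard; the value below is a placeholder):

```bash
# Export the ngrok API key so the curl commands that follow can reference it
export NGROK_API_KEY="<api-key-from-the-ngrok-dashboard>"
```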
Next, request the ngrok API to create a wildcard domain at <code>*.CUSTOMER0001-DOMAIN.COM</code> using automated TLS certificates.
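A hedged sketch of that request against the Reserved Domains endpoint—verify the exact fields in the ngrok API reference, and note the region value is an assumption:

```bash
# Reserve a wildcard domain for this customer; ngrok provisions an
# automated TLS certificate for it by default
curl -X POST https://api.ngrok.com/reserved_domains \
  -H "Authorization: Bearer $NGROK_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Ngrok-Version: 2" \
  -d '{"domain": "*.CUSTOMER0001-DOMAIN.COM", "region": "us"}'
```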
You would use a custom TLS certificate (<code>certificate_id</code>) in real-world production environments. For the best experience for your customer, you’ll want to manage these domains by setting up the appropriate DNS records, which the ngrok dashboard will help you with.
Configure the customer’s ngrok Authtoken
Your customer will use their Authtoken credential to start new tunnel sessions, but by specifying a <code>bind</code> ACL, you restrict them from creating new Edges or Tunnels on anything but the wildcard domain. This step ensures that your service, running on your customer’s data plane, never creates tunnels on domains you haven’t protected against eavesdropping or intrusion.
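A sketch of that request against the tunnel credentials endpoint, with the ACL scoped to the reserved wildcard domain (confirm field names against the ngrok API reference):

```bash
# Create a customer-specific Authtoken that can only bind tunnels
# to subdomains of the reserved wildcard domain
curl -X POST https://api.ngrok.com/credentials \
  -H "Authorization: Bearer $NGROK_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Ngrok-Version: 2" \
  -d '{"description": "Customer 0001 data plane", "acl": ["bind:*.CUSTOMER0001-DOMAIN.COM"]}'
```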
A successful response includes the new Authtoken in its <code>token</code> field. It’s also helpful to export this customer-specific Authtoken for future use.
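For example:

```bash
# Export the customer-specific Authtoken returned in the "token" field
export CUSTOMER_AUTHTOKEN="<token-from-the-credentials-response>"
```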
Install and configure the ngrok Kubernetes Ingress Controller
The ngrok Kubernetes Ingress Controller automatically creates secure Edges on the ngrok Network for your services, cutting away all the networking and security complexity around typical BYOC integrations. As a bonus, you’ll be able to quickly add observability, authentication, and other security measures, like IP restrictions, to prevent unauthorized access to your ngrok Edges.
First, add the Helm repository for the ngrok Kubernetes Ingress Controller.
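Assuming the controller’s documented Helm repository (check ngrok’s docs for the current URL), that looks like:

```bash
# Add the ngrok Ingress Controller Helm repo and refresh the local chart index
helm repo add ngrok https://ngrok.github.io/kubernetes-ingress-controller
helm repo update
```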
Next, install the ngrok Kubernetes Ingress Controller using the new customer-specific Authtoken.
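A sketch of the install, passing both your API key and the customer-specific Authtoken; the value names (<code>credentials.apiKey</code>, <code>credentials.authtoken</code>) follow the chart’s documented values, so confirm them against the chart version you’re installing:

```bash
# Install the ngrok Kubernetes Ingress Controller into its own namespace,
# authenticated with the customer-scoped credentials created earlier
helm install ngrok-ingress-controller ngrok/kubernetes-ingress-controller \
  --namespace ngrok-ingress-controller \
  --create-namespace \
  --set credentials.apiKey=$NGROK_API_KEY \
  --set credentials.authtoken=$CUSTOMER_AUTHTOKEN
```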
If you created a new ngrok Agent Ingress for your customer for an unbranded experience, you’ll also need to specify <code>--set serverAddr=$TUNNEL</code> during installation of the Ingress Controller.
Next, create a Kubernetes Ingress resource for your AI service. This is where your Ollama example comes in handy again—set <code>host</code> to a subdomain on the wildcard set up earlier, then point the backend’s <code>name</code> and <code>port.number</code> at the Ollama service for ngrok to securely tunnel through the Cloud Edge.
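A minimal sketch of that Ingress, assuming the controller’s <code>ngrok</code> ingress class and the <code>ai</code> subdomain used below:

```yaml
# ollama-ingress.yaml: expose the ollama Service through ngrok on a
# subdomain of the customer's reserved wildcard domain
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ollama-ingress
spec:
  ingressClassName: ngrok
  rules:
    - host: ai.CUSTOMER0001-DOMAIN.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama
                port:
                  number: 80
```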
This configuration produces a single HTTPS Edge at <code>ai.CUSTOMER0001-DOMAIN.com</code> and a single route, pointing <code>/</code> toward <code>ollama:80</code>, which is already running on their Kubernetes cluster. Your SaaS platform, running in your cloud, can now request the Ollama API made available at <code>ai.CUSTOMER0001-DOMAIN.com</code> to run AI workloads and pull results to your cloud SaaS.
Access your AI service via API
With Ollama running on the customer’s data plane, and the secure tunnel connecting it to your cloud SaaS via the ngrok Cloud Edge running smoothly, you can start controlling your AI service via an API.
Start by pulling a model.
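Using Ollama’s pull endpoint through the tunneled domain (the model name is just an example):

```bash
# Pull a model onto the customer's data plane via the Ollama API
curl https://ai.CUSTOMER0001-DOMAIN.com/api/pull -d '{"name": "llama2"}'
```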
Test-run a basic response.
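Then generate a quick completion to confirm the end-to-end path works:

```bash
# Ask the model for a simple completion through the same tunneled endpoint
curl https://ai.CUSTOMER0001-DOMAIN.com/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```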
You’re off to the races! Ollama comes with an extensive API for manipulating the AI service running on your customer’s data plane, and your real-world AI service would operate much the same way.
The technical prowess of ngrok’s Cloud Edge
With ngrok taking care of tunneling and ingress to your AI service running on your customer’s BYOC infrastructure, you can now add security and routing features without another round of approvals from your customer’s NetOps/DevSecOps teams or even more complexity for your solutions architects.
IP restrictions
One of the simplest ways to demonstrate how easily you can extend your usage of the ngrok Agent for better security practices is IP restrictions. Your customers will appreciate knowing that only your SaaS platform, which uses dedicated IPs, can access the routes exposed by the ngrok Kubernetes Ingress Controller. Create an <code>ip-policy.yaml</code> file with the following and create the resource with <code>kubectl apply -f ip-policy.yaml</code>.
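A hedged sketch of that policy, assuming the controller’s <code>IPPolicy</code> CRD; the CIDR below is a placeholder for your platform’s dedicated egress range, and the exact schema should be checked against the controller’s CRD reference:

```yaml
# ip-policy.yaml: only allow traffic from the SaaS platform's dedicated IPs
apiVersion: ingress.k8s.ngrok.com/v1alpha1
kind: IPPolicy
metadata:
  name: saas-platform-only
spec:
  description: Allow only the SaaS control plane to reach customer Edges
  rules:
    - action: allow
      cidr: 203.0.113.0/24  # placeholder for your platform's dedicated IP range
```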
You can also extend your security practices to authentication via an ngrok- or user-managed OAuth provider or OpenID Connect provider.
Multiple routes (fanout)
Based on how your AI service runs, and the routes/ports it exposes, you can extend the ngrok Kubernetes Ingress Controller to support multiple routes with a fanout. With the following Ingress configuration, you’ll have a single HTTPS Edge with two routes, the second of which points <code>/bar</code> to a different Kubernetes Service.
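A sketch of that fanout configuration; the second backend (<code>bar-service</code>) is a hypothetical placeholder for another service in your deployment:

```yaml
# ingress-fanout.yaml: one HTTPS Edge, two routes on the same host
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-service-fanout
spec:
  ingressClassName: ngrok
  rules:
    - host: ai.CUSTOMER0001-DOMAIN.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama
                port:
                  number: 80
          - path: /bar
            pathType: Prefix
            backend:
              service:
                name: bar-service
                port:
                  number: 80
```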
What’s next?
With this BYOC deployment workflow, you can bring your AI service straight to the customer data you need without the complexity typically involved in accessing external networks. Not only do you lessen the technical burden on your organization, but you deliver value to your customers far faster.
With the AI and ML space moving faster than ever before, that speed can make all the difference. Sign up today to start building an AI service that delivers value where your customers need it most: their data.
Are you building an AI service of your own? Curious about how BYOC is changing and growing in the cloud native era? We’d love to hear how you’re simplifying your networking stack with ngrok by pinging us on X (aka Twitter) @ngrokhq, LinkedIn, or by joining our community on Slack.