What we do
GPUPilot monitors NVIDIA GPU clusters on Kubernetes and closes the loop between detection, diagnosis, and remediation. One read-only agent, deployed in 30 seconds. Real DCGM signals — XID errors, ECC (SBE/DBE) counters, row remaps, PCIe replays, NVLink bandwidth, thermal headroom. Real correlation against pods, events, and node pressure. Real suggested fixes, expressed as the kubectl command you would type yourself, that a human on your team reviews and approves. AI-assisted, not autonomous.
We ship two deployment models on the same agent: Connected for clusters with outbound HTTPS, and Air-Gap for sovereign and classified environments where telemetry cannot leave the perimeter. Both are in production today.
Who delivers GPUPilot
GPUPilot is delivered by Bynet Data Communications, one of Israel's most established infrastructure integrators. Bynet has been building, deploying and operating enterprise-grade networks, data-centres and cloud platforms for over fifty years, and today employs more than nine hundred engineers across Israel. That pedigree matters for a product like this: GPUPilot is not a code-only side project that you install and then hope for the best. When something fires on your cluster at 3 a.m., there is a real support organisation on the other end: local procurement, local invoicing, local escalation, and the same engineers who already run large-scale infrastructure for Israeli enterprises and the public sector.
Behind the product engineering sits Altostratus, the development team that builds GPUPilot in partnership with Bynet.
Our approach
- Read-only by design. The agent's ClusterRole grants only get/list/watch. It cannot create, patch, or delete anything in your cluster.
- Single egress path. One outbound HTTPS destination in Connected mode. Zero outbound traffic in Air-Gap mode.
- Safe to uninstall. kubectl delete the namespace and everything the agent created goes with it. No state left behind.
- Honest about AI. AI-assisted, not autonomous. Suggested remediation is written for you and executed only after your operator approves. AI output can be wrong; humans decide.
- Honest about certifications. We do not hold third-party certifications today. We rely on architectural verifiability: what the agent can read, what it sends, where the data lives, and how you turn it off. See Security model.
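The read-only claim in the list above is one you can check yourself. Here is a minimal sketch in Python, assuming you have exported the agent's ClusterRole as JSON (for example with kubectl get clusterrole <role-name> -o json, where the role name depends on your install). It flags any rule that grants a verb other than get/list/watch. The function name and the sample role are illustrative and are not part of GPUPilot.

```python
import json

# Verbs that only read cluster state. Anything else (create, patch,
# delete, "*", ...) goes beyond read access.
READ_ONLY_VERBS = {"get", "list", "watch"}


def non_read_only_rules(cluster_role):
    """Return the rules in a ClusterRole that grant more than get/list/watch."""
    offending = []
    for rule in cluster_role.get("rules") or []:
        verbs = set(rule.get("verbs", []))
        # A rule passes only if every verb is in the read-only set;
        # the wildcard "*" is not in that set, so it is flagged too.
        if not verbs <= READ_ONLY_VERBS:
            offending.append(rule)
    return offending


# Hypothetical ClusterRole, standing in for the JSON exported from your cluster.
SAMPLE_ROLE = json.loads("""
{
  "kind": "ClusterRole",
  "metadata": {"name": "example-agent-readonly"},
  "rules": [
    {"apiGroups": [""], "resources": ["pods", "nodes", "events"],
     "verbs": ["get", "list", "watch"]}
  ]
}
""")

if __name__ == "__main__":
    bad = non_read_only_rules(SAMPLE_ROLE)
    if bad:
        print(f"{len(bad)} rule(s) grant more than get/list/watch")
    else:
        print("every rule is limited to get/list/watch")
```

An empty result means every rule in that ClusterRole is limited to read verbs. The check covers only the role you pass in; any namespaced Roles bound to the same service account would need the same inspection.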
Where we work
Israel-based. The Connected service is hosted in reputable cloud infrastructure with per-customer database isolation. The Air-Gap deployment runs entirely inside customer perimeters and is used today on multiple production GPU clusters.
Contact
Sales, procurement, support, security: computingIT@bynet.co.il. Careers: bynet.co.il/en/career.