At the recent SREday NYC 2025 event hosted at Viam's office, a panel of industry experts discussed the intersection of IoT and SRE. The panel featured Viam's Director of Engineering, Ale Paredes, alongside Jessica Garson (Elastic), Brian Annis (Place Exchange), and Vinny Ruia (Firefly Automatix). Below are key insights from their discussion on building reliable, scalable IoT systems.
Reliability strategies for resource-constrained IoT environments
Unlike cloud environments with virtually unlimited resources, IoT systems operate with significant constraints. Successful reliability strategies must account for limited computing power, storage capacity, and intermittent connectivity.
Three key approaches for managing these constraints:
- Edge-first computing enables devices to operate autonomously without constant connectivity
- Asynchronous data synchronization allows devices to store data locally and sync when connectivity returns
- Modular architectures provide flexibility to adapt to different use cases and environments
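The following Python sketch illustrates asynchronous synchronization. It buffers readings on local storage and uploads them in batches once a link is available. It is a minimal example under stated assumptions, not any panelist's implementation. The `StoreAndForwardBuffer` class, the JSON file used as the on-device store, and the `is_connected` and `upload` callbacks are all hypothetical placeholders for a real device's storage and transport.

```python
import json
import os
import tempfile
from collections import deque
from typing import Callable, Deque, Dict, List


class StoreAndForwardBuffer:
    """Hypothetical on-device buffer: readings are persisted locally and
    uploaded in batches only when an uplink is available."""

    def __init__(self, path: str, max_items: int = 1000) -> None:
        self.path = path
        # Bounded queue: when full, the oldest readings are dropped first.
        self.queue: Deque[Dict] = deque(maxlen=max_items)
        if os.path.exists(path):
            with open(path) as f:
                self.queue.extend(json.load(f))

    def _persist(self) -> None:
        # Write to a temp file, then atomically rename it over the old one,
        # so a power loss mid-write never leaves a half-written buffer.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(list(self.queue), f)
        os.replace(tmp, self.path)

    def record(self, reading: Dict) -> None:
        # Persisting on every write keeps the sketch simple; batching
        # writes would reduce flash wear on real hardware.
        self.queue.append(reading)
        self._persist()

    def flush(self, is_connected: Callable[[], bool],
              upload: Callable[[List[Dict]], bool],
              batch_size: int = 50) -> int:
        """Upload buffered readings in batches; stop at the first failure.
        Returns the number of readings successfully uploaded."""
        sent = 0
        while self.queue and is_connected():
            batch = list(self.queue)[:batch_size]
            if not upload(batch):
                break  # keep the data locally and retry on the next flush
            for _ in batch:
                self.queue.popleft()
            sent += len(batch)
            self._persist()
        return sent
```

A failed upload leaves its batch in the buffer, so readings are only lost when the bounded queue overflows, and then the oldest go first. Whether dropping old or new data is the better trade-off depends on the application.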
The stakes for reliability are particularly high in IoT because physical intervention is costly and sometimes logistically impossible. This reality requires more rigorous quality control than traditional cloud deployments.
Smart update strategies for IoT fleet management
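Deploying updates across a distributed IoT fleet requires strategies that account for the constraints above: devices may be offline, may run on limited power, and can be expensive to reach physically if an update goes wrong.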
Effective segmentation approaches:
- Break fleets into manageable groups to reduce risk during deployments
- Use feature flags to control functionality for specific device segments
- Implement canary deployments to test updates on small subsets before wider rollout
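Deterministic hashing is one common way to implement canary cohorts and segment-level feature flags. The Python sketch below shows the idea. The function names, the percentage scheme, and the segment labels are hypothetical, and real fleet-management platforms expose their own mechanisms for this.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set


def rollout_bucket(device_id: str, salt: str) -> int:
    """Map a device to a stable bucket in [0, 100) for a given salt."""
    digest = hashlib.sha256(f"{salt}:{device_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def in_rollout(device_id: str, release: str, percent: int) -> bool:
    """True if the device is in the first `percent` buckets for this release.
    Raising `percent` (e.g. 1 -> 10 -> 100) only ever adds devices."""
    return rollout_bucket(device_id, release) < percent


@dataclass
class FeatureFlags:
    # Hypothetical layout: flag name -> segments where the flag is enabled.
    enabled_segments: Dict[str, Set[str]] = field(default_factory=dict)

    def is_enabled(self, flag: str, segment: str) -> bool:
        # Unknown flags default to off.
        return segment in self.enabled_segments.get(flag, set())
```

A device's bucket depends only on its ID and the release name. Widening a rollout from a small canary to the full fleet therefore never removes devices that already updated, and each release draws a different canary cohort. Salting with a fixed string instead would keep the same devices as permanent canaries.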
Contextual update windows allow devices to update only when:
- Connected to reliable networks
- Powered by stable sources
- Operating during off-peak usage hours
Aligning updates with scheduled maintenance windows further reduces the chance that an update disrupts a device while it is in active use.
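A device can evaluate these update-window conditions as a single gating predicate before applying an update. The sketch below assumes a hypothetical `DeviceContext` carrying network, signal, and power readings. The thresholds (signal quality of 0.6, 80% battery) and the 01:00 to 05:00 default window are illustrative values, not recommendations from the panel.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class DeviceContext:
    # Hypothetical signals a device agent might report before updating.
    network_type: str        # e.g. "ethernet", "wifi", "cellular"
    signal_quality: float    # 0.0 (no signal) to 1.0 (excellent)
    on_external_power: bool
    battery_percent: float


def in_window(now: dt.time, start: dt.time, end: dt.time) -> bool:
    """True if `now` falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_update(ctx: DeviceContext, now: dt.datetime,
                  window_start: dt.time = dt.time(1, 0),
                  window_end: dt.time = dt.time(5, 0)) -> bool:
    # Illustrative rule: treat cellular as metered/unreliable and require
    # a reasonably strong signal on other links.
    reliable_network = (ctx.network_type in {"ethernet", "wifi"}
                        and ctx.signal_quality >= 0.6)
    # Mains power, or enough battery to survive an interrupted install.
    stable_power = ctx.on_external_power or ctx.battery_percent >= 80
    # `now` is assumed to be the device's local time.
    off_peak = in_window(now.time(), window_start, window_end)
    return reliable_network and stable_power and off_peak
```

The `in_window` helper handles windows that cross midnight (for example 22:00 to 04:00). A naive `start <= now < end` check gets that case wrong.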
Technical safeguards to ensure smooth updates:
- Introduce random jitter in update check-ins to prevent overwhelming servers
- Build fallback mechanisms for environments with unreliable connectivity
- Implement verification processes to confirm successful update completion
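The three safeguards above fit together in a small download routine. This Python sketch assumes the update server publishes a SHA-256 digest alongside each artifact. It also assumes that `fetch` is a hypothetical callable that raises `OSError` (which includes `ConnectionError`) on network failure. The sketch applies proportional jitter (a random offset of up to 20% either way) to both the check-in interval and the retry backoff. Other jitter schemes, such as drawing uniformly from zero to the full backoff, work equally well.

```python
import hashlib
import random
import time
from typing import Callable, Optional


def jittered_delay(base_seconds: float, jitter_fraction: float = 0.2,
                   rng: Optional[random.Random] = None) -> float:
    """Spread delays uniformly over base plus or minus jitter_fraction, so
    devices that booted together don't hit the update server in lockstep."""
    rng = rng if rng is not None else random.Random()
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Confirm the downloaded update matches the digest published with it."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


def download_with_fallback(fetch: Callable[[], bytes], expected_sha256: str,
                           max_attempts: int = 5, base_backoff: float = 2.0,
                           sleep: Callable[[float], None] = time.sleep
                           ) -> Optional[bytes]:
    """Retry with jittered exponential backoff. Corrupted downloads are
    retried too. Returns None if no verified artifact arrives in time."""
    for attempt in range(max_attempts):
        try:
            data = fetch()
            if verify_artifact(data, expected_sha256):
                return data
        except OSError:
            pass  # network errors are expected on flaky links; retry below
        if attempt < max_attempts - 1:
            sleep(jittered_delay(base_backoff * 2 ** attempt))
    return None
```

Returning `None` rather than raising lets the caller keep running its current version and try again at its next jittered check-in. This is one simple form of fallback for poorly connected sites.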