What We Learned Building Face Recognition Without Cloud Dependency
Last year we got a brief that sounded simple: build a facial attendance system for a manufacturing facility in Chhattisgarh. Five hundred employees, three shifts, multiple entry points.
The catch was in the constraints. The facility had unreliable internet. On a bad day, the connection dropped for hours. On a good day, latency to the nearest AWS region was 200ms — too slow when you have 80 workers queuing up at shift change.
Cloud-based face recognition was off the table. We had to build something that worked entirely on-premise, matched faces in under a second, and ran on hardware they already had.
Here is what we learned.
The obvious approach does not work
The first thing most people try is OpenCV with Haar cascades or a basic CNN for face detection, followed by comparing raw pixel data or simple feature vectors. This falls apart immediately in a factory environment:
- Lighting changes constantly. The entrance has fluorescent overhead lights, natural light from a loading dock, and shadows from moving machinery.
- Workers wear hard hats, safety glasses, and dust masks. Sometimes they take the mask off, sometimes they pull it below their chin.
- Camera angles vary. Workers do not pose for the camera — they walk past a tablet mounted on a wall.
We needed a model that generates robust face embeddings — numerical representations of a face that stay consistent despite these variations.
FaceNet embeddings and pgvector
We settled on FaceNet with ArcFace loss for generating 512-dimensional face embeddings. The key property of these embeddings is that faces of the same person cluster together in vector space, even under different lighting, angles, and partial occlusion.
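To make the clustering property concrete, here is a minimal sketch of a cosine-similarity match decision. The threshold of 0.6 and the toy 4-dimensional vectors are illustrative placeholders, not our production values or real 512-dimensional embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(probe: np.ndarray, reference: np.ndarray, threshold: float = 0.6) -> bool:
    """Accept a match only when similarity clears the threshold."""
    return cosine_similarity(probe, reference) >= threshold

# Toy 4-dimensional stand-ins for 512-dim embeddings:
enrolled = np.array([0.90, 0.10, 0.00, 0.10])  # reference embedding on file
probe    = np.array([0.85, 0.15, 0.05, 0.10])  # same person, different conditions
stranger = np.array([0.00, 0.10, 0.90, 0.10])  # a different person

print(is_match(probe, enrolled))  # similar direction -> True
print(is_match(probe, stranger))  # different direction -> False
```

Embeddings of the same face point in roughly the same direction in vector space, so the angle between them (what cosine similarity measures) stays small even when raw pixels differ wildly.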
The question was where to store and search these embeddings. Some teams use FAISS or dedicated vector databases. We chose pgvector — the PostgreSQL extension for vector similarity search. The reasoning was practical:
- The facility already needed PostgreSQL for attendance records, shift schedules, and reporting.
- Adding a separate vector database meant one more service to deploy, monitor, and back up on-premise.
- With 500 employees, the dataset fits comfortably in memory. pgvector handles cosine similarity search across 500 embeddings in under 50ms.
One database. One backup strategy. One fewer thing to break at 2 AM.
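For a sense of what this looks like in practice, here is a sketch of the schema and the nearest-neighbour query. The table and column names are illustrative, not our production schema, and the helper for formatting an embedding as a pgvector literal is a hypothetical convenience:

```python
# Schema sketch: one 512-dim embedding per employee, in the same database
# that already holds attendance records.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE face_embeddings (
    employee_id INTEGER PRIMARY KEY,
    embedding   vector(512)
);
"""

# `<=>` is pgvector's cosine-distance operator (0 = identical direction).
# LIMIT 2 so the caller can compare the best match against the runner-up.
MATCH_SQL = """
SELECT employee_id, embedding <=> %(probe)s::vector AS distance
FROM face_embeddings
ORDER BY embedding <=> %(probe)s::vector
LIMIT 2;
"""

def to_pgvector_literal(embedding) -> str:
    """Format a sequence of floats as a pgvector input literal."""
    return "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"

# With a psycopg2 cursor this would be roughly:
#   cur.execute(MATCH_SQL, {"probe": to_pgvector_literal(probe_embedding)})
print(to_pgvector_literal([0.25, -0.5]))  # -> [0.250000,-0.500000]
```

Because the embeddings live next to the attendance tables, the match and the attendance insert can happen in one transaction.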
On-device face detection
We put Android tablets at each entry point running ML Kit for face detection. ML Kit runs entirely on-device — no network call needed just to find a face in the camera frame.
The tablet captures the face crop and sends it to the on-premise server for embedding generation and matching. If the network between the tablet and the server is down (rare, since it is a local network, but we planned for it), the tablet queues the face crop locally and syncs when the connection returns.
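The queue-and-sync logic is simple enough to sketch. The tablet code itself is Android, so this Python version only models the behaviour; the injected `send` callable stands in for whatever transport the tablet uses (an HTTP POST in our case):

```python
from collections import deque

class CropQueue:
    """Queues face crops locally while the server is unreachable and
    flushes them in order once it is back."""

    def __init__(self, send):
        self.send = send          # returns True on success, False on failure
        self.pending = deque()

    def submit(self, crop):
        self.pending.append(crop)
        self.flush()

    def flush(self):
        while self.pending:
            crop = self.pending[0]
            if not self.send(crop):   # send failed: keep the crop, retry later
                break
            self.pending.popleft()    # send succeeded: drop it from the queue

# Simulated flaky link: down at first, then up.
online = {"up": False}
sent = []
q = CropQueue(lambda crop: online["up"] and (sent.append(crop) or True))

q.submit("crop-001")      # network down: stays queued
online["up"] = True
q.submit("crop-002")      # network back: both crops sync, in order
print(sent)               # -> ['crop-001', 'crop-002']
```

Flushing from the front of the queue preserves capture order, which matters when attendance timestamps are reconstructed after an outage.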
This gave us a nice separation: the tablet handles the real-time camera work, the server handles the heavy computation. Neither depends on the internet.
The problems nobody warns you about
Enrollment quality matters more than matching quality
We spent weeks tuning the matching pipeline. The actual bottleneck was enrollment. If a worker's reference photo was taken in poor lighting or at a weird angle, no amount of matching sophistication would help.
We ended up building an enrollment flow that captures multiple face angles and rejects low-quality images before saving. This single change improved our match rate more than any model tuning.
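A quality gate in that spirit can be sketched with two cheap checks: reject frames that are too dark, and reject frames that are too blurry (low variance of the Laplacian is a standard blur signal). The thresholds here are placeholders, not our tuned production values:

```python
import numpy as np

# 3x3 Laplacian kernel: responds strongly to edges, weakly to flat regions.
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the Laplacian response; low values indicate blur."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):                # valid convolution via shifted slices
        for j in range(3):
            out += LAPLACIAN[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(out.var())

def accept_for_enrollment(gray: np.ndarray,
                          min_sharpness: float = 50.0,
                          min_brightness: float = 40.0) -> bool:
    """Reject frames that are too dark or too blurry before saving."""
    return bool(gray.mean() >= min_brightness
                and laplacian_variance(gray) >= min_sharpness)

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (64, 64)).astype(float)  # high-frequency detail
blurry = np.full((64, 64), 128.0)                     # flat, featureless frame
print(accept_for_enrollment(sharp), accept_for_enrollment(blurry))
```

Rejecting at capture time, with the worker still standing in front of the tablet, is much cheaper than discovering a bad reference photo weeks later through failed matches.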
Twins and family resemblances
The facility had two sets of twins among 500 employees. FaceNet embeddings for twins are close but not identical. We had to tighten our similarity threshold and add a confirmation step whenever the top two matches fell within a tight margin of each other.
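The margin check reduces to a few lines. The specific thresholds below are illustrative; "confirm" stands for whatever second factor the site uses, such as a repeat capture:

```python
def decide(matches, accept_threshold=0.65, min_margin=0.08):
    """matches: list of (employee_id, similarity), sorted descending.
    Returns ('accept' | 'confirm' | 'reject', employee_id or None)."""
    (best_id, best_sim), (_, runner_up_sim) = matches[0], matches[1]
    if best_sim < accept_threshold:
        return ("reject", None)
    if best_sim - runner_up_sim < min_margin:
        return ("confirm", best_id)   # twins territory: do not auto-accept
    return ("accept", best_id)

print(decide([(101, 0.91), (202, 0.42)]))  # clear winner -> ('accept', 101)
print(decide([(101, 0.88), (102, 0.84)]))  # twins -> ('confirm', 101)
print(decide([(101, 0.50), (102, 0.40)]))  # no good match -> ('reject', None)
```

This is also why the nearest-neighbour query fetches the top two matches rather than just the best one: the decision depends on the gap between them.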
Temperature and hardware drift
The tablets ran in a semi-outdoor environment. In summer, ambient temperatures hit 45°C. The camera's color balance shifted, and processing slowed. We added thermal monitoring and reduced the frame-processing rate when the device ran hot, which prevented crashes without noticeably affecting the user experience.
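A hypothetical version of that backoff maps device temperature to a frame-processing rate, shedding load gradually instead of crashing. The breakpoints and rates below are illustrative, not the values we shipped:

```python
def target_fps(temp_c: float) -> int:
    """Frame-processing rate for a given device temperature (Celsius)."""
    if temp_c < 40:
        return 15     # normal operation
    if temp_c < 45:
        return 8      # warm: halve the processing load
    if temp_c < 50:
        return 3      # hot: bare minimum to keep the queue moving
    return 0          # critical: pause processing, keep the UI responsive

print([target_fps(t) for t in (30, 42, 47, 55)])  # -> [15, 8, 3, 0]
```

Dropping frames is nearly invisible to a worker walking past the tablet; a crashed app at shift change is not.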
What we would do differently
If we built this again:
- Start with enrollment. Build the enrollment flow first, validate embedding quality before touching the matching pipeline.
- Use ArcFace from the beginning. We started with a vanilla FaceNet model and switched to ArcFace partway through. ArcFace produces tighter clusters and better separates similar faces. We should have started there.
- Deploy a second tablet at high-traffic entry points. One tablet per entry point was fine for most shifts, but the morning shift change created a 5-minute queue. A second tablet would have halved the wait.
The result
The system has been running in production for over a year. Five hundred employees processed daily, sub-second matching, zero cloud dependency. The operations team has not called us about a system issue in months — which, honestly, is the best metric we have.
The broader lesson: cloud services are powerful defaults, but they are not always the right answer. When your constraints include unreliable internet, latency requirements, and data residency concerns, building on-premise with the right tools is not just viable — it is simpler.
Need help building something like this?
We build production-grade systems. Let's talk about your project.
Start a Conversation →