System documentation

How CareVision works.

A real-time caregiver alert system that uses on-device computer vision and low-latency video streaming to give caregivers fast, trustworthy context when a resident needs attention.

From detection to response in seconds.

Every feature in CareVision exists to serve one loop: something happens in a room, a caregiver finds out immediately, sees live context, and takes action. The entire cycle completes in under two seconds on a local network.

Sensor captures

Camera runs at 30fps,
Vision samples at ~5fps

iOS + Vision

CV detects event

Bed exit, upright pose,
or motion spike

On-device

Backend creates alert

Validates, deduplicates,
broadcasts via SSE

Fastify + SSE

Caregiver sees alert

Real-time push,
no polling needed

SSE stream

Opens live stream

Sub-second video,
motion overlay

WebRTC
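The loop above hinges on the backend treating sensor events as idempotent input. A minimal TypeScript sketch of that ingest step follows; the field names (`idempotencyKey`, `roomId`, etc.) are illustrative, not the project's actual identifiers, which live in `backend/src/types.ts`.

```typescript
// Illustrative shapes only -- the real definitions live in backend/src/types.ts.
interface CvEvent {
  idempotencyKey: string;   // stable per detection, so retries collapse
  type: "bed_exit";
  roomId: string;
  detectedAt: string;       // ISO-8601 timestamp from the sensor
}

interface Alert {
  id: string;
  roomId: string;
  state: "new" | "acknowledged" | "escalated";
  createdAt: string;
}

const seenKeys = new Set<string>();
const alerts: Alert[] = [];

// Returns the created alert, or null when the event is a duplicate retry.
function ingest(event: CvEvent): Alert | null {
  if (seenKeys.has(event.idempotencyKey)) return null; // dedupe retries
  seenKeys.add(event.idempotencyKey);
  const alert: Alert = {
    id: `alert-${alerts.length + 1}`,
    roomId: event.roomId,
    state: "new",
    createdAt: event.detectedAt,
  };
  alerts.push(alert);
  return alert; // caller would now broadcast this over SSE
}
```

Because the sensor retries on flaky networks, the dedupe check is what keeps one physical bed exit from becoming three alerts.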

Three layers, clean boundaries.

Each layer does one thing well and communicates through well-defined contracts. The iOS app handles capture and display. The backend handles orchestration. WebRTC handles media.

Client

iOS App

SwiftUI + AVFoundation + Vision + LiveKit

Sensor Mode

Camera capture, on-device Vision analysis, bed-exit state machine, LiveKit publish

Caregiver Mode

SSE alert stream, live video subscriber, alert inbox, acknowledge actions, timeline

Motion Overlay

Renders skeleton, bounding box, and motion trail on the live stream

Server

Backend

Node.js + TypeScript + Fastify + Zod

Auth + Tokens

Demo sessions with bearer tokens, LiveKit JWT generation with role-based permissions

Event Ingest + Alerts

CV event processing with idempotency, alert state machine, timeline audit trail

SSE Fanout

Real-time push to all connected caregivers when alerts are created or updated
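LiveKit access tokens are HS256 JWTs whose payload carries a `video` grant. The backend presumably mints them with `livekit-server-sdk`; the hand-rolled sketch below only shows what role-based permissions look like inside the token, using `node:crypto` so it stays dependency-free. The grant shape follows LiveKit's documented token format; the role names are this document's.

```typescript
import { createHmac } from "node:crypto";

const b64url = (buf: Buffer) =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// Role-based grants: sensors publish video, caregivers subscribe to it.
function videoGrant(role: "sensor" | "caregiver", room: string) {
  return {
    roomJoin: true,
    room,
    canPublish: role === "sensor",
    canSubscribe: role === "caregiver",
  };
}

function mintToken(apiKey: string, apiSecret: string, identity: string,
                   role: "sensor" | "caregiver", room: string): string {
  const header = { alg: "HS256", typ: "JWT" };
  const now = Math.floor(Date.now() / 1000);
  const payload = {
    iss: apiKey,      // LiveKit API key
    sub: identity,    // participant identity
    iat: now,
    exp: now + 600,   // short-lived demo token
    video: videoGrant(role, room),
  };
  const signingInput =
    `${b64url(Buffer.from(JSON.stringify(header)))}.` +
    `${b64url(Buffer.from(JSON.stringify(payload)))}`;
  const sig = b64url(createHmac("sha256", apiSecret).update(signingInput).digest());
  return `${signingInput}.${sig}`;
}
```

Scoping `canPublish`/`canSubscribe` at mint time means the media server enforces the role; a caregiver client cannot publish even if its code is modified.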

Transport

Communication

WebRTC + SSE + REST

WebRTC via LiveKit

Low-latency video streaming between sensor and caregiver devices

Server-Sent Events

Unidirectional push for alerts and timeline updates, no client polling

REST API

Session creation, token requests, alert acknowledgment, timeline queries

On-device detection. Nothing leaves the phone.

The Vision pipeline runs entirely on the iPhone using Apple's Vision framework. No video frames are uploaded. No cloud ML. The phone processes roughly 5 frames per second and feeds the results into a state machine that decides when something clinically relevant is happening.

1

Capture

AVFoundation grabs camera frames at 30fps. The pipeline samples every ~200ms to keep CPU reasonable.

2

Detect

Vision finds people in the frame (bounding box) and estimates 19 body pose landmarks per person.

3

Classify

Calculates upright score, leg extension, motion delta, and zone occupancy from the pose data.

4

Decide

BedExitDetector state machine applies persistence and cooldown windows to avoid false positives.

What it measures

Upright score: shoulder-hip alignment + aspect ratio
Leg extension: hip-knee-ankle geometry
Motion score: frame-to-frame centroid delta
Zone occupancy: center of mass inside exit zone
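The measures above are simple geometry over the pose landmarks. A hedged TypeScript sketch of two of them is below; the actual weights and thresholds live in the Swift pipeline, so every constant here is illustrative.

```typescript
interface Point { x: number; y: number }  // normalized image coords, y grows downward

// Upright score: blend of vertical shoulder-hip separation and a tall
// bounding-box aspect ratio. Weights (0.6/0.4) and saturation points are
// made-up placeholders, not the app's tuned values.
function uprightScore(shoulderMid: Point, hipMid: Point,
                      bboxW: number, bboxH: number): number {
  const vertical = Math.max(0, hipMid.y - shoulderMid.y); // >0 when shoulders above hips
  const aspect = bboxH / Math.max(bboxW, 1e-6);           // tall box suggests standing
  const aspectTerm = Math.min(aspect / 2, 1);             // saturate at 2:1
  return Math.min(1, 0.6 * Math.min(vertical * 4, 1) + 0.4 * aspectTerm);
}

// Motion score: frame-to-frame displacement of the person's centroid.
function motionScore(prev: Point, curr: Point): number {
  return Math.hypot(curr.x - prev.x, curr.y - prev.y);
}
```

A standing pose (shoulders well above hips, tall box) scores near 1; a person lying flat scores near 0, which is what lets the state machine separate the two.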

How it avoids noise

Persistence window: 0.6–0.8s before triggering
Cooldown window: 8–10s between re-triggers
Clear window: 3s vacancy resets state
Idempotency keys: backend deduplicates events
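The persistence and cooldown windows combine into a small state machine. Below is a TypeScript port-sketch of that debouncing logic (the real `BedExitDetector` is Swift, and this sketch omits the 3s clear window); state names and defaults are illustrative.

```typescript
type DetectorState = "idle" | "candidate" | "cooldown";

class BedExitDetector {
  private state: DetectorState = "idle";
  private candidateSince = 0;
  private triggeredAt = 0;

  constructor(
    private persistenceMs = 700,  // 0.6-0.8s: signal must persist before firing
    private cooldownMs = 9000,    // 8-10s: no re-trigger storm
  ) {}

  // Feed one sampled frame; returns true exactly when an alert should fire.
  update(exitSignal: boolean, nowMs: number): boolean {
    switch (this.state) {
      case "idle":
        if (exitSignal) { this.state = "candidate"; this.candidateSince = nowMs; }
        return false;
      case "candidate":
        if (!exitSignal) { this.state = "idle"; return false; } // glitch: reset
        if (nowMs - this.candidateSince >= this.persistenceMs) {
          this.state = "cooldown";
          this.triggeredAt = nowMs;
          return true; // fire exactly once
        }
        return false;
      case "cooldown":
        if (nowMs - this.triggeredAt >= this.cooldownMs) this.state = "idle";
        return false;
    }
    return false;
  }
}
```

A single noisy frame never fires (it resets from `candidate`), and a genuine exit fires once, then stays quiet through the cooldown.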

Simple states, clear transitions.

Every alert follows a predictable lifecycle. There is no ambiguity about what state an alert is in or who acted on it. The timeline records every transition.

New

Just created

acknowledge

Acknowledged

Caregiver saw it

escalate

Escalated

Needs more help

Alerts can also go directly from New to Escalated, skipping acknowledgment. There is no reopen or resolve in the MVP — once escalated, the alert is terminal.
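The lifecycle above is small enough to express as data rather than code. A sketch of the transition table (type and function names are illustrative):

```typescript
type AlertState = "new" | "acknowledged" | "escalated";
type AlertAction = "acknowledge" | "escalate";

// Legal transitions only; everything absent from the table is rejected.
const transitions: Record<AlertState, Partial<Record<AlertAction, AlertState>>> = {
  new:          { acknowledge: "acknowledged", escalate: "escalated" },
  acknowledged: { escalate: "escalated" },
  escalated:    {},  // terminal in the MVP: no reopen, no resolve
};

// Returns the next state, or null when the action is illegal from this state.
function apply(state: AlertState, action: AlertAction): AlertState | null {
  return transitions[state][action] ?? null;
}
```

Encoding the lifecycle as a lookup table makes "no ambiguity" literal: any action not in the table is rejected, and the timeline only ever records rows that came out of it.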

Three protocols, each chosen for a reason.

We do not use one protocol for everything. Each communication path uses the transport that matches its latency, reliability, and direction requirements.

Bidirectional

WebRTC via LiveKit

Live video + data channel

Sub-second latency video from sensor to caregiver. Also carries the motion overlay data on an unreliable-delivery data channel, so a dropped overlay packet never stalls the video.

Latency critical
Server → Client

Server-Sent Events

Alerts + timeline updates

Persistent HTTP connection from the backend to every connected caregiver. When an alert is created or updated, the backend pushes it instantly. No polling, no WebSocket complexity.

Real-time push
Request / Response

REST API

JSON over HTTP

Used for actions that need confirmation: creating sessions, requesting tokens, acknowledging alerts, querying the timeline. Standard request-response where you need a guaranteed result.

Reliable actions
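The SSE path is the simplest of the three: frames are plain text, `event:` and `data:` lines terminated by a blank line. A sketch of how the backend's fanout might look (the real implementation writes to each connected Fastify reply's raw stream; names here are illustrative):

```typescript
interface AlertPayload { id: string; state: string }

// One SSE frame: "event:" line, "data:" line, blank-line terminator.
function formatSseFrame(eventName: string, payload: AlertPayload): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Fanout: every connected caregiver registers a write callback on connect
// and removes it on disconnect.
const subscribers = new Set<(frame: string) => void>();

function broadcast(eventName: string, payload: AlertPayload): void {
  const frame = formatSseFrame(eventName, payload);
  for (const write of subscribers) write(frame);
}
```

Because the browser's `EventSource` reconnects automatically, the server side stays this simple: no handshake protocol, no ping/pong, just text frames pushed down an open HTTP response.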

Built for speed that matters clinically.

These are the performance budgets for the local demo environment. In a care setting, every second between detection and response matters.

<1.5s
Time to first frame
From tapping the stream to seeing live video
<500ms
Event to alert
From CV detection to caregiver notification
<300ms
Ack to update
From acknowledgment to UI confirmation
5 fps
Vision throughput
Frames analyzed per second on device

Clean, navigable, intentional.

The codebase is organized by responsibility. Backend source is 5 files. The iOS app separates features, services, and shared components.

backend/ src/ index.ts # Fastify server, all API routes types.ts # Shared type definitions store.ts # In-memory data store and pub/sub config.ts # Environment configuration livekit.ts # LiveKit JWT token generation scripts/ smoke.mjs # Integration smoke test public/ laptop-sensor.html # Browser-based sensor for demos ios/CareVisionSample/ App/ # Entry point, root navigation Core/ # Theme and shared data models Features/ Sensor/ # Sensor mode views and view model Caregiver/ # Caregiver mode views and view model Common/ # Shared UI (overlay, camera, cards) Services/ # API client, SSE, LiveKit, camera, detection docs/ api-contract.md # Complete REST API spec DECISIONS.md # Architecture scope decisions event-spec-bed-exit-risk.md # CV event detection spec demo-validation.md # Demo script and perf budgets