DramWell Docs

Overview

The Monitoring section of the Admin Portal is the operational health center for the DramWell platform. It surfaces real-time and historical data about service availability, API error rates, background job execution, webhook delivery, and application logs — giving the engineering and operations teams what they need to detect, diagnose, and resolve incidents quickly.

Key Concepts

Health Check — A periodic ping to each DramWell service endpoint that reports its status (Up, Degraded, Down) and response latency.

Error Rate — The percentage of API requests that returned a 4xx or 5xx response in a given time window. Tracked per service and per route.

Job Queue — A list of background jobs pending execution. Queues can accumulate during high-load periods or when a worker is down. Deep queues indicate a processing backlog.

Webhook Event — An outbound HTTP request sent from DramWell to a customer-configured endpoint when a platform event occurs. Failed webhook events are retried with exponential backoff.

Log — A structured application log entry emitted by any DramWell service. Logs include severity, service name, trace ID, and a structured payload.
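As a rough illustration of the Error Rate definition above, the following sketch computes the percentage of 4xx/5xx responses per service and route for one time window. The input shape (service, route, status code tuples) and the sample routes are assumptions made for the example, not DramWell's actual data model.

```python
from collections import defaultdict


def error_rates(requests):
    """Return {(service, route): error-rate percentage} for one time window.

    `requests` is an iterable of (service, route, status_code) tuples.
    This tuple shape is a hypothetical stand-in for real request records.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for service, route, status in requests:
        key = (service, route)
        totals[key] += 1
        # Per the definition above, both 4xx and 5xx responses count as errors.
        if 400 <= status <= 599:
            errors[key] += 1
    return {key: 100.0 * errors[key] / totals[key] for key in totals}


# Hypothetical sample window: four API requests, one Admin request.
sample = [
    ("API", "/v1/orders", 200),
    ("API", "/v1/orders", 500),
    ("API", "/v1/orders", 404),
    ("API", "/v1/orders", 200),
    ("Admin", "/login", 200),
]
rates = error_rates(sample)
```

Here two of the four requests to the hypothetical /v1/orders route failed, so that route's error rate is 50%, while the Admin route stays at 0%.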

Service Health

Go to Monitoring > Health. Each row shows one service with its current status and 24-hour uptime percentage. The monitored services are:

Service	Description
API	Core REST/tRPC API (api.dramwell.ai)
Dashboard	DramGuest/DramPulse/DramTrade app server
Admin	Admin Portal app server
Supabase	Database and authentication layer
Twilio Relay	Telephony webhook handler
Edge Functions	Supabase edge function runtime

Click any service to see a response-time histogram for the last 24 hours and a list of recent health check failures.
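One way to reason about the 24-hour uptime percentage is as the share of health checks in the window that found the service available. The sketch below assumes that a Degraded result still counts as available. The source does not say how Degraded is treated, so the function exposes that choice as a parameter. The hourly check interval in the sample is also an assumption.

```python
def uptime_percentage(statuses, count_degraded_as_up=True):
    """Percentage of health checks in the window that found the service available.

    `statuses` is a list of "Up", "Degraded", or "Down" strings.
    Whether Degraded counts as available is an assumption, not documented behavior.
    """
    if not statuses:
        raise ValueError("no health checks in the window")
    up_states = {"Up", "Degraded"} if count_degraded_as_up else {"Up"}
    up = sum(1 for status in statuses if status in up_states)
    return 100.0 * up / len(statuses)


# Hypothetical window: one check per hour for 24 hours.
checks = ["Up"] * 22 + ["Degraded"] + ["Down"]
lenient = uptime_percentage(checks)                             # Degraded counts as up
strict = uptime_percentage(checks, count_degraded_as_up=False)  # only Up counts
```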

Error Tracking

Go to Monitoring > Errors. The top panel shows a sparkline of error rate over time. Below it, errors are grouped by type and sorted by occurrence count. Each error group shows:

Error message and stack trace sample
First seen / last seen timestamps
Occurrence count
Affected service and route
A sample of affected request IDs for correlation

Click any error group to see the full trace and all recent occurrences. Use the Resolve button to mark an error group as investigated. Resolving does not suppress future occurrences: if the error recurs, the group reopens.

Job Queues

Go to Monitoring > Queues. Each queue is listed with its current depth, processing rate (jobs per minute), and oldest job age. Normal depth for every queue is under 100 jobs. A depth over 500, or an oldest job age over 5 minutes, indicates a problem.
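The thresholds above can be expressed as a small classification function, which may be useful when scripting your own checks. This is a sketch only: the state names are made up for the example, and the "elevated" band for depths between 100 and 500 is an assumption, since the source does not name that range.

```python
PROBLEM_DEPTH = 500             # depth over this indicates a problem
NORMAL_DEPTH = 100              # depth under this is normal
PROBLEM_AGE_SECONDS = 5 * 60    # oldest job age over 5 minutes indicates a problem


def queue_state(depth, oldest_job_age_seconds):
    """Classify a queue using the documented thresholds.

    Returns "problem", "normal", or "elevated". The "elevated" label is a
    hypothetical name for the band the documentation leaves unspecified.
    """
    if depth > PROBLEM_DEPTH or oldest_job_age_seconds > PROBLEM_AGE_SECONDS:
        return "problem"
    if depth < NORMAL_DEPTH:
        return "normal"
    return "elevated"
```

Note that either condition alone is enough to flag a problem: a shallow queue whose oldest job has been waiting more than 5 minutes is still unhealthy, which is typical of a stalled worker rather than a load spike.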

Actions available per queue:

Pause — Stops workers from pulling new jobs. Use while diagnosing a poison-pill job.
Resume — Re-enables processing after a pause.
Purge — Deletes all jobs in the queue. Irreversible — use only when you have confirmed the jobs are safe to discard.
Retry Failed — Re-enqueues all jobs in the Failed state for another processing attempt.

Webhook Delivery

Go to Monitoring > Webhooks. The table shows every webhook event in the last 7 days with its delivery status:

Delivered — Customer endpoint returned a 2xx response
Pending — Awaiting first delivery attempt or retry
Failed — All retry attempts exhausted

Click any failed event to see the full delivery history including each attempt's timestamp, HTTP status returned, and response body. From the detail view, click Force Retry to immediately attempt delivery again outside the normal retry schedule.
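Failed webhook events are retried with exponential backoff, meaning the wait between attempts grows by a constant factor each time. The sketch below shows the general shape of such a schedule. The attempt count, base delay, factor, and cap are all hypothetical values chosen for illustration. The actual DramWell retry schedule is not stated in this document.

```python
def backoff_delays(max_attempts=5, base_seconds=30, factor=2, cap_seconds=3600):
    """Seconds to wait before each retry under exponential backoff.

    The first attempt is immediate, so the list has max_attempts - 1 entries.
    All default values here are assumptions, not DramWell's real settings.
    """
    return [
        min(base_seconds * factor ** retry, cap_seconds)
        for retry in range(max_attempts - 1)
    ]


delays = backoff_delays()  # [30, 60, 120, 240] with the assumed defaults
```

Under a schedule like this, an event stays in Pending between attempts and moves to Failed only after the last retry. Force Retry sits outside the schedule, so it does not wait for the next computed delay.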

Logs

Go to Monitoring > Logs. The log viewer streams live structured logs from all services. Use the filter bar to scope by:

Service — API, Dashboard, Admin, Edge Functions
Severity — Debug, Info, Warning, Error
Trace ID — Correlate all logs from a single request
Time range — Up to 7 days of historical logs are queryable

Click any log entry to expand the full structured payload. Use the Copy Trace ID button to pull all logs associated with a request into a filtered view.
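If you export log entries for offline analysis, the same filters can be approximated in a few lines. The sketch below models a log entry with the fields listed in Key Concepts (severity, service name, trace ID, payload). The field names, the sample entries, and the choice to treat severity as a minimum level are assumptions for the example, not the viewer's documented behavior.

```python
from dataclasses import dataclass, field

# Severity levels from the filter bar, lowest to highest.
SEVERITY_ORDER = ["Debug", "Info", "Warning", "Error"]


@dataclass
class LogEntry:
    # Hypothetical field names mirroring the documented log contents.
    severity: str
    service: str
    trace_id: str
    payload: dict = field(default_factory=dict)


def filter_logs(entries, service=None, min_severity="Debug", trace_id=None):
    """Keep entries matching the service, trace ID, and minimum severity."""
    threshold = SEVERITY_ORDER.index(min_severity)
    return [
        entry
        for entry in entries
        if (service is None or entry.service == service)
        and SEVERITY_ORDER.index(entry.severity) >= threshold
        and (trace_id is None or entry.trace_id == trace_id)
    ]


# Hypothetical entries: two from one request, one from another.
entries = [
    LogEntry("Info", "API", "trace-a", {"msg": "request received"}),
    LogEntry("Error", "API", "trace-a", {"msg": "upstream timeout"}),
    LogEntry("Warning", "Admin", "trace-b", {"msg": "slow query"}),
]
same_request = filter_logs(entries, trace_id="trace-a")
api_errors = filter_logs(entries, service="API", min_severity="Error")
```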

Tips

Bookmark the Health page in your browser. It is the fastest way to confirm whether a reported issue is a platform outage or a configuration problem on the customer's end.
Correlate error spikes with recent deployments using the deployment timestamps overlaid on the Error Rate chart (toggle Show Deployments in the chart header).
The Logs search is most effective when you have a trace ID. Every API response includes an X-Trace-Id header — ask customers to provide it when reporting issues.
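Customers who call the API from a script can capture the trace ID themselves. The helper below is a minimal sketch that pulls X-Trace-Id out of a mapping of response headers. HTTP header names are case-insensitive, so the lookup ignores case. The sample header values are made up for the example.

```python
def extract_trace_id(headers):
    """Return the X-Trace-Id value from a header mapping, or None if absent.

    `headers` is any object with an .items() method yielding (name, value)
    pairs, such as a plain dict.
    """
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-trace-id":
            return value.strip()
    return None


# Hypothetical response headers as a client library might expose them.
sample_headers = {"content-type": "application/json", "x-trace-id": " abc123 "}
trace_id = extract_trace_id(sample_headers)
```

Logging this value alongside any failed request gives support a direct key for the Trace ID filter in the log viewer.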
