Industrial Equipment Diagnostics RAG
TL;DR: A two-LLM diagnostic chatbot for manufacturing environments. Technicians describe a fault; the system asks up to 2 clarifying questions, runs a 6-stage parallel RAG search across 600+ OEM manuals and 12,000+ service bulletins, and returns exactly 3 ranked hypotheses with verified source citations. Built on AWS SageMaker (Llama 3.1 8B), Qdrant Cloud, PostgreSQL, and Next.js.
1. Product Overview
What RAG Does
RAG is an AI-powered industrial equipment diagnostic chatbot designed for manufacturing technicians, maintenance personnel, and plant operators who need to understand what might be wrong with industrial machinery before dispatching a specialist or halting a production line. The system simulates the intake conversation a skilled maintenance engineer or OEM field service representative would conduct — gathering relevant details about the fault, then providing an informed (but appropriately cautious) diagnostic assessment.
When a user enters the application, they follow this flow:
- Equipment Selection: The user selects their equipment’s manufacturer, equipment type, model, and asset tag/serial range from cascading dropdown menus populated from a database of industrial asset records. They also enter their plant/facility code for service team recommendations.
- Fault Description: The user describes the equipment fault in natural language (e.g., “Our CNC machining center spindle makes a high-pitched whine at high RPM”).
- Intelligent Questioning: The AI evaluates whether it has enough information to form a diagnosis. If not, it asks targeted clarifying questions — about operating conditions, runtime hours, recent maintenance events, or observable symptoms — to avoid frustrating the technician.
- Knowledge-Backed Diagnosis: When the AI decides it has sufficient information, it searches across multiple knowledge bases (OEM technical manuals, OSHA/NFPA safety bulletins, parts databases, and historical work order records) to build an evidence-backed diagnostic assessment.
- Three Hypotheses: Every diagnosis presents exactly three possible root causes ranked by likelihood. Each hypothesis identifies the affected system or subsystem and cites a verified source document.
- Service Team Recommendations: The system scores and recommends qualified internal maintenance teams or certified third-party service providers based on their specializations, certifications, ratings, and relevance to the diagnosed fault. Teams are displayed on an interactive facility map view alongside the chat.
Key Design Principles
These principles are enforced through the prompt system and response parsing logic. They represent deliberate product decisions, not just technical preferences.
System-Level Language Only
The AI is explicitly prohibited from naming specific components (e.g., “angular contact bearing 7208,” “servo drive IGBT module,” “proximity sensor NPN output”). Instead, it identifies the system area affected (e.g., “spindle drive system concern,” “hydraulic pressure circuit issue,” “PLC I/O subsystem fault”). The prompt templates contain an explicit list of forbidden part numbers and component names.
The reasoning is both legal and practical: a remote AI system cannot physically inspect equipment, so naming specific components could create liability if the diagnosis is wrong or if a technician replaces the wrong part. The on-site maintenance engineer determines the exact failed component during hands-on inspection with proper test equipment.
Non-Deterministic Language
All diagnostic language uses hedging phrases like “This may indicate…”, “This could be caused by…”, and “Based on the fault symptoms described, this is consistent with…”. The system never makes definitive statements about what is wrong — only what might be wrong.
This is enforced in the system prompt template, which explicitly instructs the LLM to use cautious phrasing and includes examples of acceptable vs. unacceptable language patterns.
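Beyond prompting, this rule could also be checked after generation. The sketch below is a hypothetical post-generation validator, not the production code; the pattern lists are illustrative subsets of the acceptable/unacceptable phrasings described above.

```python
import re

# Hedged openers the system prompt asks for (illustrative subset).
HEDGING_PATTERNS = [
    r"\bmay indicate\b",
    r"\bcould be caused by\b",
    r"\bis consistent with\b",
    r"\bmight\b",
]

# Definitive phrasings the prompt forbids (illustrative subset).
DEFINITIVE_PATTERNS = [
    r"\bis definitely\b",
    r"\bis certainly\b",
    r"\bthe cause is\b",
]

def uses_cautious_language(text: str) -> bool:
    """Return True if the text hedges and avoids definitive claims."""
    lower = text.lower()
    hedged = any(re.search(p, lower) for p in HEDGING_PATTERNS)
    definitive = any(re.search(p, lower) for p in DEFINITIVE_PATTERNS)
    return hedged and not definitive
```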
Exactly 3 Diagnostic Hypotheses
Every full diagnosis produces exactly three possible root causes, ranked by likelihood. This is a firm product requirement — not two, not four, always three. Each hypothesis must identify a system area, a possible cause, supporting reasoning, and a verified source citation.
The one exception is acknowledgment messages. When a user sends a short message like “thanks,” “ok,” or “got it,” the system returns a brief, friendly closing instead of generating a diagnosis. This is detected through pattern matching against the ai_acknowledgment_patterns database table.
Maximum 2 Clarifying Questions
The system tracks how many clarifying question rounds have occurred in each session (stored as clarifying_count in DynamoDB session metadata). Once 2 rounds have been asked, the system is forced to generate a diagnosis with whatever information it has, even if the information is incomplete.
When forced to diagnose with limited data, the system adjusts its confidence level to LOW and explicitly states that the diagnosis is based on limited information. This prevents the system from appearing evasive or unhelpful — especially important in manufacturing environments where downtime is costly.
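A minimal sketch of this routing guard, assuming session metadata mirrors the clarifying_count field stored in DynamoDB (field and flag names here are assumptions, not the production schema):

```python
MAX_CLARIFYING_ROUNDS = 2

def route_turn(session: dict, decision_action: str) -> dict:
    """Force a diagnosis once the clarifying-question budget is spent."""
    count = session.get("clarifying_count", 0)
    if decision_action == "QUESTION" and count >= MAX_CLARIFYING_ROUNDS:
        # Budget exhausted: diagnose with what we have, flagged as LOW confidence.
        return {"action": "DIAGNOSIS", "confidence": "LOW", "limited_info": True}
    if decision_action == "QUESTION":
        session["clarifying_count"] = count + 1
        return {"action": "QUESTION"}
    return {"action": "DIAGNOSIS"}
```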
AI-Driven Category Selection
Fault categories (112 categories in a 2-level hierarchy like “Rotating Equipment / Bearing Failures”) are selected through semantic vector search against the fault_categories Qdrant collection, not through hardcoded keyword matching. This means the system can correctly categorize faults it hasn’t been explicitly programmed for, as long as the category descriptions are semantically similar to the technician’s description.
Graceful Handling of Limited Data
When the RAG search returns few or no relevant documents, the system doesn’t fabricate information. Instead, it adjusts the confidence level to LOW, uses more cautious language, and relies on the LLM’s general industrial maintenance knowledge while clearly indicating that specific documentation was not available.
User Flow Diagram
Step 1: User selects equipment (Manufacturer / Type / Model) and enters Plant/Facility Code
↓
Step 2: User describes the equipment fault or symptom
↓
Step 3: Acknowledgment check: is this just “thanks” or “ok”?
- YES (acknowledgment): return a friendly closing response.
- NO (real question): the Decision LLM evaluates whether it has enough information to diagnose.
  - Need more info: ask a clarifying question (max 2 rounds); the user responds, and the flow loops back to the Decision LLM.
  - Ready to diagnose: (1) perform the 6-stage parallel RAG search, (2) build the diagnosis prompt with RAG context, (3) generate a diagnosis with 3 hypotheses, (4) score and recommend qualified service teams, (5) stream the response to the user via SSE.
2. System Architecture
High-Level Architecture
The system consists of three tiers: a Next.js frontend, a Python FastAPI backend, and a set of external services. The frontend and backend are completely independent codebases that communicate via HTTP APIs, enabling separate deployment and scaling.
Frontend (Next.js, port 5000) | Backend (Python/FastAPI, port 8000) | External Services (Cloud Infrastructure) |
Equipment search | AI diagnostic engine | AWS SageMaker (Llama 3.1 8B) |
Chat interface | RAG pipeline | Qdrant Cloud |
Service team map/cards | Team scoring | AWS DynamoDB |
Team details | Session management | PostgreSQL |
API route proxy | Data ingestion | |
Frontend → Backend → External Services
Technology Stack
Layer | Technology | Purpose |
Frontend Framework | Next.js 16 (App Router) | Server-side rendering, file-based routing |
UI Components | Shadcn/ui + Radix UI | Accessible, styled component library |
Styling | Tailwind CSS | Utility-first CSS with dark/light theme |
State Management | TanStack React Query | Server state caching and synchronization |
Forms | React Hook Form + Zod | Form handling with schema validation |
Maps | Google Maps JavaScript API | Facility/service team location visualization |
Backend Framework | FastAPI (Python) | High-performance async API server |
LLM | Meta Llama 3.1 8B Instruct | Hosted on AWS SageMaker |
Embeddings | FastEmbed (BAAI/bge-small-en-v1.5) | 384-dimension text embeddings |
Vector Database | Qdrant Cloud | Semantic search across 8 collections |
Relational Database | PostgreSQL (Neon) | Equipment catalog, service teams, work orders, AI config |
Chat Storage | AWS DynamoDB | Session-based conversation history |
Dev Orchestrator | Node.js (child_process) | Runs Next.js + Python together in development |
How the Frontend and Backend Communicate
The frontend never calls the Python backend directly from the browser. Instead, Next.js API routes (located in frontend/app/api/) act as a thin proxy layer. When the browser makes a request to /api/chat/stream, it hits a Next.js API route, which reads the BACKEND_URL environment variable (defaults to http://localhost:8000) and forwards the request to the Python backend.
This proxy pattern serves three purposes:
- Security: The backend URL is never exposed to the browser.
- CORS avoidance: Since the frontend and backend appear to be on the same origin from the browser’s perspective, no CORS configuration is needed.
- Independent deployment: The frontend can be deployed to Vercel/Netlify while the backend runs on AWS/Railway/Render. Only the BACKEND_URL variable needs to change.
Development Mode
In development, npm run dev runs server/index.ts, which uses Node.js child_process to spawn two processes:
- The Python backend (uvicorn main:app --port 8000 --reload) with auto-reload enabled
- The Next.js frontend (next dev --port 5000) after a 3-second delay to allow the backend to initialize
There are no shared runtime dependencies between the two — they communicate purely over HTTP.
3. AI Diagnostic Engine
This is the core of RAG. The AI engine determines what to ask, when to diagnose, what sources to cite, and which service teams to recommend. Understanding this section is essential for maintaining or extending the system.
Two-LLM Architecture
The system uses two separate calls to the same Llama 3.1 8B Instruct model, but with different parameter configurations. This separation is deliberate: routing decisions need to be fast, deterministic, and predictable, while diagnostic text generation needs to be creative, detailed, and natural-sounding.
Decision LLM (Routing)
The Decision LLM’s sole job is to decide whether the system has enough information to generate a diagnosis, or whether it should ask another clarifying question.
- Temperature: 0.3 — Low temperature makes the output more deterministic
- Max Tokens: 200 — The response only needs to contain a JSON object with an action and optionally a question
- Output Format: JSON object with action: “QUESTION” or action: “DIAGNOSIS”, plus an optional question field
The Decision LLM follows a mandatory 3-step reasoning process:
- Step 1 — Extract All Info: List everything the technician has already provided, including implicit information. For example, if a user says “there’s smoke coming from the motor housing,” the Decision LLM should recognize that “location” (motor housing) and “severity indicator” (visible smoke = critical) have already been provided implicitly.
- Step 2 — Check Objectives: Compare the extracted information against diagnostic objectives defined in diagnostic_procedures.yaml. If the user’s symptoms match a known procedure (e.g., “vibration_fault”), the Decision LLM checks which objectives from that procedure have been satisfied.
- Step 3 — Decision: Proceed to DIAGNOSIS if all diagnostic objectives are met, OR if 2+ clarifying questions have already been asked, OR if the query is a scheduled maintenance request. Ask a QUESTION only if the system is under the question limit and genuinely needs specific missing information.
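The routing step could be handled by a thin parser around the Decision LLM's JSON output. This is a hypothetical sketch, and in particular the fail-safe behavior on unparseable output is an assumption, not documented production behavior:

```python
import json

def parse_decision(raw: str, questions_asked: int, max_questions: int = 2) -> dict:
    """Parse {"action": ..., "question": ...} and enforce the question cap."""
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable routing output: fail safe into a diagnosis.
        return {"action": "DIAGNOSIS"}
    if decision.get("action") == "QUESTION" and questions_asked < max_questions:
        return {"action": "QUESTION", "question": decision.get("question", "")}
    return {"action": "DIAGNOSIS"}
```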
Diagnosis LLM (Response Generation)
The Diagnosis LLM generates the actual user-facing diagnostic text, including the three hypotheses, severity assessment, and service team recommendations.
- Temperature: 0.7 — Moderate creativity for natural, varied language
- Max Tokens: 1500 — Enough for a full diagnostic response with all required sections
- Output Format: A structured text response with specific tag delimiters
The Diagnosis LLM receives a much larger prompt than the Decision LLM, including the full system prompt with all behavioral rules, RAG context from 6 parallel searches (OEM manuals, OEM service bulletins, parts data, etc.), admin feedback rules, available service team profiles, and the last N messages of conversation history.
Diagnosis Output Format
The Diagnosis LLM produces a response that follows a strict structure with tagged sections. The backend’s ResponseParser extracts structured data from these tags:
|||BACKEND_START|||
DIAGNOSTIC_APPROACH: [Brief description of the analytical method used]
HYPOTHESIS_1: [System Area] | [Possible Cause] | [Supporting Reasoning] | [Source Citation]
HYPOTHESIS_2: [System Area] | [Possible Cause] | [Supporting Reasoning] | [Source Citation]
HYPOTHESIS_3: [System Area] | [Possible Cause] | [Supporting Reasoning] | [Source Citation]
SOURCES: [Comma-separated list of all sources cited]
|||BACKEND_END|||
[1-2 sentence user-facing diagnosis summary using cautious language]
Matching you now with qualified service teams.
|||SEVERITY:low/medium/high|||URGENCY:immediate/soon/can_wait|||TEAMS:id1,id2,id3|||
|||TEAM_REASON:id:reason why this team was recommended|||
|||DOC_REFS:Document Title::Document Type;;Document Title::Document Type|||
|||CATEGORY:Category/Subcategory:CONFIDENCE_LEVEL|||
The |||BACKEND_START|||…|||BACKEND_END||| block contains diagnostic reasoning that the frontend shows inline. The metadata tags after the user-facing text are parsed by ResponseParser and stripped from the displayed message. They provide structured data for the frontend’s diagnostic card, service team recommendations, and category tracking.
How the Response Parser Works
The ResponseParser class in backend/services/response_parser.py uses regular expressions to extract structured data from the LLM’s free-form output. It extracts backend reasoning, metadata (severity, urgency, recommended team IDs), team-specific recommendation reasons, document references, and fault category with confidence level. The clean_content method strips all metadata tags from the user-facing text and removes common LLM artifacts.
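A simplified extraction in the spirit of ResponseParser is sketched below. The tag names come from the documented output format, but the regexes and the returned dict shape are illustrative, not the production implementation:

```python
import re

# Matches the |||SEVERITY:...|||URGENCY:...|||TEAMS:...||| metadata tag.
META_RE = re.compile(
    r"\|\|\|SEVERITY:(?P<severity>\w+)\|\|\|URGENCY:(?P<urgency>\w+)"
    r"\|\|\|TEAMS:(?P<teams>[^|]*)\|\|\|"
)
# Matches HYPOTHESIS_1/2/3 lines inside the backend block.
HYPOTHESIS_RE = re.compile(r"^HYPOTHESIS_\d: (.+)$", re.MULTILINE)

def parse_metadata(response: str) -> dict:
    """Pull severity, urgency, team IDs, and hypotheses out of the tagged output."""
    meta = META_RE.search(response)
    hypotheses = [
        dict(zip(("system_area", "cause", "reasoning", "source"),
                 (part.strip() for part in line.split("|"))))
        for line in HYPOTHESIS_RE.findall(response)
    ]
    return {
        "severity": meta.group("severity") if meta else None,
        "urgency": meta.group("urgency") if meta else None,
        "teams": meta.group("teams").split(",") if meta else [],
        "hypotheses": hypotheses,
    }
```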
Valid Source Citations
The AI prompt strictly limits which sources can be cited in hypothesis fields. Only these are permitted:
- OEM Technical Manual — [Manufacturer] [Equipment Series] (only manuals actually retrieved from RAG search and present in the prompt context)
- ISO 13849 / ISO 62061 Safety Standards Reference
- OSHA 29 CFR 1910 Machine Guarding Standards
- NFPA 70E Electrical Safety in the Workplace
- IEC 60204-1 Safety of Machinery — Electrical Equipment
- OEM Service Bulletin #[document number]
Internal guidance (admin feedback), diagnostic procedures, and any other context are explicitly marked as NOT A SOURCE in the prompt and must never be cited.
6-Stage RAG Search Architecture
When the Decision LLM routes to DIAGNOSIS, the system performs 6 parallel searches across different Qdrant collections using Python’s ThreadPoolExecutor(max_workers=6):
Stage | Internal Key | Collection | What It Searches | How It Filters |
1 | stage1_equipment | equipment_repair_documents | OEM technical manual pages relevant to the specific equipment | Filtered by equipment manufacturer; score threshold 0.35 |
2 | stage2_oem_bulletins | oem_bulletin_documents | OEM Service Bulletins and manufacturer field notices for known faults | Filtered by manufacturer, equipment model, and production year for exact matches |
3 | stage3_symptom | equipment_repair_documents | OEM documents related to reported fault symptoms | Uses symptom-specific keyword queries from DynamicFaultClassifier; threshold 0.3 |
4 | stage4_component | equipment_repair_documents | OEM documents about specific equipment subsystems | Uses subsystem-specific queries from the classifier; threshold 0.3 |
5 | stage5_parts | parts_encyclopedia | Parts information, specifications, and maintenance guides | Semantic search using the user’s primary query |
6 | stage6_categories | fault_categories | Fault category classification | Semantic matching using the raw user query |
After the 6-stage search completes, main.py makes a separate call to retrieve work order cases from the work_order_cases collection (historical repair records), service team profiles from the team_profiles collection, and fallback documents from the general documents collection if fewer than 3 results came back from the staged search.
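The fan-out shape of the staged search can be sketched as follows. Only the concurrency pattern is shown; search_fn stands in for the real Qdrant client calls, and the stage spec format is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor

def run_staged_search(query: str, search_fn, stages: dict) -> dict:
    """Run all RAG stages in parallel and collect results per stage key."""
    with ThreadPoolExecutor(max_workers=6) as pool:
        futures = {
            key: pool.submit(search_fn, spec["collection"], query, spec["threshold"])
            for key, spec in stages.items()
        }
        # .result() blocks until each stage finishes, so the dict is complete.
        return {key: future.result() for key, future in futures.items()}
```

Because the stages hit different collections (or the same collection with different queries and filters), running them in parallel keeps total search latency close to that of the slowest single stage.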
The DynamicFaultClassifier
The DynamicFaultClassifier (in backend/fault_classifier.py) analyzes the technician’s query and generates optimized search queries for the different RAG stages by performing semantic search against the fault_categories Qdrant collection.
Its build_rag_queries() method produces three sets of search queries:
- Equipment-Specific Queries: Created when manufacturer/model are provided. Examples: “Siemens SINAMICS S120 drive fault codes,” “Fanuc 30i CNC spindle alarm troubleshooting.”
- Symptom-Specific Queries: Derived from detected symptom keywords and the matched category. Examples: “high frequency vibration rotating equipment troubleshooting.”
- Subsystem-Specific Queries: Based on the identified subsystem from the category match. Examples: “hydraulic pressure control valve repair procedure.”
Response Path Summary
Path | Trigger | Processing | Service Team Recommendations? |
Acknowledgment | User sends a short message matching a pattern in ai_acknowledgment_patterns (e.g., “thanks”, “ok”, “got it”) | No LLM call at all. A random response is selected from the ai_acknowledgment_responses table. | No |
Clarifying Question | Decision LLM returns action: “QUESTION” | Single LLM call (Decision LLM only). No RAG search performed. | No |
Full Diagnosis | Decision LLM returns action: “DIAGNOSIS”, OR the clarifying question limit (2) has been reached | Two LLM calls (Decision + Diagnosis), 6-stage RAG search, knowledge service lookup, team scoring, full response parsing | Yes |
Streaming Response
Chat responses are delivered to the frontend via Server-Sent Events (SSE). The streaming flow: the frontend sends a POST to /api/chat/stream → Next.js proxies to the Python backend → Python generates via SageMaker’s streaming API → tokens sent as SSE events with type: “token” → a final type: “complete” event with full structured response including parsed metadata, diagnostic info, recommended teams, and detected categories.
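The SSE framing itself is simple. The generator below is a minimal sketch of the event sequence described above; token_stream stands in for SageMaker's streaming output, and the payload fields are assumptions:

```python
import json

def sse_events(token_stream, final_payload: dict):
    """Yield SSE 'token' events, then one 'complete' event."""
    for token in token_stream:
        yield f"data: {json.dumps({'type': 'token', 'content': token})}\n\n"
    # The final event carries the full structured response (metadata, teams, etc.).
    yield f"data: {json.dumps({'type': 'complete', **final_payload})}\n\n"
```

In FastAPI, a generator like this would typically be wrapped in a StreamingResponse with the text/event-stream media type.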
Service Team Scoring System
When a full diagnosis is generated, the system scores available service teams to determine which ones to recommend. The TeamScorer class in backend/services/team_scorer.py handles this with a multi-factor scoring approach.
Step 1 — Candidate Pool: Service teams are fetched from PostgreSQL, filtered by the user’s plant/facility code (matching the facility region for proximity). GPS coordinates are used to calculate each team’s distance from the facility.
Step 2 — AI Selection: The Diagnosis LLM may include team IDs in its |||TEAMS:id1,id2,id3||| metadata tag, along with per-team reasons in |||TEAM_REASON:id:reason||| tags.
Step 3 — Specialization Scoring: Each team’s specializations are compared against keywords derived from the user’s query and the detected fault category. The TeamScorer maintains a SPECIALIZATION_KEYWORDS mapping (e.g., “hydraulics” maps to keywords like “hydraulic,” “valve,” “cylinder,” “pump,” “actuator”) and calculates a match score. Teams that only specialize in unrelated areas (e.g., an electrical-only team for a hydraulic fault) may receive a penalty.
Step 4 — Vector Similarity: Team profiles from the Qdrant team_profiles collection provide a semantic similarity score between the team’s description/specializations and the technician’s fault description.
Step 5 — Combined Ranking: The final score combines specialization matching, vector similarity, and AI-provided reasons. Teams are sorted by this combined score in descending order.
Each recommended team includes a recommendation_reason explaining why it was selected (e.g., “Specializes in CNC spindle drive systems, OEM certified Fanuc technicians, 4.8 rating, average 2-hour response time”).
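The combination step could be sketched as below. The weights and the keyword mapping entry are assumptions for illustration, not the production values in TeamScorer:

```python
# One illustrative entry from the SPECIALIZATION_KEYWORDS mapping.
SPECIALIZATION_KEYWORDS = {
    "hydraulics": {"hydraulic", "valve", "cylinder", "pump", "actuator"},
}

def score_team(team: dict, query: str, vector_score: float,
               ai_selected: bool) -> float:
    """Combine keyword matching, vector similarity, and AI selection."""
    words = set(query.lower().split())
    keyword_hits = sum(
        len(words & keywords)
        for spec, keywords in SPECIALIZATION_KEYWORDS.items()
        if spec in team.get("specializations", [])
    )
    score = 0.5 * vector_score + 0.1 * keyword_hits
    if ai_selected:
        score += 0.3  # the Diagnosis LLM explicitly recommended this team
    return score
```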
4. Data Pipeline & Knowledge Base
The quality of RAG’s diagnoses depends entirely on the quality and coverage of its knowledge base. This section explains every data source, how it’s ingested, and how it flows into the RAG search pipeline.
Data Sources Overview
Source | Format | Approximate Count | Description |
OEM Technical Manuals | PDF (pre-processed) | ~600+ document chunks | Original equipment manufacturer service and maintenance manuals covering all major industrial equipment systems: CNC machines, PLCs, servo drives, hydraulics, pneumatics, conveyors, compressors, and more. |
OEM Service Bulletins | CSV / XML | ~12,000+ documents | Field service bulletins, engineering change notices, and manufacturer-issued corrective action notices from major industrial OEMs (2018–2025). |
ISO / OSHA / NFPA Standards | PDF (pre-processed) | Included in count above | Applicable safety and engineering standards referenced during diagnosis. Covers machinery guarding, electrical safety, functional safety, and lockout/tagout requirements. |
Equipment Catalog | Excel/DB | 6,200+ records | Comprehensive industrial equipment specifications including manufacturer, equipment type, model, production year, power rating, control system type, fluid type, and warranty information. |
Diagnostic Procedures | YAML | Variable | Fault-based diagnostic decision trees that guide the AI’s questioning strategy. Defined in diagnostic_procedures.yaml with 8 fault symptom types. |
Parts Encyclopedia | Loader script | Variable | Parts information including part names, descriptions, associated subsystems, and maintenance specifications. |
Fault Categories | Loader + sync | 112 categories | A 2-level hierarchy of fault types (e.g., “Rotating Equipment / Bearing Failures”) used for automatic categorization. |
Service Team Profiles | Seed/API | Variable | Maintenance team and service provider data including name, location, specializations, OEM certifications, ratings, and availability. |
Embedding Model
All text is converted to vector embeddings using the same model for consistency:
- Model: BAAI/bge-small-en-v1.5, loaded via the FastEmbed library
- Vector Dimensions: 384
- Distance Metric: Cosine similarity
- Running Location: Locally on the backend server (no API calls needed for embedding generation)
- Score Thresholds: Range from 0.25 to 0.35 depending on the collection
Using a local embedding model means embedding generation is free, fast, and always available. The BAAI/bge-small-en-v1.5 model performs well for industrial maintenance domain text at this scale.
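Cosine similarity and threshold filtering are straightforward to state precisely. The vectors below are toy 3-dimensional stand-ins for the real 384-dimensional bge-small embeddings, and filter_hits is an illustrative helper, not Qdrant's API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (the configured distance metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def filter_hits(hits: list[tuple[str, float]], threshold: float = 0.35):
    """Drop results below the per-collection score threshold."""
    return [(doc, score) for doc, score in hits if score >= threshold]
```

In practice Qdrant applies the threshold server-side via its score_threshold search parameter, so low-scoring documents never reach the backend.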
Equipment Manufacturer Filtering
During OEM Service Bulletin data ingestion, non-industrial and consumer equipment manufacturers are automatically filtered out. This prevents the knowledge base from being polluted with documents about consumer appliances, HVAC residential units, or automotive components outside RAG’s industrial scope.
The exclusion list is maintained in backend/config/excluded_manufacturers.json and is applied by both the manufacturer communications loader and the service bulletin loader.
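The filter could look like the sketch below. It assumes the JSON file contains a flat list of manufacturer names, which is an assumption about the file's shape, not a documented fact:

```python
import json

def load_exclusions(path: str) -> set[str]:
    """Load the exclusion list (assumed to be a flat JSON array of names)."""
    with open(path) as f:
        return {name.lower() for name in json.load(f)}

def filter_bulletins(rows: list[dict], excluded: set[str]) -> list[dict]:
    """Drop bulletins from consumer/non-industrial manufacturers."""
    return [r for r in rows if r["manufacturer"].lower() not in excluded]
```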
5. Backend API Reference
The Python FastAPI backend exposes a RESTful API. In development, it runs on port 8000. In production, the URL is set via the BACKEND_URL environment variable.
Chat Endpoints (defined in main.py)
Method | Path | Description |
POST | /api/chat | Send a message and receive a complete JSON response (non-streaming). Used for testing and debugging. |
POST | /api/chat/stream | Send a message and receive a streaming SSE response. This is what the frontend uses. |
GET | /api/chat/{session_id} | Retrieve the full chat history for a given session from DynamoDB. |
POST /api/chat/stream — Request Body:
{
  "query": "Our CNC machining center spindle makes a grinding noise at high RPM",
  "manufacturer": "Fanuc",
  "equipment_type": "CNC Machining Center",
  "model": "ROBODRILL D21MiB5",
  "facility_code": "PLT-042",
  "session_id": "optional-uuid-for-conversation-continuity"
}
If session_id is omitted, a new UUID is generated. Providing the same session_id across messages enables multi-turn conversation with history.
Equipment Endpoints (/api/equipment)
These endpoints power the cascading equipment selector dropdowns on the home page. They query the equipment_catalog and equipment_options PostgreSQL tables.
Method | Path | Description |
GET | /api/equipment | List equipment with optional filters (manufacturer, type, model) |
POST | /api/equipment | Create an equipment record (used during chat session initialization) |
GET | /api/equipment/manufacturers | Get all distinct manufacturers available in the catalog |
GET | /api/equipment/types?manufacturer=Fanuc | Get all equipment types for a specific manufacturer |
GET | /api/equipment/models?manufacturer=Fanuc&type=CNC | Get all models for a manufacturer + type combination |
GET | /api/equipment/variants?manufacturer=Fanuc&type=CNC&model=ROBODRILL | Get available model variants for a specific equipment entry |
Service Team Endpoints (/api/teams)
Method | Path | Description |
GET | /api/teams?facility_code=PLT-042 | List service teams, optionally filtered by facility code or region. Can also filter by specialization and min_rating. |
POST | /api/teams | Create a new service team record |
GET | /api/teams/{team_id} | Get detailed information for a specific service team |
Feedback Endpoints (/api/feedback)
Method | Path | Description |
POST | /api/feedback | Submit admin feedback for an AI response. Triggers LLM compression and dual storage (DynamoDB + Qdrant). |
GET | /api/feedback | List all feedback entries from DynamoDB for the admin panel |
DELETE | /api/feedback/{feedback_id} | Delete feedback from both DynamoDB and Qdrant |
PATCH | /api/feedback/{feedback_id}/archive | Archive a feedback entry (sets is_archived flag in both stores) |
Work Order Endpoints (/api/work-orders)
Method | Path | Description |
GET | /api/work-orders | List work orders with optional filters (team_id, equipment_manufacturer, fault_type) |
POST | /api/work-orders | Create a work order record |
Document Endpoints (/api/documents)
Method | Path | Description |
GET | /api/documents | Search documents with query string and optional filters (type, equipment_manufacturer) |
POST | /api/documents | Create a single document record |
POST | /api/documents/bulk | Bulk create multiple documents in a single request |
GET | /api/documents/list | List all documents (paginated) |
Knowledge Base Statistics (/api/knowledge-base)
Method | Path | Description |
GET | /api/knowledge-base/stats | Returns statistics about the knowledge base: total documents, documents per type, per equipment manufacturer, etc. |
Health Check Endpoints (/api/health)
Method | Path | Description |
GET | /api/health | Full health check — tests connectivity to PostgreSQL, Qdrant, and DynamoDB. Returns detailed status for each service. |
GET | /api/health/live | Liveness probe — returns 200 if the server process is running. |
GET | /api/health/ready | Readiness probe — returns 200 if all database connections are established and ready to serve requests. |
Admin: OEM Bulletin Data Import (/admin/oem-bulletins)
Method | Path | Description |
GET | /admin/oem-bulletins/files | List available OEM service bulletin data files that can be imported |
POST | /admin/oem-bulletins/import | Start a background import job for a specific file |
GET | /admin/oem-bulletins/import/status | Check the progress of a running import job |
POST | /admin/oem-bulletins/import/cancel | Cancel a running import |
6. Frontend Application
The frontend is a Next.js 16 application using the App Router pattern. It provides two main pages and communicates with the Python backend exclusively through API route proxies.
Pages
Route | File | Description |
/chat | frontend/app/chat/page.tsx | Chat page — The main diagnostic interface. Contains the ChatInterfaceWithMap component, which renders a split view: the chat panel on the left and a facility/Google Maps view on the right showing recommended service teams. |
/team/[id] | frontend/app/team/[id]/page.tsx | Service team details page — Shows detailed information about a specific service team including availability, specializations, OEM certifications, past work orders, and a map with their location. |
Key Components
SearchForm (search-form.tsx): The equipment selection form on the home page. It renders cascading dropdown menus (Manufacturer, Equipment Type, Model) that each trigger an API call when a selection is made. The form also includes a facility code input. On submission, the user is navigated to the chat page with all equipment info encoded in URL query parameters.
ChatInterfaceWithMap (chat-interface-with-map.tsx): The largest and most complex component, managing message state, SSE connection lifecycle, session management, diagnostic card rendering (showing hypotheses, severity, sources), service team card rendering, map integration (showing team/facility pins on Google Maps), and feedback submission.
DiagnosticCard (diagnostic-card.tsx): Renders the structured diagnostic output including the three hypotheses (each with system area, possible cause, reasoning, and source), the diagnostic approach, severity/urgency indicators, and source citations.
TeamCard (team-card.tsx): Displays a single service team recommendation with their name, rating, specializations, OEM certifications, match score, distance, and the AI’s recommendation reason.
FacilityMap (facility-map.tsx): Google Maps component that displays facility and service team locations as markers. When a marker is clicked, a TeamDetailsPopup appears with additional information.
Navigation (navigation.tsx): Top navigation bar with the RAG branding and theme toggle.
ThemeToggle (theme-toggle.tsx): Dark/light mode toggle button. User preference is persisted in localStorage.
API Route Proxies
The files in frontend/app/api/ are Next.js route handlers that forward requests to the Python backend. Each one reads the BACKEND_URL environment variable, forwards the incoming request to the corresponding backend endpoint, and returns the backend’s response to the browser.
State Management
The frontend uses a minimal state management approach: TanStack React Query handles all server data fetching with automatic caching. Component-level state (useState) manages UI state like the current message, streaming status, selected team, and feedback form visibility. Session ID is generated client-side and stored in component state.
Theme Support
The application supports dark and light modes via a ThemeProvider context, CSS variables in globals.css for both :root (light) and .dark (dark) selectors, localStorage persistence, and Tailwind CSS utility classes throughout.
7. Database Schema
RAG uses PostgreSQL as its primary relational database for structured data. The schema is defined using SQLAlchemy models in backend/database.py.
PostgreSQL Tables
equipment_catalog
The master equipment reference table, containing 6,200+ records imported from OEM datasets. Used to populate equipment selector dropdowns and provide detailed equipment specifications to the AI during diagnosis.
Column | Type | Description |
id | serial (PK) | Auto-incrementing primary key |
external_id | varchar(50) | External reference ID from the source dataset |
manufacturer | varchar (required) | Equipment manufacturer (e.g., “Siemens”) |
equipment_type | varchar (required) | Equipment type (e.g., “CNC Machining Center”) |
model | varchar (required) | Model name (e.g., “SINUMERIK 840D”) |
variant | varchar | Model variant or configuration level |
variant_description | text | Detailed description of what the variant includes |
control_system | varchar | Control system type (e.g., “Siemens SINUMERIK”, “Fanuc 30i”) |
power_rating_kw | float | Equipment power rating in kilowatts |
drive_type | varchar | Drive type (e.g., “AC Servo”, “Hydraulic”, “Pneumatic”) |
fluid_type | varchar | Fluid type if applicable (e.g., “ISO VG 46 Hydraulic Oil”) |
voltage | varchar | Operating voltage (e.g., “480V 3-Phase”) |
warranty_parts | varchar | Parts warranty coverage period |
warranty_labor | varchar | Labor warranty coverage period |
production_year_start | integer | First production year for this model |
production_year_end | integer | Last production year (null if still in production) |
platform_code | varchar | Internal equipment platform identifier |
source | varchar (required) | Tracks which dataset this record came from |
imported_at | datetime | Timestamp when this record was imported |
equipment_options
A denormalized lookup table that pre-computes the distinct manufacturer/type/model combinations available for the equipment selector.
assets
Stores equipment asset records associated with individual chat sessions. When a technician starts a chat, their equipment selection is saved here.
Column | Type | Description |
id | serial (PK) | Auto-incrementing primary key |
manufacturer | varchar (required) | Equipment manufacturer |
equipment_type | varchar (required) | Equipment type |
model | varchar (required) | Equipment model |
control_system | varchar | Control system details |
drive_type | varchar | Drive type |
fluid_type | varchar | Fluid specification |
specifications | json | Additional specifications as a JSON object |
service_teams
The service team and maintenance provider directory.
Column | Type | Description |
id | serial (PK) | Auto-incrementing primary key |
name | varchar (required) | Team or company name |
address | varchar (required) | Street address |
city | varchar (required) | City |
state | varchar (required) | State |
facility_code | varchar (required) | Facility code used for proximity filtering |
phone | varchar | Contact phone number |
email | varchar | Contact email |
website | varchar | Website URL |
rating | float | Average performance rating (1.0 to 5.0 scale) |
review_count | integer | Number of completed work orders |
specializations | varchar[] | Array of specialization areas (e.g., [“CNC Servo Drives”, “Hydraulic Systems”]) |
certifications | varchar[] | Array of held certifications (e.g., [“Fanuc Certified”, “Siemens OEM Partner”, “OSHA 30”]) |
hours | json | Availability hours stored as JSON |
latitude | float | GPS latitude for map display |
longitude | float | GPS longitude for map display |
response_time_hours | float | Average response time in hours |
is_verified | boolean | Whether the team has been verified |
description | text | Free-text team description |
labor_rate | float | Hourly labor rate in dollars |
work_orders
Historical work order records linking assets to service teams. These records feed into the work_order_cases Qdrant collection for RAG search context.
Column | Type | Description |
id | serial (PK) | Auto-incrementing primary key |
team_id | integer (required) | Foreign key to the service_teams table |
asset_id | integer | Foreign key to the assets table (nullable) |
equipment_manufacturer | varchar | Denormalized manufacturer for quick filtering |
equipment_type | varchar | Denormalized equipment type |
equipment_model | varchar | Denormalized equipment model |
fault_type | varchar (required) | Type of fault addressed (e.g., “Spindle Drive Failure”) |
description | text | Detailed description of the repair work performed |
symptoms | varchar[] | Array of symptoms the technician reported |
fault_codes | varchar[] | PLC/controller fault/alarm codes found during diagnosis |
parts_used | varchar[] | Parts that were replaced |
labor_hours | float | Hours of labor the repair required |
total_cost | float | Total cost including parts and labor |
completed_at | datetime | Date and time the work order was completed |
documents
Metadata for knowledge base documents. The actual document content is stored both here (for reference) and as vector embeddings in Qdrant (for search).
AI Configuration Tables
ai_acknowledgment_patterns — Text patterns that indicate the user is sending an acknowledgment rather than a fault query.
ai_acknowledgment_responses — Pool of responses to randomly select from when an acknowledgment is detected.
ai_symptom_indicators — Keywords that indicate a message contains significant fault symptom information (e.g., “vibrating,” “leaking,” “tripped,” “overheating,” “alarm code”).
jobs — Represents individual job line items within a work order. The model is defined but not currently queried at runtime. Exists for potential future integration with CMMS (Computerized Maintenance Management System) platforms.
8. Vector Database (Qdrant)
Qdrant is the vector database that powers RAG’s semantic search capabilities. All collections use 384-dimension vectors with cosine distance.
Collections
Collection | Used By | Purpose | Key Payload Fields |
equipment_repair_documents | RAG Stages 1, 3, 4 | OEM technical manual content. The primary knowledge base — chunked pages from professional industrial maintenance manuals covering all equipment systems. | title, content, source, equipment_manufacturer, equipment_model |
oem_bulletin_documents | RAG Stage 2 | OEM Service Bulletins and manufacturer field notices for known faults. Contains equipment-specific known issues and official corrective action instructions. | title, content, manufacturer, model, production_year, bulletin_id |
parts_encyclopedia | RAG Stage 5 | Parts information including names, descriptions, associated subsystems, and maintenance specifications. | part_name, description, system, category |
fault_categories | RAG Stage 6, Fault Classifier | The 112-category fault hierarchy used for automatic categorization. | category, parent, description, level |
team_profiles | search_all_collections() | Vectorized service team profiles enabling semantic matching of team capabilities to technician fault descriptions. | team_id, name, specializations, city, state |
work_order_cases | search_all_collections() | Historical work order records stored as vectors. When a technician describes a fault, the system finds similar past work orders for additional context. | description, equipment_manufacturer, equipment_model, symptoms |
admin_feedback | Feedback retrieval in main.py | Admin corrections stored as vectors for semantic retrieval. During diagnosis, the system finds feedback relevant to the current query and equipment type. | feedback_id, initial_query, feedback_text, concise_rule, equipment_manufacturer, equipment_model, submitted_by, is_archived |
diagnostic_knowledge | KnowledgeService | Fault-based diagnostic procedures from diagnostic_procedures.yaml, stored as vectors. Matched by fault type and equipment attributes using specificity scoring. | fault_type, procedure, manufacturer, model, drive_type, control_system, power_class |
Collection Initialization
Collections are not created at application startup. They are created on-demand when data is first inserted via the _ensure_collection method in qdrant_service.py. The system degrades gracefully — if a collection doesn’t exist during a search, the search returns empty results rather than crashing.
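The graceful-degradation behavior can be sketched as a thin wrapper around the search call (a hypothetical `safe_search` helper; the real logic lives in qdrant_service.py, and the client here is assumed to expose a qdrant-client-style `search` method):

```python
def safe_search(client, collection, vector, limit=5, score_threshold=0.3):
    """Query one collection; return an empty list instead of raising
    when the collection does not exist yet (graceful degradation)."""
    try:
        return client.search(
            collection_name=collection,
            query_vector=vector,
            limit=limit,
            score_threshold=score_threshold,
        )
    except Exception:  # e.g. the client raises if the collection is absent
        return []
```

A missing collection then contributes nothing to the merged RAG context instead of failing the whole diagnosis.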
Search Patterns
Filtered search: Most collections support metadata filters. Stage 2 (OEM Bulletins) filters by manufacturer, model, and production year to find bulletins specific to the technician’s exact equipment. Stage 1 filters by manufacturer to find relevant manual sections.
Score thresholds: Each stage has a minimum score threshold (0.25 to 0.35). Higher thresholds (0.35 for Stage 1) prioritize precision; lower thresholds (0.25–0.3 for other stages) cast a wider net.
Specificity scoring (diagnostic_knowledge): After retrieving candidate procedures from Qdrant, each one is scored based on how many of its non-null fields match the technician’s equipment context. A procedure that matches on fault_type + equipment_manufacturer + drive_type scores higher than one that only matches on fault_type. A mismatch on any field results in a score of -1, effectively excluding it.
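A minimal sketch of the specificity scoring, using the payload field names listed for the diagnostic_knowledge collection above (`specificity_score` is an illustrative name, not the actual function in the codebase):

```python
def specificity_score(procedure, context,
                      fields=("fault_type", "manufacturer", "model",
                              "drive_type", "control_system", "power_class")):
    """Score a retrieved procedure against the technician's equipment context.
    Each non-null field that matches adds 1; any mismatch disqualifies (-1)."""
    score = 0
    for field in fields:
        wanted = procedure.get(field)
        if wanted is None:
            continue  # null fields act as wildcards
        if context.get(field) == wanted:
            score += 1
        else:
            return -1  # a single mismatch excludes the procedure entirely
    return score
```

So a procedure matching on fault_type, manufacturer, and drive_type scores 3, while one naming the wrong manufacturer is excluded regardless of other matches.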
9. Configuration & Prompt System
All AI behavior in RAG is controlled through YAML configuration files stored in backend/config/. This design allows maintenance engineers and domain experts to tune the AI’s behavior without touching Python code, and makes all configuration version-controllable through Git.
Configuration Files
ai_prompt_templates.yaml
Template Key | Purpose | How It’s Used |
system_prompt | The master system prompt for the Diagnosis LLM | Contains all behavioral rules, output format instructions, source citation rules, and placeholders for dynamic content (equipment info, RAG context, teams, feedback, procedures) |
decision_prompt | The prompt for the Decision LLM | Defines the 3-step reasoning process and specifies the JSON output format |
llama_tokens | Llama 3.1 special tokens | Token markers used for prompt formatting |
error_response | Fallback error message | Displayed to the user when the AI service encounters an unrecoverable error |
default_acknowledgment_response | Default acknowledgment reply | Used as fallback if the ai_acknowledgment_responses database table is empty |
feedback_formatting.system | System prompt for feedback compression | Instructs the LLM to compress admin feedback into a concise rule in maintenance shorthand (max 25 words) |
feedback_formatting.user | User prompt for feedback compression | Template with {feedback_text} placeholder |
ai_model_settings.yaml
Setting Group | Parameters | When Used |
diagnosis | max_tokens: 1500, temperature: 0.7, top_p: 0.9 | Non-streaming diagnostic response generation |
decision | max_tokens: 200, temperature: 0.3, top_p: 0.9 | Decision LLM routing (QUESTION vs. DIAGNOSIS) |
streaming | max_tokens: 1500, temperature: 0.7, top_p: 0.9, stream: true | Streaming diagnostic response generation |
retry | max_retries: 3, base_delay: 1.0 seconds | SageMaker retry configuration for transient failures |
limits | chat_history_window: 10, max_clarifying_questions: 2, max_acknowledgment_words: 5 | Behavioral limits |
diagnostic_procedures.yaml
This file defines fault-based diagnostic decision trees that guide the Decision LLM’s questioning strategy. There are 8 fault types, each with its own diagnostic procedure:
Fault Type | What It Covers | Example User Messages |
vibration | Abnormal vibration or oscillation | “Our pump is vibrating excessively at startup” |
noise | Unusual sounds from equipment | “The spindle makes a high-pitched whine at high RPM” |
thermal | Overheating or thermal faults | “The servo drive is tripping on over-temperature” |
fluid_leak | Visible hydraulic, lubricant, or coolant loss | “There’s oil pooling under the hydraulic power unit” |
electrical_fault | Electrical alarms, tripped breakers, control faults | “The PLC is showing an E-stop circuit fault” |
no_start | Equipment won’t start, won’t cycle, stalls | “The conveyor won’t start after the power outage” |
performance_degradation | Output below spec, slow cycle times, quality issues | “The press cycle time has increased by 30%” |
maintenance_request | Scheduled preventive maintenance | “We need to do the 2000-hour PM on the compressor” |
Each procedure defines diagnostic objectives, relevant questions, and decision criteria.
How Prompts Are Built
The PromptBuilder class in backend/services/prompt_builder.py orchestrates the assembly of the final LLM prompt. The RAG context from the 6-stage search is formatted with OEM bulletins prioritized first. All formatted sections are injected into the system_prompt template via placeholder substitution: {equipment_manufacturer}, {equipment_type}, {equipment_model}, {facility_code}, {context_text}, {category_text}, {work_order_cases_text}, {team_profiles_text}, {teams_text}, {feedback_text}, and {procedure_content}.
The system prompt, chat history, and current user query are then assembled into the Llama 3.1 instruction format using special tokens.
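As a sketch, that assembly might look like the following, using Meta's published Llama 3.1 instruct token markers; in the real system the tokens come from the llama_tokens template in ai_prompt_templates.yaml, and the function name is illustrative:

```python
def build_llama_prompt(system_prompt, history, user_query):
    """Assemble system prompt, chat history, and the current query into the
    Llama 3.1 instruct format, ending with an open assistant turn."""
    parts = [
        "<|begin_of_text|>",
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>",
    ]
    for msg in history:  # msg: {"role": "user" | "assistant", "content": str}
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>"
            f"\n\n{msg['content']}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_query}<|eot_id|>")
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```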
10. Admin Feedback System
The admin feedback system enables continuous improvement of the AI’s diagnostic accuracy without requiring model fine-tuning or retraining. When a senior maintenance engineer or OEM specialist notices the AI making an incorrect assessment, they can submit a correction that will automatically be applied to future diagnoses for similar queries.
Why This Approach?
The traditional approach to improving AI accuracy, fine-tuning the model on corrected data, is expensive, time-consuming, and requires significant ML infrastructure. The system instead treats corrections as retrieval content: they are stored as searchable vectors and surfaced whenever semantically similar queries arise. This provides immediate effect (corrections apply to the very next conversation), full auditability, reversibility (corrections can be deleted without affecting the base model), and requires no ML expertise from the feedback submitter.
End-to-End Flow
Step 1 — Submission: A maintenance engineer or OEM specialist views an AI diagnostic response in the chat interface and clicks the feedback button. They select a feedback category (“Follow-up Questions” or “Diagnosis”), write a correction in natural language, and submit. The form automatically captures the original query, the AI’s response being corrected, equipment information, conversation context, and who submitted it.
Step 2 — LLM Compression: The raw feedback text is sent to the LLM with a special compression prompt that compresses verbose feedback into a concise rule in maintenance shorthand (max 25 words). For example: “Amber oil mist from motor vent = bearing lubrication loss, not coolant system. Check lube lines first.”
Step 3 — Dual Storage: The full feedback record is stored in DynamoDB (complete audit trail), and the feedback text is embedded as a vector in Qdrant’s admin_feedback collection with the compressed rule and equipment metadata as payload fields. Equipment manufacturer and model are always stored in UPPERCASE for consistent filtering.
Step 4 — Retrieval During Diagnosis: When the system generates a new diagnosis, it searches the admin_feedback Qdrant collection for the top 3 most relevant feedback items, filtered by equipment manufacturer to ensure relevance.
Step 5 — Prompt Injection: Retrieved feedback rules are injected into the system prompt under the heading INTERNAL GUIDANCE (NOT A SOURCE). Each rule appears as a bullet point.
Step 6 — Silent Application: The Diagnosis LLM reads the injected feedback rules and applies them to its reasoning, citing the relevant OEM manual or standard as the source — never the admin feedback itself.
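The injection step (Step 5) can be sketched as a small formatter; `format_internal_guidance` is a hypothetical helper, as the real formatting lives inside PromptBuilder:

```python
def format_internal_guidance(rules):
    """Render retrieved concise feedback rules as the INTERNAL GUIDANCE
    prompt section. Returns an empty string when no relevant feedback exists,
    so the section is omitted from the prompt entirely."""
    if not rules:
        return ""
    lines = ["INTERNAL GUIDANCE (NOT A SOURCE):"]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)
```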
Managing Feedback
GET /api/feedback returns all feedback for admin review. DELETE /api/feedback/{id} removes a correction from both DynamoDB and Qdrant. PATCH /api/feedback/{id}/archive excludes a correction from retrieval while preserving the record. python backend/data_ingestion/backfill_concise_rules.py re-runs LLM compression on all existing feedback.
11. Environment Variables & Secrets
Required Variables
Variable | Description | Used By |
DATABASE_URL | PostgreSQL connection string | Backend — SQLAlchemy database connection |
AWS_ACCESS_KEY_ID | AWS IAM access key | Backend — SageMaker LLM inference + DynamoDB |
AWS_SECRET_ACCESS_KEY | AWS IAM secret key | Backend — paired with access key above |
Optional Variables (with defaults)
Variable | Default | Description |
AWS_REGION | us-east-1 | AWS region for SageMaker endpoint and DynamoDB table |
SAGEMAKER_ENDPOINT_NAME | meta-llama-3-1-8b-instruct-012205 | Name of the SageMaker inference endpoint |
QDRANT_URL | (none) | URL of your Qdrant Cloud instance. If not set, vector search features are disabled. |
QDRANT_API_KEY | (none) | Authentication key for Qdrant Cloud. Required if QDRANT_URL is set. |
BACKEND_URL | http://localhost:8000 | The Python backend URL, used by Next.js API routes to proxy requests. |
VITE_GOOGLE_MAPS_API_KEY | (none) | Google Maps JavaScript API key for the map component |
NEXT_PUBLIC_GOOGLE_MAPS_API_KEY | (none) | Same as above but exposed to the Next.js client bundle |
SESSION_SECRET | (none) | Secret key for session encryption |
Environment Validation
On startup, env_validator.py checks that all required environment variables are set, logs warnings (not errors) for missing optional variables, and supports both DATABASE_URL format and individual PostgreSQL variables (PGHOST, PGPORT, PGUSER, PGPASSWORD, PGDATABASE). The validator runs in non-strict mode — features that depend on missing variables degrade gracefully.
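A minimal sketch of that validation logic, with the variable lists taken from the tables above (the function and constant names are illustrative, not the actual env_validator.py API):

```python
import os

REQUIRED = ("DATABASE_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
PG_PARTS = ("PGHOST", "PGPORT", "PGUSER", "PGPASSWORD", "PGDATABASE")
OPTIONAL = ("QDRANT_URL", "QDRANT_API_KEY", "SESSION_SECRET")

def validate_env(env=os.environ):
    """Return (missing_required, missing_optional). DATABASE_URL is satisfied
    either directly or by the full set of individual PG* variables."""
    missing_required = [v for v in REQUIRED if v not in env]
    if "DATABASE_URL" in missing_required and all(p in env for p in PG_PARTS):
        missing_required.remove("DATABASE_URL")
    missing_optional = [v for v in OPTIONAL if v not in env]
    return missing_required, missing_optional
```

In non-strict mode, missing optional variables would only produce log warnings while the dependent features disable themselves.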
12. Deployment Guide
Architecture: Independent Deployment
The frontend and backend are designed to be deployed completely independently. There is no shared server process, no shared filesystem, and no shared configuration beyond the BACKEND_URL variable.
Frontend Deployment
- Set BACKEND_URL to your deployed backend’s URL (e.g., https://api.industrialrag.com)
- Set NEXT_PUBLIC_GOOGLE_MAPS_API_KEY for maps functionality
- Build the application: cd frontend && npm run build
- Start the production server: cd frontend && npm start
Backend Deployment
- Set all required environment variables
- Set optional variables (QDRANT_URL, QDRANT_API_KEY) for vector search
- Install Python dependencies: pip install -r requirements.txt
- Start the server: uvicorn main:app --host 0.0.0.0 --port 8000
For production: gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
Development Mode
npm run dev — executes server/index.ts which starts the Python backend on port 8000 with --reload, waits 3 seconds, then starts Next.js on port 5000.
Health Checks
- Liveness: GET /api/health/live — Returns 200 if the server process is running.
- Readiness: GET /api/health/ready — Returns 200 if all database connections are established.
- Full Health: GET /api/health — Returns detailed JSON status of each dependency.
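The aggregation behind the full health check might be sketched as follows; `health_report` and the check callables are hypothetical stand-ins for the real dependency probes (PostgreSQL, Qdrant, DynamoDB, SageMaker):

```python
def health_report(checks):
    """Run each named dependency check. A check passes if it returns truthy
    without raising; any failure marks the overall status as degraded."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "unavailable"
        except Exception as exc:
            report[name] = f"error: {exc}"
    report["status"] = "ok" if all(v == "ok" for v in report.values()) else "degraded"
    return report
```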
13. Data Ingestion Scripts
All data ingestion scripts are in backend/data_ingestion/ and are designed to be run manually from the command line.
Equipment Catalog Import
python backend/data_ingestion/equipment_catalog_loader.py <excel_file.xlsx> --source <source_name>
OEM Manufacturer Communications
python backend/data_ingestion/oem_comms_loader.py <csv_file>
Parses an OEM manufacturer communications CSV, filters out non-industrial equipment manufacturers, generates text embeddings, and stores in the oem_bulletin_documents Qdrant collection.
OEM Service Bulletins
python backend/data_ingestion/service_bulletin_loader.py <tsv_file>
Same pipeline as above, but parses tab-separated service bulletin files with detailed corrective action instructions.
Diagnostic Knowledge Procedures
python backend/data_ingestion/knowledge_loader.py --clear
Loads diagnostic procedures from backend/config/diagnostic_procedures.yaml into the diagnostic_knowledge Qdrant collection. The --clear flag removes all existing entries before loading.
Parts Encyclopedia
python backend/data_ingestion/parts_encyclopedia_loader.py
Fault Categories
python backend/data_ingestion/fault_categories_loader.py
python backend/data_ingestion/fault_categories_qdrant_sync.py
A two-step process: the first script loads 112 fault categories into PostgreSQL; the second reads them, generates embeddings, and syncs to the fault_categories Qdrant collection.
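The second step might be sketched like this, with `embed` and `upsert` as stand-ins for the FastEmbed model and the Qdrant client (the actual script's structure may differ):

```python
def sync_fault_categories(rows, embed, upsert, batch_size=64):
    """Embed PostgreSQL fault-category rows and push them to the
    fault_categories collection in batches. Returns the number synced."""
    batch, total = [], 0
    for row in rows:
        text = f"{row['parent']} / {row['category']}: {row['description']}"
        batch.append({"id": row["id"], "vector": embed(text), "payload": row})
        if len(batch) >= batch_size:
            upsert(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        upsert(batch)
        total += len(batch)
    return total
```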
Full Environment Seed (Development)
python backend/data_ingestion/seed_all.py
Seeds a new development environment with all base data: service teams, sample equipment assets, AI configuration, and diagnostic enrichment data.
Admin Feedback Backfill
python backend/data_ingestion/backfill_concise_rules.py [--dry-run]
Re-runs the LLM compression step on all existing admin feedback entries. The --dry-run flag shows what would change without actually updating records.
14. Directory Structure
IndustrialRAG-Frontend/
├── app/
│ ├── api/
│ │ ├── chat/
│ │ │ └── stream/route.ts
│ │ ├── feedback/
│ │ │ ├── route.ts
│ │ │ └── [id]/route.ts
│ │ ├── teams/
│ │ │ ├── route.ts
│ │ │ └── [id]/route.ts
│ │ └── equipment/
│ │ ├── manufacturers/route.ts
│ │ ├── types/route.ts
│ │ ├── models/route.ts
│ │ └── variants/route.ts
│ │
│ ├── chat/page.tsx
│ ├── team/[id]/page.tsx
│ ├── components/
│ │ ├── chat-interface-with-map.tsx
│ │ ├── search-form.tsx
│ │ ├── diagnostic-card.tsx
│ │ ├── team-card.tsx
│ │ ├── facility-map.tsx
│ │ ├── team-details-popup.tsx
│ │ ├── navigation.tsx
│ │ ├── theme-toggle.tsx
│ │ └── ui/
│ │
│ ├── hooks/
│ ├── lib/
│ ├── providers/
│ ├── globals.css
│ ├── layout.tsx
│ └── page.tsx
│
├── public/
├── package.json
├── next.config.mjs
├── tailwind.config.ts
├── tsconfig.json
├── .gitignore
└── README.md
IndustrialRAG-Backend/
├── main.py
├── ai_service.py
├── qdrant_service.py
├── dynamo_service.py
├── database.py
├── fault_classifier.py
├── config.py
├── models.py
├── error_handlers.py
├── env_validator.py
│
├── routers/
│ ├── health.py
│ ├── equipment.py
│ ├── teams.py
│ ├── work_orders.py
│ ├── documents.py
│ ├── knowledge_base.py
│ ├── feedback.py
│ └── oem_bulletin_admin.py
│
├── services/
│ ├── ai_config_service.py
│ ├── prompt_builder.py
│ ├── response_parser.py
│ ├── knowledge_service.py
│ ├── team_scorer.py
│ └── seed.py
│
├── config/
│ ├── ai_prompt_templates.yaml
│ ├── ai_model_settings.yaml
│ ├── diagnostic_procedures.yaml
│ ├── facility_coordinates.json
│ └── excluded_manufacturers.json
│
├── data/
│ └── attached_assets/
│
├── data_ingestion/
│ ├── seed_all.py
│ ├── oem_comms_loader.py
│ ├── service_bulletin_loader.py
│ ├── knowledge_loader.py
│ ├── equipment_catalog_loader.py
│ ├── parts_encyclopedia_loader.py
│ ├── fault_categories_loader.py
│ ├── fault_categories_qdrant_sync.py
│ ├── backfill_concise_rules.py
│ └── _legacy/
│
├── requirements.txt
├── .gitignore
└── README.md
15. Maintenance & Operations
Adding New Diagnostic Procedures
To add a new procedure:
- Edit backend/config/diagnostic_procedures.yaml and add a new entry under the appropriate fault type. Each procedure needs a fault type (one of the 8 categories), diagnostic objectives, relevant questions to consider asking, and optional equipment-specific fields (manufacturer, model, drive_type, control_system) for specificity scoring.
- Run the knowledge loader: python backend/data_ingestion/knowledge_loader.py --clear
- No code changes or server restart required.
Adding Admin Feedback
- Open the chat interface and find an AI response that needs correction
- Click the feedback button on the assistant’s message
- Write the correction in clear, specific language (e.g., “When fault code F025 appears on Siemens S120 drives with overtemp alarm, always check the heat sink thermal paste first — it degrades after 5 years”)
- Submit — the system automatically compresses it and stores it in DynamoDB and Qdrant
- Future diagnoses for similar equipment/fault combinations will incorporate the correction
Updating the Equipment Catalog
python backend/data_ingestion/equipment_catalog_loader.py <file.xlsx> --source <source_name>
Importing New OEM Bulletin Data
Option A — Command Line (recommended for large files):
python backend/data_ingestion/oem_comms_loader.py <csv_file>
python backend/data_ingestion/service_bulletin_loader.py <tsv_file>
Option B — Admin API:
- GET /admin/oem-bulletins/files to verify file is detected
- POST /admin/oem-bulletins/import with the file path to start a background import
- GET /admin/oem-bulletins/import/status to monitor progress
- POST /admin/oem-bulletins/import/cancel if you need to stop the import
Monitoring
GET /api/health returns a JSON object with the status of every dependency. All backend modules use Python’s logging module. Global error handlers in error_handlers.py catch unhandled exceptions and return structured error responses rather than stack traces.
Scaling Considerations
Frontend: Completely stateless — deploy behind a CDN or load balancer. Backend: Also stateless — all session data lives in DynamoDB. Qdrant: Managed cloud instance that scales independently. PostgreSQL: Standard database scaling strategies apply — read replicas for read-heavy workloads, connection pooling (e.g., PgBouncer). DynamoDB: AWS-managed with automatic scaling. SageMaker: Endpoint scaling configured in the AWS console — increase instance count for higher concurrent throughput from multiple plant locations.
Fault Category System
Categories are organized in a strict 2-level hierarchy: Parent Category / Subcategory (e.g., “Rotating Equipment / Bearing Failures”). There are 112 categories covering all common industrial fault types. Categories are selected through semantic vector search, not hardcoded rules. To add new categories: update the fault categories loader data, run the loader to insert into PostgreSQL, then run the sync script to update Qdrant.
16. Appendix: Key Design Decisions
Decision | Rationale |
Two-LLM system instead of a single prompt | Routing decisions need low temperature (0.3) for consistency, while diagnostic text needs higher temperature (0.7) for natural language. Using separate calls with different parameters improves both routing reliability and response quality — critical in industrial settings where a misrouted diagnosis could waste costly maintenance time. |
6-stage parallel RAG search | Different types of documents (OEM manuals, service bulletins, parts data, categories) serve different purposes and require different filters. Running them in parallel via ThreadPoolExecutor keeps total latency close to the slowest single search — important for technicians troubleshooting active equipment faults. |
YAML-based prompt management | Prompt templates change frequently during tuning. Storing them in YAML files (not Python code) allows domain experts and senior maintenance engineers to adjust prompts without understanding the codebase, and all changes are version-controlled through Git. |
System-level language only (no specific component names) | Remote diagnosis cannot physically verify which specific component has failed. Naming components creates liability, could cause incorrect part ordering, and may bypass proper diagnostic procedure. Identifying the system area (e.g., “spindle drive system”) directs technicians to the right subsystem while leaving exact component identification to hands-on inspection with proper test equipment. |
Maximum 2 clarifying questions | In manufacturing environments, every minute of unplanned downtime is costly. User testing with technicians showed significant frustration after more than 2 rounds of questions. Forcing a diagnosis with available information (even at LOW confidence) is more useful than continued questioning during an active production stoppage. |
Admin feedback via RAG retrieval (not fine-tuning) | Fine-tuning requires GPU infrastructure, dataset preparation, and hours of training time. RAG-based feedback takes effect immediately, is fully auditable, and can be rolled back by deleting a single record. This is especially valuable for incorporating OEM-specific tribal knowledge from experienced field engineers. |
DynamoDB for chat sessions | Chat sessions are key-value data with high read/write frequency and no relational needs. DynamoDB provides single-digit millisecond latency and auto-scales without capacity planning. |
PostgreSQL for structured data | Equipment assets, service teams, and work orders have relational integrity requirements (foreign keys, complex queries). PostgreSQL provides ACID transactions and SQL for these structured data needs. |
Qdrant for vector search | Purpose-built vector databases outperform PostgreSQL’s pgvector extension at this scale of data and query complexity. Qdrant’s native filtering, multiple collections, and cloud-managed infrastructure simplify operations. |
Next.js API route proxies | Proxying through Next.js keeps the backend URL out of the browser, eliminates CORS issues, and enables the frontend and backend to be deployed on separate infrastructure — allowing the frontend to be on a plant intranet while the backend runs in a private cloud. |
Independent frontend/backend | Separate codebases allow OT and IT teams to work independently, different scaling strategies, and different hosting platforms suitable for industrial network architectures. |
FastEmbed for local embeddings | Generating embeddings locally eliminates API call latency and costs. The BAAI/bge-small-en-v1.5 model is small enough to run on any server while providing sufficient quality for industrial maintenance domain text. Avoids dependence on third-party embedding API availability during active production faults. |
Concise rule compression for feedback | Raw feedback from senior engineers can be verbose and contextual. Compressing it into 25-word maintenance shorthand rules reduces prompt consumption and improves the LLM’s ability to follow the instruction concisely. |
Score thresholds per RAG stage | Different collections have different data characteristics. OEM service bulletins with exact equipment matches deserve higher confidence (0.35 threshold), while symptom searches cast a wider net (0.3 threshold) to avoid missing relevant information about uncommon fault modes. |
Acknowledgment detection via database | Storing acknowledgment patterns in a database table (not code) allows adding new patterns (e.g., maintenance shorthand, abbreviations like “ack” or “10-4”) without code deployments. |
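The parallel-search decision in the table above can be sketched with ThreadPoolExecutor; the stage callables here are hypothetical stand-ins for the six Qdrant queries:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_rag_search(stages):
    """Run all search stages concurrently so total latency tracks the
    slowest single stage rather than the sum. `stages` maps a stage name
    to a zero-argument callable that returns that stage's results."""
    with ThreadPoolExecutor(max_workers=len(stages)) as pool:
        futures = {name: pool.submit(fn) for name, fn in stages.items()}
        return {name: fut.result() for name, fut in futures.items()}
```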
Frequently Asked Questions (FAQ)
1. What is the Industrial Equipment Diagnostics RAG system?
An AI-powered chatbot for manufacturing technicians that diagnoses industrial equipment faults before a specialist is dispatched. It uses Retrieval-Augmented Generation (RAG) to search 600+ OEM technical manuals, 12,000+ service bulletins, and OSHA/NFPA safety standards, then returns exactly three ranked diagnostic hypotheses with verified source citations. It runs on AWS SageMaker (Llama 3.1 8B), Qdrant Cloud, PostgreSQL, and a Next.js frontend.
2. Who is this system designed for?
Manufacturing technicians, maintenance personnel, and plant operators who need to assess what might be wrong with industrial machinery — CNC machines, servo drives, hydraulic systems, PLCs, compressors, and conveyors — before halting a production line or calling a specialist.
3. What happens step by step when a technician uses the system?
1. The technician selects their equipment (manufacturer, type, model) and enters a facility code.
2. They describe the fault in plain language.
3. The system asks up to 2 clarifying questions if needed.
4. A 6-stage parallel RAG search runs across OEM manuals, service bulletins, parts data, and fault categories.
5. The system returns exactly 3 ranked hypotheses with source citations and recommends qualified service teams on a map.
4. What is the 6-stage RAG search and why run stages in parallel?
Six Qdrant collections are searched simultaneously using Python’s ThreadPoolExecutor: OEM manuals (equipment-specific), OEM service bulletins (model + year filtered), symptom-specific docs, subsystem-specific docs, parts encyclopedia, and fault categories. Running them in parallel keeps total latency close to the slowest single search — important when technicians are troubleshooting active equipment faults and every second counts.
5. Why use DynamoDB for chat sessions and PostgreSQL for equipment data?
Chat sessions are key-value data with high read/write frequency and no relational requirements — DynamoDB delivers single-digit millisecond latency and auto-scales without capacity planning. Equipment assets, service teams, and work orders have relational integrity requirements (foreign keys, complex joins) that require PostgreSQL’s ACID transactions and SQL query capabilities.
6. Why does the system use a local embedding model instead of an API?
The BAAI/bge-small-en-v1.5 model runs locally via FastEmbed, generating 384-dimension embeddings on the backend server with no external API calls. This eliminates embedding latency, removes per-call cost, and — critically — avoids any dependency on third-party API availability during active production faults when external services may be unreachable from a plant network.
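With FastEmbed the real call is roughly `TextEmbedding(model_name="BAAI/bge-small-en-v1.5")` followed by `.embed(texts)`. The sketch below deliberately substitutes a deterministic hash-based stand-in so it runs without a model download; it is not a real embedding model, but it preserves the two properties that matter downstream: a fixed 384-dimension output and unit-length vectors.

```python
import hashlib
import math

DIM = 384  # bge-small-en-v1.5 output dimension

def embed_stub(text):
    """Stand-in for a local embedding model: deterministic,
    384-dimensional, L2-normalized. A real deployment would call
    fastembed's TextEmbedding here instead of this hash trick."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        h = int(hashlib.sha256(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]
```

Determinism matters operationally: the same fault description always maps to the same vector, so cached searches and audit replays stay consistent.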
7. Why does the frontend proxy all requests through Next.js API routes?
Proxying through Next.js keeps the Python backend URL out of the browser, eliminates CORS configuration, and enables the frontend and backend to be deployed on completely separate infrastructure — for example, the frontend on a plant intranet and the backend in a private cloud. Only the BACKEND_URL environment variable needs to change at deployment time.
8. How does the admin feedback system improve diagnostic accuracy over time?
Senior engineers submit corrections via the chat interface. The LLM compresses each correction into a concise rule (max 25 words) in maintenance shorthand. The rule is stored as a vector in Qdrant’s admin_feedback collection. During future diagnoses, the top 3 most relevant rules are retrieved and injected into the system prompt as internal guidance — taking effect on the very next conversation, with no model retraining required.
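The retrieval-and-injection step can be sketched with an in-memory list standing in for the `admin_feedback` Qdrant collection. The `vector`/`text` field names are hypothetical, and vectors are assumed unit-length so a dot product serves as cosine similarity:

```python
def top_rules(query_vec, rules, k=3):
    """Return the k most relevant correction rules by cosine
    similarity (vectors assumed unit-length, so dot product works)."""
    scored = sorted(
        rules,
        key=lambda r: -sum(a * b for a, b in zip(query_vec, r["vector"])),
    )
    return [r["text"] for r in scored[:k]]

def build_system_prompt(base_prompt, query_vec, rules):
    """Inject the top rules into the system prompt as internal guidance."""
    guidance = top_rules(query_vec, rules)
    if not guidance:
        return base_prompt
    return (base_prompt
            + "\n\nInternal guidance from senior engineers:\n"
            + "\n".join(f"- {g}" for g in guidance))
```

Because the rules live in the prompt rather than the weights, a bad correction is reverted by deleting one vector, exactly as the answer above describes.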
9. Why use RAG-based feedback instead of fine-tuning the model?
Fine-tuning requires GPU infrastructure, dataset preparation, and hours of training time. RAG-based corrections take effect immediately (next conversation), are fully auditable with a complete DynamoDB record, and can be reversed by deleting a single Qdrant entry — without touching the base model. This is especially valuable for incorporating OEM-specific knowledge from experienced field engineers.
10. How do I add a new diagnostic procedure?
Edit backend/config/diagnostic_procedures.yaml and add a new entry under the appropriate fault type (one of 8 categories: vibration, noise, thermal, fluid_leak, electrical_fault, no_start, performance_degradation, maintenance_request). Then run: python backend/data_ingestion/knowledge_loader.py --clear. No code changes or server restart required.
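A new entry might look like the sketch below. The field names are illustrative, since the actual schema of diagnostic_procedures.yaml is not documented here; only the top-level fault-type key (`vibration`) comes from the list above.

```yaml
vibration:
  - id: spindle_bearing_wear        # hypothetical schema
    symptoms:
      - "high-pitched whine at high RPM"
      - "vibration amplitude growing over recent runtime"
    checks:
      - "Measure spindle runout at the tool holder"
      - "Review runtime hours since last bearing service"
    severity: medium
```

After the loader run, the entry is embedded and searchable like any other document, which is why no code change or restart is needed.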
11. What happens when the RAG search returns few or no relevant documents?
The system does not fabricate information. It adjusts the diagnosis confidence level to LOW, uses more cautious language, and relies on the LLM’s general industrial maintenance knowledge — clearly indicating that specific documentation was not available. Collections that don’t yet exist in Qdrant return empty results gracefully rather than crashing the pipeline.
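That degradation policy can be sketched as a small gate in front of the answer generator; the thresholds and labels here are illustrative, not the production values:

```python
def assess_confidence(retrieved):
    """Downgrade confidence when retrieval comes back thin, instead
    of letting the model fabricate specifics (thresholds illustrative).
    `retrieved` maps collection name -> list of hits; missing
    collections contribute an empty list rather than an exception."""
    total_hits = sum(len(hits) for hits in retrieved.values())
    if total_hits == 0:
        return "LOW", ("No equipment-specific documentation was found; "
                       "this assessment relies on general industrial "
                       "maintenance knowledge.")
    if total_hits < 3:
        return "LOW", "Only limited documentation was found."
    return "NORMAL", ""
```

The returned caveat string is what lets the user-facing answer state plainly that specific documentation was unavailable.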