Vertical AI Platform Comparison

Evaluate how our AI tooling stacks up against alternatives on deployment speed, domain specificity, accuracy, and total cost of ownership.

Capability Matrix

Side-by-side feature and performance comparison across key evaluation criteria for vertical AI implementations.

Capability	Platform	GenericAI	VerticalBot	DomainLLM
Domain-specific fine-tuning	Included	Add-on	Add-on	Enterprise only
Data residency controls	All regions	US only	US/EU	Enterprise tier
Audit trail exports (SOC 2 Type II availability)	Native	Limited	No	Add-on
API-first architecture	Yes	Yes	Partial	Yes
Custom vocabulary support (no per-term billing)	Unlimited	Tiered	Limited	Enterprise only
On-premise deployment (Docker/Kubernetes options)	Available	No	No	Yes
Latency at p95	< 200ms	< 300ms	< 500ms	< 250ms
Model version pinning	Full control	Provider-managed	No	Limited
Multi-tenant isolation	Hard tenant boundaries	Shared infra	Logical only	Hard isolation
SLA uptime guarantee	99.9%	99.5%	99.0%	99.9%
Self-serve onboarding	Yes	Sales required	Yes	Professional services
Compliance certifications	SOC 2, HIPAA, GDPR	SOC 2	GDPR	SOC 2, HIPAA, GDPR
Embedding model options	3rd party + custom	Fixed	Fixed	Custom only
Webhook / event streaming	Standard	Enterprise tier	No	Add-on

Technical Differentiation

Implementation details that affect reliability, compliance posture, and operational overhead for production deployments.

Architecture

Retrieval pipeline

Vector search with hybrid BM25 fallback. A re-ranking layer applies a cross-encoder for precision. Retrieval latency is measured at p99 < 80ms for index sizes up to 50M documents.
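The retrieval flow described above can be sketched roughly as follows. This is an illustration only, not the platform's implementation: the function name `hybrid_retrieve`, the callables passed in, and the fallback trigger (too few vector hits) are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document id, score)
Search = Callable[[str, int], List[Hit]]


def hybrid_retrieve(
    query: str,
    vector_search: Search,
    bm25_search: Search,
    rerank: Callable[[str, List[str]], List[Hit]],
    k: int = 5,
    min_vector_hits: int = 3,
) -> List[Hit]:
    """Vector search first, BM25 as a fallback, then re-rank the merged pool."""
    candidates = [doc_id for doc_id, _ in vector_search(query, k)]

    # Assumed fallback trigger: too few vector hits, so add lexical matches.
    if len(candidates) < min_vector_hits:
        for doc_id, _ in bm25_search(query, k):
            if doc_id not in candidates:
                candidates.append(doc_id)

    # The re-ranker (for example a cross-encoder) scores each
    # (query, document) pair; keep the k highest-scoring documents.
    scored = rerank(query, candidates)
    return sorted(scored, key=lambda hit: hit[1], reverse=True)[:k]
```

Passing the three stages in as callables keeps the sketch independent of any particular vector store, BM25 library, or cross-encoder model.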

Inference configuration

Configurable temperature, top-p, and top-k per request. Streaming and non-streaming modes share the same endpoint. Token counting happens server-side to prevent over-generation.
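As a rough illustration of per-request sampling settings, the sketch below builds a request body. The field names (`prompt`, `temperature`, `top_p`, `top_k`, `stream`), the default values, and the validation ranges are assumptions for the example, not a documented schema.

```python
from typing import Any, Dict, Optional


def build_inference_request(
    prompt: str,
    temperature: float = 0.7,
    top_p: float = 1.0,
    top_k: Optional[int] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-serialisable body with per-request sampling settings."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be >= 1 when set")

    body: Dict[str, Any] = {
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        # Streaming and non-streaming share one endpoint, so in this
        # sketch only a flag in the body changes (hypothetical field).
        "stream": stream,
    }
    if top_k is not None:
        body["top_k"] = top_k
    return body
```

Validating on the client is optional, but it surfaces out-of-range values before a request is sent.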

Failover behavior

Automatic regional failover with < 30s DNS TTL. In-flight requests return 503 and are safe to retry. Queue-backed processing prevents message loss on upstream failures.
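Because requests that fail with 503 during failover are safe to retry, a client can wrap calls in a retry loop. The sketch below uses capped exponential backoff with jitter; the helper name `call_with_retry`, the attempt count, and the delay values are illustrative assumptions, and `send` stands in for whatever HTTP call the client makes.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on 503 with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body  # success or a non-retryable error
        if attempt < max_attempts - 1:
            # Delay cap doubles each attempt, bounded by max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return status, body  # still 503 after the final attempt
```

Injecting `sleep` makes the loop easy to exercise in tests without real waiting.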

State management

Stateless inference calls allow horizontal scaling without session affinity. Conversation context stored client-side or in managed Redis with customer-controlled TTL.

Compliance

Data handling

Customer data is not used for model training by default. Data residency enforced at the account level. PII detection and redaction available as a pre-processing layer.

Access controls

Role-based access control with SCIM provisioning support. API keys are scoped to permission sets. Audit logs capture all write operations with user attribution.

Incident response

Runbook documentation provided for common failure modes. Breach notification SLA aligns with GDPR Article 33 (72 hours). Penetration testing conducted quarterly by a third-party firm.

Common Questions

How does on-premise deployment work?

We provide Helm charts and Terraform modules for Kubernetes-based deployments. The control plane runs in your infrastructure; the model weights are deployed as container images. Minimum requirements include 2x NVIDIA A10G or equivalent, 32GB RAM per inference pod, and Kubernetes 1.26+. We offer an initial setup call and documentation for self-service, with professional services available for air-gapped environments.

Downtime during updates depends on your high-availability configuration. Single-node clusters require ~5 minutes of planned maintenance; multi-node clusters with rolling updates maintain availability. We recommend at least 3 replicas for production workloads.

What SLA credits apply if uptime misses the guarantee?

Service credits are applied monthly: 99.0–99.5% uptime yields 5% credit on that month's invoice; 98.0–99.0% yields 15%; below 98.0% yields 25%. Credits are applied to future invoices, not refunded. SLA is measured using our status page data, excluding scheduled maintenance announced 48+ hours in advance. Claiming credits requires opening a ticket within 30 days of the incident.
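The credit tiers above can be expressed as a small lookup. This is a sketch under stated assumptions: each range is treated as inclusive at its lower bound and exclusive at its upper bound, and because no credit tier is stated for uptime of 99.5% or higher, the function returns 0 there as a placeholder. The function names are hypothetical.

```python
def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime as a percentage of all minutes in the month."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_credit_percent(uptime_percent: float) -> int:
    """Map monthly uptime to a credit percentage on that month's invoice."""
    if uptime_percent < 98.0:
        return 25
    if uptime_percent < 99.0:  # 98.0 up to, not including, 99.0 (assumed)
        return 15
    if uptime_percent < 99.5:  # 99.0 up to, not including, 99.5 (assumed)
        return 5
    # No tier is stated for 99.5% and above; 0 is a placeholder assumption.
    return 0
```

For example, 432 minutes of unplanned downtime in a 30-day month (43,200 minutes) works out to 99.0% uptime, which falls in the 5% tier under these boundary assumptions.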

We maintain a public incident history with root cause analyses for events exceeding 15 minutes. Historical uptime data is available for the trailing 12 months on the status page.

Can we use our own embedding models?

Yes. We support Bring Your Own Embedding (BYOE) through an OpenAI-compatible embedding endpoint. Supported embedding dimensions include 384, 768, 1024, 1536, and 3072. Custom embedding models must implement the standard cosine similarity interface.
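Before wiring a custom model into a BYOE endpoint, it can help to check its vectors locally. The sketch below validates the vector length against the dimensions listed above and computes cosine similarity; the function names are hypothetical and this is not the platform's validation logic.

```python
import math
from typing import Sequence

# Dimensions taken from the list above.
SUPPORTED_DIMENSIONS = {384, 768, 1024, 1536, 3072}


def validate_embedding(vector: Sequence[float]) -> None:
    """Raise ValueError if the vector has an unsupported size or bad values."""
    if len(vector) not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("embedding contains NaN or infinite values")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```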

We also offer fine-tuned embedding models pre-trained on domain-specific corpora as an add-on. These are hosted on our infrastructure and billed per 1M tokens. Switching embedding models in production is a zero-downtime operation via the model alias feature.

How is HIPAA compliance handled for healthcare deployments?

Healthcare deployments require a Business Associate Agreement (BAA) before processing any PHI. We sign BAAs on the Starter plan and above. The infrastructure includes encryption at rest (AES-256) and in transit (TLS 1.3), with customer-managed keys available on Enterprise.

PHI cannot be logged or stored outside of designated regions. The audit trail feature is required for HIPAA workloads and captures inference inputs/outputs with timestamps. We undergo annual HIPAA audits by a third-party assessor; current audit reports are available under NDA.

What happens to data when we cancel?

Upon cancellation, we retain data for 30 days in read-only mode for export. During this window, you can request a full data export via API or dashboard. After 30 days, customer data is deleted from production systems within 60 days.

Backups are purged on a rolling 30-day cycle, so deletion from backups follows the same schedule. We provide a certificate of destruction upon request. Data in derivative analytics (aggregate, de-identified performance metrics) is retained separately per our privacy policy, with opt-out available.

How does pricing scale with usage?

Pricing is consumption-based on tokens processed (input + output). The first 1M tokens/month are included on Starter. Volume discounts apply at 10M, 100M, and 1B token thresholds. Embedding generation is billed separately at $0.10/1K tokens for standard models.

There are no seat-based fees, no minimum monthly commitments on Starter/Pro, and no setup fees. Enterprise plans include committed spend tiers with fixed per-token rates. Overages on committed plans are billed at the same rate up to 20% overage, then at standard rates.
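The overage rule for committed plans can be worked through in code. The sketch below is an illustration under assumptions: rates are expressed per 1M tokens, the committed amount is billed even when usage falls short, and the function name and rate values in the example are hypothetical, not published prices.

```python
def committed_plan_invoice(
    used_tokens: int,
    committed_tokens: int,
    committed_rate_per_m: float,
    standard_rate_per_m: float,
    overage_band: float = 0.20,
) -> float:
    """Monthly charge for a committed plan, with rates per 1M tokens."""
    million = 1_000_000
    overage = max(0, used_tokens - committed_tokens)

    # The first 20% above the commitment stays at the committed rate.
    band_limit = committed_tokens * overage_band
    overage_at_committed_rate = min(overage, band_limit)
    # Anything beyond the band is billed at the standard rate.
    overage_at_standard_rate = overage - overage_at_committed_rate

    return (
        committed_tokens / million * committed_rate_per_m
        + overage_at_committed_rate / million * committed_rate_per_m
        + overage_at_standard_rate / million * standard_rate_per_m
    )
```

With made-up rates of 2.00 (committed) and 3.00 (standard) per 1M tokens, a 100M-token commitment and 130M tokens used gives 100 x 2.00 + 20 x 2.00 + 10 x 3.00 = 270.00.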
