Vertical AI Platform Comparison
Evaluate how our AI tooling stacks up against alternatives on deployment speed, domain specificity, accuracy, and total cost of ownership.
Capability Matrix
Side-by-side feature and performance comparison across key evaluation criteria for vertical AI implementations.
| Capability | Platform | GenericAI | VerticalBot | DomainLLM |
|---|---|---|---|---|
| Domain-specific fine-tuning | Included | Add-on | Add-on | Enterprise only |
| Data residency controls | All regions | US only | US/EU | Enterprise tier |
| Audit trail exports (SOC 2 Type II availability) | Native | Limited | No | Add-on |
| API-first architecture | Yes | Yes | Partial | Yes |
| Custom vocabulary support (no per-term billing) | Unlimited | Tiered | Limited | Enterprise only |
| On-premise deployment (Docker/Kubernetes options) | Available | No | No | Yes |
| Latency at p95 | < 200ms | < 300ms | < 500ms | < 250ms |
| Model version pinning | Full control | Provider-managed | No | Limited |
| Multi-tenant isolation | Hard tenant boundaries | Shared infra | Logical only | Hard isolation |
| SLA uptime guarantee | 99.9% | 99.5% | 99.0% | 99.9% |
| Self-serve onboarding | Yes | Sales required | Yes | Professional services |
| Compliance certifications | SOC 2, HIPAA, GDPR | SOC 2 | GDPR | SOC 2, HIPAA, GDPR |
| Embedding model options | 3rd party + custom | Fixed | Fixed | Custom only |
| Webhook / event streaming | Standard | Enterprise tier | No | Add-on |
Technical Differentiation
Implementation details that affect reliability, compliance posture, and operational overhead for production deployments.
Architecture
Retrieval pipeline
Vector search with hybrid BM25 fallback. Re-ranking layer applies cross-encoder for precision. Retrieval latency measured at p99 < 80ms for index sizes up to 50M documents.
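As a rough illustration of the fallback idea, the sketch below ranks documents by cosine similarity and only consults a BM25 scorer when too few vector hits clear a threshold. The function names, the `min_sim` threshold, and the BM25 parameters `k1` and `b` are assumptions made for this example, not the platform's actual implementation, and the cross-encoder re-ranking step is left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bm25_scores(query_terms, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query terms with BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query_vec, query_terms, doc_vecs, docs_tokens, k=2, min_sim=0.5):
    """Return up to k document indices: vector hits first, BM25 as fallback."""
    sims = [cosine(query_vec, v) for v in doc_vecs]
    hits = sorted(
        (i for i, s in enumerate(sims) if s >= min_sim),
        key=lambda i: -sims[i],
    )[:k]
    if len(hits) < k:
        # Not enough confident vector matches: fill the rest lexically.
        lexical = bm25_scores(query_terms, docs_tokens)
        extra = sorted(
            (i for i, s in enumerate(lexical) if s > 0 and i not in hits),
            key=lambda i: -lexical[i],
        )
        hits += extra[: k - len(hits)]
    return hits
```

In a full pipeline the returned candidates would then be passed to the re-ranking layer.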
Inference configuration
Configurable temperature, top-p, and top-k per request. Streaming and non-streaming modes share the same endpoint. Token counting happens server-side to prevent over-generation.
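A per-request configuration along these lines might be assembled as in the sketch below. The field names (`prompt`, `temperature`, `top_p`, `top_k`, `stream`), the defaults, and the accepted ranges are assumptions for illustration; the actual request schema is not specified here.

```python
from dataclasses import dataclass, asdict


@dataclass
class SamplingParams:
    """Hypothetical per-request sampling settings."""
    temperature: float = 0.7
    top_p: float = 1.0
    top_k: int = 40
    stream: bool = False  # same endpoint either way; only this flag changes

    def validate(self) -> None:
        # Ranges below are illustrative assumptions, not documented limits.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be in [0, 2]")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.top_k < 1:
            raise ValueError("top_k must be >= 1")


def build_request(prompt: str, params: SamplingParams) -> dict:
    """Validate the settings and merge them into one request body."""
    params.validate()
    return {"prompt": prompt, **asdict(params)}
```

Because streaming and non-streaming share an endpoint, switching modes in this sketch is a matter of flipping `stream` on the same payload.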
Failover behavior
Automatic regional failover with < 30s DNS TTL. In-flight requests return 503 and are safe to retry. No message loss on upstream failures due to queue-backed processing.
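Since 503 responses during failover are safe to retry, a client can wrap calls in a small retry loop. The sketch below uses exponential backoff with jitter; the `send` callable, the attempt limit, and the delay values are assumptions chosen for the example rather than recommended settings.

```python
import random
import time


def call_with_retry(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical zero-argument callable returning (status, body).
    """
    status, body = None, None
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 503:
            return status, body
        if attempt == max_attempts:
            break
        # Exponential backoff capped at max_delay, with full jitter.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
    return status, body
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.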
State management
Stateless inference calls allow horizontal scaling without session affinity. Conversation context stored client-side or in managed Redis with customer-controlled TTL.
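With stateless calls, a client that keeps context on its own side has to resend the relevant history with each request. The sketch below trims that history to a size budget before sending. The message shape and the use of character counts as a stand-in for tokens are assumptions for illustration only.

```python
def trim_context(messages, max_chars):
    """Keep the most recent messages whose combined content fits max_chars.

    messages is a list of dicts with a "content" string (hypothetical shape).
    """
    kept, total = [], 0
    # Walk from newest to oldest so recent turns are preserved first.
    for message in reversed(messages):
        size = len(message["content"])
        if total + size > max_chars:
            break
        kept.append(message)
        total += size
    kept.reverse()  # restore chronological order
    return kept
```

Because no session lives on the server, any replica can serve the next request as long as the client sends the trimmed history along with it.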
Compliance
Data handling
Customer data is not used for model training by default. Data residency enforced at the account level. PII detection and redaction available as a pre-processing layer.
Access controls
Role-based access control with SCIM provisioning support. API keys are scoped to permission sets. Audit logs capture all write operations with user attribution.
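The combination of scoped keys and attributed audit logging can be pictured roughly as follows. The `ApiKey` shape, the scope strings, and the log record fields are hypothetical and only meant to show the idea of checking a permission set and recording write operations.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApiKey:
    """Hypothetical API key bound to a user and a set of scopes."""
    key_id: str
    user: str
    scopes: frozenset


def perform(key, scope, action, is_write, audit_log):
    """Check the key's scopes, then record write operations in audit_log."""
    if scope not in key.scopes:
        raise PermissionError(f"key {key.key_id} lacks scope {scope!r}")
    if is_write:
        # Only writes are logged here, each attributed to the key's user.
        audit_log.append({
            "user": key.user,
            "scope": scope,
            "action": action,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return True
```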
Incident response
Runbook documentation provided for common failure modes. Breach notification SLA aligns with GDPR Article 33 (72 hours). Penetration testing conducted quarterly by a third-party firm.
Common Questions
How does on-premise deployment work?
We provide Helm charts and Terraform modules for Kubernetes-based deployments. The control plane runs in your infrastructure; the model weights are deployed as container images. Minimum requirements include 2x NVIDIA A10G or equivalent, 32GB RAM per inference pod, and Kubernetes 1.26+. We offer an initial setup call and documentation for self-service, with professional services available for air-gapped environments.
Downtime during updates depends on your high-availability configuration. Single-node clusters require ~5 minutes of planned maintenance; multi-node clusters with rolling updates maintain availability. We recommend at least 3 replicas for production workloads.
What SLA credits apply if uptime misses the guarantee?
Service credits are applied monthly: 99.0–99.5% uptime yields 5% credit on that month's invoice; 98.0–99.0% yields 15%; below 98.0% yields 25%. Credits are applied to future invoices, not refunded. SLA is measured using our status page data, excluding scheduled maintenance announced 48+ hours in advance. Claiming credits requires opening a ticket within 30 days of the incident.
We maintain a public incident history with root cause analyses for events exceeding 15 minutes. Historical uptime data is available for the trailing 12 months on the status page.
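The credit tiers in this answer can be worked through with a small helper. Two points are assumptions in this sketch: an uptime of exactly 99.0% is placed in the 5% tier, and uptime above 99.5% is treated as earning no credit because no tier is listed for it. Both should be confirmed against the actual agreement.

```python
def sla_credit_percent(uptime_percent: float) -> int:
    """Map a month's measured uptime to a credit percentage."""
    if uptime_percent < 98.0:
        return 25
    if uptime_percent < 99.0:
        return 15
    if uptime_percent <= 99.5:
        return 5  # assumption: 99.0% itself falls in this tier
    return 0  # assumption: no tier is listed above 99.5%


def sla_credit_amount(invoice_total: float, uptime_percent: float) -> float:
    """Credit value derived from that month's invoice total."""
    return invoice_total * sla_credit_percent(uptime_percent) / 100
```

For example, a month at 98.5% uptime on a 2,000 invoice falls in the 15% tier, giving a credit of 300.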
Can we use our own embedding models?
Yes. We support Bring Your Own Embedding (BYOE) through an OpenAI-compatible embedding endpoint. Supported input dimensions include 384, 768, 1024, 1536, and 3072. Custom embedding models must implement the standard cosine similarity interface.
We also offer fine-tuned embedding models pre-trained on domain-specific corpora as an add-on. These are hosted on our infrastructure and billed per 1M tokens. Switching embedding models in production is a zero-downtime operation via the model alias feature.
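Before wiring in a custom model, it can help to check its output against the supported dimensions and to confirm that similarity behaves as expected. The helper names below are made up for this sketch, and the cosine function is the textbook definition rather than the platform's own interface.

```python
import math

# Dimensions listed as supported in the answer above.
SUPPORTED_DIMS = {384, 768, 1024, 1536, 3072}


def check_dimension(vector):
    """Return the vector's length, or raise if it is not a supported size."""
    if len(vector) not in SUPPORTED_DIMS:
        raise ValueError(f"unsupported embedding dimension: {len(vector)}")
    return len(vector)


def cosine_similarity(a, b):
    """Textbook cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```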
How is HIPAA compliance handled for healthcare deployments?
Healthcare deployments require a Business Associate Agreement (BAA) before processing any PHI. We sign BAAs on the Starter plan and above. The infrastructure includes encryption at rest (AES-256) and in transit (TLS 1.3), with customer-managed keys available on Enterprise.
PHI cannot be logged or stored outside of designated regions. The audit trail feature is required for HIPAA workloads and captures inference inputs/outputs with timestamps. We undergo annual HIPAA audits by a third-party assessor; current audit reports are available under NDA.
What happens to data when we cancel?
Upon cancellation, we retain data for 30 days in read-only mode for export. During this window, you can request a full data export via API or dashboard. After 30 days, customer data is deleted from production systems within 60 days.
Backups are purged on a rolling 30-day cycle, so deleted data ages out of backups on that same schedule. We provide a certificate of destruction upon request. Data in derivative analytics (aggregate, de-identified performance metrics) is retained separately per our privacy policy, with opt-out available.
How does pricing scale with usage?
Pricing is consumption-based on tokens processed (input + output). The first 1M tokens/month are included on Starter. Volume discounts apply at 10M, 100M, and 1B token thresholds. Embedding generation is billed separately at $0.10/1K tokens for standard models.
There are no seat-based fees, no minimum monthly commitments on Starter/Pro, and no setup fees. Enterprise plans include committed spend tiers with fixed per-token rates. On committed plans, usage up to 20% above the commitment is billed at the committed rate; usage beyond that is billed at standard rates.
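The arithmetic behind these rules can be sketched as follows. The embedding rate comes from the answer above; the committed and standard per-token rates passed to `committed_plan_cost` are placeholder inputs, and the sketch assumes the full commitment is billed even when usage falls short of it.

```python
from decimal import Decimal

# Standard embedding rate quoted above: $0.10 per 1K tokens.
EMBEDDING_RATE_PER_1K = Decimal("0.10")


def embedding_cost(tokens: int) -> Decimal:
    """Cost in USD of generating embeddings for the given token count."""
    return Decimal(tokens) / 1000 * EMBEDDING_RATE_PER_1K


def committed_plan_cost(used_tokens: int, committed_tokens: int,
                        committed_rate: Decimal,
                        standard_rate: Decimal) -> Decimal:
    """Monthly cost on a committed plan; rates are USD per token.

    Assumes the commitment is always billed in full (an assumption).
    """
    band = committed_tokens * 20 // 100  # overage still at committed rate
    overage = max(0, used_tokens - committed_tokens)
    in_band = min(overage, band)
    beyond = overage - in_band  # billed at the standard rate
    return (committed_tokens + in_band) * committed_rate + beyond * standard_rate
```

With a hypothetical 100M-token commitment at $2 per 1M tokens and a $3 per 1M standard rate, 130M tokens of usage would cost 120M tokens at the committed rate plus 10M at the standard rate, or $270 in total.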