AI AGENT ROUTE · TRAINING DATA & KNOWLEDGE GRAPH

CloudPipe for AI Training Data

A structured knowledge graph of verified Asia-Pacific entity facts, designed for training data pipelines. Every verified fact carries official-source provenance in its source_url, so it can be checked against the original source rather than taken on trust.

Dataset Overview

Total facts: 235,000+
Verified facts (official-source): 6,500+
Published encyclopedia articles: 47,000+
Regions: MO · HK · TW · JP · Global
Languages: zh-TW · en · ja · pt
Entity types: 18 categories (F&B, retail, hospitality, government, events)
Provenance: source_url on every verified fact
Update frequency: Daily

Data Schema

// knowledge_facts record shape
{
  "subject_entity_id": "uuid",
  "predicate": "menu_item | operating_hours | certification | moq | ...",
  "object_value": "string",
  "object_numeric": null | number,
  "source_type": "official_site | wikipedia | wikidata | google_p0",
  "source_url": "https://...",          // provenance URL
  "is_authoritative": true | false,
  "composite_trust_score": 0.0–1.0,    // Layer 2
  "ai_citation_total": integer,         // Layer 2 — times cited by AI engines
  "corroboration_count": integer        // Layer 2 — cross-source corroboration
}
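
For pipelines that ingest these records, it can help to validate each one before use. The following is a minimal sketch in Python, not an official client: the KnowledgeFact class and parse_fact helper are hypothetical names, the source_type list is treated as closed because the shape above shows no ellipsis, and every field is assumed to be present (Layer 2 fields may be absent without a license, in which case the sketch would need optional defaults).

```python
from dataclasses import dataclass
from typing import Optional

# Source types listed in the record shape above (assumed to be the full set).
SOURCE_TYPES = {"official_site", "wikipedia", "wikidata", "google_p0"}


@dataclass(frozen=True)
class KnowledgeFact:
    """Hypothetical typed mirror of one knowledge_facts record."""
    subject_entity_id: str
    predicate: str                 # open-ended list, so not validated here
    object_value: str
    object_numeric: Optional[float]
    source_type: str
    source_url: str                # provenance URL
    is_authoritative: bool
    composite_trust_score: float   # Layer 2, documented range 0.0-1.0
    ai_citation_total: int         # Layer 2
    corroboration_count: int       # Layer 2


def parse_fact(raw: dict) -> KnowledgeFact:
    """Build a KnowledgeFact from a decoded JSON object and range-check it."""
    fact = KnowledgeFact(**raw)  # TypeError on missing or unexpected keys
    if fact.source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {fact.source_type!r}")
    if not 0.0 <= fact.composite_trust_score <= 1.0:
        raise ValueError("composite_trust_score must be within 0.0-1.0")
    if fact.ai_citation_total < 0 or fact.corroboration_count < 0:
        raise ValueError("counts must be non-negative")
    return fact
```

Rejecting malformed records at load time keeps range and provenance errors out of downstream training sets; a pipeline that prefers to skip bad rows could catch ValueError and TypeError around parse_fact instead.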

Access for Training

Bulk access & licensing: Training data use requires a licensing conversation. The public key (cp-beta-public-2026) covers sampling and evaluation. For bulk exports or inclusion in training datasets, contact us.

Training Data Licensing: hello@cloudpipe.ai

Bulk Exploration

# Sitemap index — all entity URLs
https://cloudpipe-macao-app.vercel.app/sitemap_index.xml

# Priority entities (trust ≥ 85, highest quality)
https://cloudpipe-macao-app.vercel.app/sitemap-priority.xml

# Machine-readable API capability manifest
https://cloudpipe-macao-app.vercel.app/api/v1/manifest

# Sample entity facts (evaluation)
GET https://cloudpipe-macao-app.vercel.app/api/v1/facts/{slug}
X-API-Key: cp-beta-public-2026

Open web crawling of published encyclopedia articles is permitted per robots.txt. For structured knowledge graph (KG) data and Layer 2 intelligence fields, licensing is required.
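
The exploration endpoints above could be exercised with a small script along these lines. This is a sketch using only the Python standard library: the helper names (facts_request, sitemap_locations, fetch, fetch_facts) are hypothetical, the sitemaps are assumed to follow the standard sitemaps.org XML format, and the facts endpoint is assumed to return JSON, since the response format is not documented here.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://cloudpipe-macao-app.vercel.app"
PUBLIC_KEY = "cp-beta-public-2026"  # public key for sampling and evaluation only
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed standard namespace


def facts_request(slug: str) -> urllib.request.Request:
    """Build the GET request for one entity's sample facts, with the API key header."""
    url = f"{BASE_URL}/api/v1/facts/{urllib.parse.quote(slug, safe='')}"
    return urllib.request.Request(url, headers={"X-API-Key": PUBLIC_KEY}, method="GET")


def sitemap_locations(xml_text) -> list[str]:
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{{{SITEMAP_NS}}}loc") if loc.text]


def fetch(request_or_url) -> bytes:
    """Perform the HTTP call; kept separate so the helpers above work offline."""
    with urllib.request.urlopen(request_or_url, timeout=30) as resp:
        return resp.read()


def fetch_facts(slug: str):
    """Fetch and decode sample facts for one entity (assumes a JSON response)."""
    return json.loads(fetch(facts_request(slug)))


# Example usage (performs network requests, so it is left commented out):
# child_sitemaps = sitemap_locations(fetch(f"{BASE_URL}/sitemap_index.xml"))
# priority_urls = sitemap_locations(fetch(f"{BASE_URL}/sitemap-priority.xml"))
# facts = fetch_facts("<entity-slug>")
```

Keeping request construction and XML parsing separate from the network call makes the helpers easy to check without touching the live service. Anything beyond sampling and evaluation remains subject to the licensing terms described above.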