CloudPipe for AI Training Data
A structured knowledge graph of verified Asia-Pacific entity facts, designed for training data pipelines. Every verified fact carries official-source provenance through its source_url; no data is hallucinated.
Dataset Overview
| Metric | Value |
| --- | --- |
| Total facts | 235,000+ |
| Verified facts (official-source) | 6,500+ |
| Published encyclopedia articles | 47,000+ |
| Regions | MO · HK · TW · JP · Global |
| Languages | zh-TW · en · ja · pt |
| Entity types | 18 categories (e.g. F&B, retail, hospitality, government, events) |
| Provenance | source_url on every verified fact |
| Update frequency | Daily |
Data Schema
// knowledge_facts record shape
{
  "subject_entity_id": "uuid",
  "predicate": "menu_item | operating_hours | certification | moq | ...",
  "object_value": "string",
  "object_numeric": null | number,
  "source_type": "official_site | wikipedia | wikidata | google_p0",
  "source_url": "https://...",        // provenance URL
  "is_authoritative": true | false,
  "composite_trust_score": 0.0–1.0,   // Layer 2
  "ai_citation_total": integer,       // Layer 2: times cited by AI engines
  "corroboration_count": integer      // Layer 2: cross-source corroboration
}
Access for Training
Bulk access & licensing: Using this data for training requires a license. The public key (cp-beta-public-2026) covers sampling and evaluation only. For bulk exports or inclusion in training datasets, contact hello@cloudpipe.ai (Training Data Licensing).
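For the sampling and evaluation that the public key covers, a consumer may want to check each returned record against the knowledge_facts shape from the Data Schema section. The following is a minimal, hypothetical Python sketch, not part of any CloudPipe SDK: KnowledgeFact, parse_fact, and training_ready are illustrative names, and it assumes that the three Layer 2 fields may be missing from sampled records, that the four listed source_type values are exhaustive, and that is_authoritative marks the verified facts that carry a source_url.

```python
from dataclasses import dataclass
from typing import Optional

# The four source_type values listed in the knowledge_facts schema,
# treated here as exhaustive (an assumption, not a documented guarantee).
KNOWN_SOURCE_TYPES = {"official_site", "wikipedia", "wikidata", "google_p0"}


@dataclass
class KnowledgeFact:
    """Hypothetical client-side model of one knowledge_facts record."""
    subject_entity_id: str
    predicate: str
    object_value: str
    object_numeric: Optional[float]
    source_type: str
    source_url: str
    is_authoritative: bool
    # Layer 2 fields; assumed to be absent from unlicensed sample records.
    composite_trust_score: Optional[float] = None
    ai_citation_total: Optional[int] = None
    corroboration_count: Optional[int] = None


def parse_fact(record: dict) -> KnowledgeFact:
    """Build a KnowledgeFact from a decoded JSON object and run basic checks."""
    fact = KnowledgeFact(
        subject_entity_id=record["subject_entity_id"],
        predicate=record["predicate"],
        object_value=record["object_value"],
        object_numeric=record.get("object_numeric"),
        source_type=record["source_type"],
        source_url=record.get("source_url") or "",
        is_authoritative=bool(record["is_authoritative"]),
        composite_trust_score=record.get("composite_trust_score"),
        ai_citation_total=record.get("ai_citation_total"),
        corroboration_count=record.get("corroboration_count"),
    )
    if fact.source_type not in KNOWN_SOURCE_TYPES:
        raise ValueError(f"unexpected source_type: {fact.source_type!r}")
    score = fact.composite_trust_score
    if score is not None and not 0.0 <= score <= 1.0:
        raise ValueError(f"composite_trust_score outside 0.0-1.0: {score}")
    return fact


def training_ready(facts: list, min_trust: float = 0.0) -> list:
    """Keep authoritative facts that have a provenance URL and enough trust.

    A missing composite_trust_score counts as 0.0, so any min_trust above
    zero excludes records that lack the Layer 2 fields.
    """
    return [
        f
        for f in facts
        if f.is_authoritative
        and f.source_url
        and (f.composite_trust_score or 0.0) >= min_trust
    ]
```

A record sampled with the public key would pass through parse_fact first; training_ready then keeps only authoritative, source-linked records, optionally above a composite_trust_score threshold once Layer 2 fields are available under a license.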
Bulk Exploration
# Sitemap index: all entity URLs
https://cloudpipe-macao-app.vercel.app/sitemap_index.xml
# Priority entities (trust ≥ 85, highest quality)
https://cloudpipe-macao-app.vercel.app/sitemap-priority.xml
# Machine-readable API capability manifest
https://cloudpipe-macao-app.vercel.app/api/v1/manifest
# Sample entity facts (evaluation)
GET https://cloudpipe-macao-app.vercel.app/api/v1/facts/{slug}
X-API-Key: cp-beta-public-2026
Open web crawling of published encyclopedia articles is permitted per robots.txt. For structured KG data and Layer 2 intelligence fields, licensing is required.
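A crawler or evaluation script might combine these entry points roughly as in the Python sketch below, which uses only the standard library. It is a hypothetical client, not an official SDK: the helper names are illustrative, the JSON response format of /api/v1/facts/{slug} is assumed, sitemap_locs simply collects every loc element (the way sitemap and sitemap index files list URLs under the sitemaps.org protocol), and allowed_by_robots checks a URL against an already-downloaded robots.txt before an article is crawled.

```python
import json
import urllib.parse
import urllib.request
import urllib.robotparser
import xml.etree.ElementTree as ET

BASE_URL = "https://cloudpipe-macao-app.vercel.app"
PUBLIC_KEY = "cp-beta-public-2026"  # public key for sampling and evaluation


def facts_request(slug: str, api_key: str = PUBLIC_KEY) -> urllib.request.Request:
    """Build the GET request for /api/v1/facts/{slug} with the X-API-Key header."""
    path = "/api/v1/facts/" + urllib.parse.quote(slug, safe="")
    return urllib.request.Request(BASE_URL + path, headers={"X-API-Key": api_key})


def fetch_facts(slug: str) -> object:
    """Send the request and decode the body as JSON (response shape is assumed)."""
    with urllib.request.urlopen(facts_request(slug), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def sitemap_locs(xml_text: str) -> list:
    """Return the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    # Compare local tag names so namespaced and plain <loc> elements both match.
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

For example, a crawler could download sitemap_index.xml, pass its text to sitemap_locs to list the child sitemaps, repeat for each child, and filter the resulting article URLs through allowed_by_robots before requesting them. fetch_facts stays within the sampling and evaluation scope of the public key.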