
GitHub

The GitHub Datasource lets you connect a GitHub repository as a knowledge source for your Peer. When synced, Cognipeer crawls the files in the repository at the configured branch or tag and indexes their text content, so your Peer can answer questions about your codebase, documentation, or any other text hosted on GitHub.


Use Cases

  • Answer questions about your codebase: "What does the AuthService class do?"
  • Surface documentation stored as Markdown files in a repo.
  • Power internal developer portals or engineering chatbots.
  • Search across multiple repositories by creating one datasource per repo.

Prerequisites

  • A GitHub Personal Access Token with read access to the repository (for a classic token, the repo scope; for a fine-grained token, read-only Contents permission).
  • The owner (username or organization name) and repository name.
  • The branch or tag you want to index (default: main).

Setting Up a GitHub Datasource

  1. Navigate to Datasources in the sidebar, then click Add Datasource.
  2. Choose GitHub as the datasource type.
  3. Fill in the required fields:
    • Owner: GitHub username or organization (e.g., my-org).
    • Repository: Repository name (e.g., my-repo).
    • Access Token: Your GitHub Personal Access Token.
    • Reference (optional): Branch or tag name to index (defaults to main).
  4. Click Save to create the datasource.
  5. Click Sync to start the initial indexing.
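
Before entering the fields, it can help to confirm that the owner, repository, reference, and token work together. The following is a minimal sketch that builds a request for GitHub's REST "get a tree" endpoint from the same four values. The DatasourceFields class and the tree_request helper are hypothetical names used for illustration and are not part of Cognipeer; whether Cognipeer calls this exact endpoint during a sync is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import quote


@dataclass
class DatasourceFields:
    """Hypothetical container mirroring the form fields; not a Cognipeer API."""
    owner: str
    repository: str
    access_token: str
    reference: str = "main"  # the form falls back to main when left empty


def tree_request(fields: DatasourceFields) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for GitHub's recursive tree listing."""
    url = (
        "https://api.github.com/repos/"
        f"{quote(fields.owner, safe='')}/{quote(fields.repository, safe='')}"
        f"/git/trees/{quote(fields.reference)}?recursive=1"
    )
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {fields.access_token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return url, headers


if __name__ == "__main__":
    url, _ = tree_request(DatasourceFields("my-org", "my-repo", "<token>"))
    print(url)
```

Sending this request with any HTTP client and receiving a 200 response suggests the values are usable. A 404 response usually means the repository or reference does not exist, or the token cannot see it.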

What Gets Indexed

Cognipeer fetches all files from the repository tree recursively (up to 5,000 files). Files are processed and split into searchable chunks. Common file types indexed include:

  • Source code: .js, .ts, .py, .go, .java, .rb, .c, .cpp, and more.
  • Documentation: .md, .mdx, .txt, .rst.
  • Configuration: .yaml, .yml, .json, .toml.

Binary files (images, compiled assets) are automatically skipped.
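
To preview how a repository's files fall into the categories above, the extension lists can be expressed as a small lookup. This is a sketch only: the listed_category helper is a hypothetical name, and the table covers just the extensions named here, so a result of None means "not in this list" and does not mean Cognipeer skips the file.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions named in the list above; the actual set is longer ("and more").
LISTED_EXTENSIONS = {
    "source": {".js", ".ts", ".py", ".go", ".java", ".rb", ".c", ".cpp"},
    "documentation": {".md", ".mdx", ".txt", ".rst"},
    "configuration": {".yaml", ".yml", ".json", ".toml"},
}


def listed_category(path: str) -> Optional[str]:
    """Return the documented category for a repository path, if any."""
    # Repository paths use forward slashes, so PurePosixPath is a safe parser.
    suffix = PurePosixPath(path).suffix.lower()
    for category, extensions in LISTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for path in ("docs/README.md", "src/app.ts", "assets/logo.png"):
        print(path, listed_category(path))
```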


Keeping the Datasource Up to Date

GitHub datasources do not automatically detect repository changes. To pull the latest content:

  • Manually click Sync in the datasource settings after pushing updates.
  • For automated sync workflows from CI/CD, use the Developer Hub.

Best Practices

  • Scope the repository: For very large monorepos, consider splitting the content into separate repositories (for example, one per package) and creating one datasource for each.
  • Use clear file names: Meaningful file and folder names improve search relevance.
  • Add a README: A well-written README.md gives the Peer a concise overview of the project.
  • Refresh after major commits: Sync the datasource after significant changes to keep answers accurate.

Limitations

  • Maximum 5,000 files per sync. Repositories exceeding this limit will be partially indexed.
  • Only text-based files are indexed. Binary or encoded files are not processed.
  • Private repositories require a valid access token that has not expired.
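
To estimate in advance whether a repository will run into the file limit, the JSON returned by GitHub's recursive "get a tree" endpoint can be summarized as follows. This is a rough sketch: sync_coverage is a hypothetical helper, and it assumes the 5,000-file limit counts every file in the tree, which is not specified here.

```python
FILE_LIMIT = 5_000  # per-sync maximum stated in the limitations above


def sync_coverage(tree_response: dict) -> dict:
    """Summarize a GitHub "get a tree" response against the file limit."""
    # Entries with type "blob" are files; "tree" entries are directories.
    file_count = sum(
        1 for entry in tree_response.get("tree", []) if entry.get("type") == "blob"
    )
    return {
        "file_count": file_count,
        "over_limit": file_count > FILE_LIMIT,
        # GitHub sets "truncated" when its own listing was cut short,
        # in which case file_count is only a lower bound.
        "listing_truncated": bool(tree_response.get("truncated", False)),
    }


if __name__ == "__main__":
    sample = {
        "tree": [
            {"path": "README.md", "type": "blob"},
            {"path": "src", "type": "tree"},
            {"path": "src/auth_service.py", "type": "blob"},
        ],
        "truncated": False,
    }
    print(sync_coverage(sample))
```

If over_limit is true, expect a partial index and consider narrowing what the repository contains before syncing.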

Studio · Pulse — Cognipeer product documentation