BioAgents Architecture

BioAgents is an AI powered system that processes research data, extracts metadata, identifies entities, and generates knowledge graphs to enhance data discoverability and usability.

System Overview

BioAgents provides intelligent processing capabilities for research data within the Bio-DID-Seq ecosystem. It uses large language models (LLMs) and specialized AI agents to extract valuable information from research papers, identify entities, generate knowledge graphs, and enable natural language querying of research data.

Core Components

1. Agent Framework

BioAgents is built on the Eliza OS agent framework which provides:

Agent Runtime: Core execution environment for AI agents
Plugin System: Extensible architecture for specialized capabilities
Memory Management: Persistent storage of agent knowledge and states
Task Workers: Async processing of compute intensive tasks

2. Knowledge Processing Pipeline

The BioAgents system processes research data through a multi stage pipeline:

Document Processing

PDF Extraction: Extracts text and structure from PDFs using GROBID
Content Normalization: Converts different formats to structured text
Section Identification: Recognizes abstract, methods, results, etc.

Knowledge Extraction

Entity Recognition: Identifies genes, proteins, diseases, compounds, etc.
Relationship Detection: Finds connections between biological entities
Claim Extraction: Identifies key scientific claims and evidence
Metadata Collection: Authors, citations, publication details, etc.

Knowledge Representation

RDF Generation: Converts extracted information to Resource Description Framework
JSON-LD Creation: Structured, interoperable representation of knowledge
Knowledge Graph Integration: Links new knowledge with existing data

3. Key System Components

BioAgents Service

The core service that provides biological intelligence capabilities:

export class HypothesisService extends Service {
  static serviceType = "hypothesis";
  capabilityDescription = "Generate and judge hypotheses";
  // Service implementation...
}

DKG Plugin

Integrates with decentralized knowledge graphs:

export const dkgPlugin: Plugin = {
  name: "dkg",
  description: "Agent DKG which allows you to store memories on the OriginTrail Decentralized Knowledge Graph",
  actions: [dkgInsert],
  providers: [],
  evaluators: [],
  services: [HypothesisService],
  routes: [health, gdriveWebhook, gdriveManualSync, totalAgents],
};

Knowledge Actions

Functions that process and store biological knowledge:

export const dkgInsert: Action = {
  name: "INSERT_MEMORY_ACTION",
  description: "Create a memory on the OriginTrail Decentralized Knowledge Graph",
  // Implementation...
};

Anthropic Integration

Leverages Claude for advanced biological reasoning:

async function updateGoTerms(data, client: Anthropic) {
  for (const entry of data) {
    const subjectResult = await searchGo(entry.subject, client);
    entry.subject = { term: entry.subject, id: subjectResult };
    // Processing logic...
  }
  return data;
}

Integration with Bio-DID-Seq

BioAgents integrates with the Bio-DID-Seq system through several key pathways:

1. API Integration

API Integration

2. DID based metadata management

Bio-DID-Seq respects and enforces the DID based decentralised metadata management model:

DID based metadata management

3. Data Processing Flow

Data Processing Flow

API Specifications

BioAgents API Endpoints

Endpoint	Method	Description
`/api/bioagents/process`	POST	Process a research paper
`/api/bioagents/status`	POST	Check processing status
`/api/bioagents/metadata`	POST	Get extracted metadata
`/api/bioagents/search`	POST	Search biological entities
`/api/bioagents/knowledge-graph`	POST	Generate knowledge graph
`/api/bioagents/query`	POST	Query using natural language
`/api/bioagents/knowledge`	POST	Add knowledge to the system

Request/Response Examples

Process Paper Request

{
  "file_cid": "QmXg9Pp2ytZ14xgK35M6iTC2Vz6jR9zYgooNp2UHPTMnPN",
  "title": "CRISPR-Cas9 Gene Editing for Neurodegenerative Diseases",
  "authors": ["Jane Smith", "John Doe"],
  "doi": "10.1038/s41586-021-03819-2"
}

Process Paper Response

{
  "task_id": "b8e5c9a4-2c7a-4d64-b4b3-7c01e2f8b54e",
  "status": "processing"
}

Security and Privacy

BioAgents maintains the same security and privacy standards as the Bio-DID-Seq system:

Authentication: All API calls require valid authentication
Authorization: UCAN based capability model for granular permissions
Data Encryption: All data in transit and at rest is encrypted
Audit Logging: All data access is logged for compliance
Privacy Controls: Data processors follow GDPR principles

Future Development

Planned enhancements for the BioAgents system include:

Multi-Modal Processing: Support for images, figures, and tables extraction
Specialized Biological Models: Domain specific AI models for different biological fields
Real time Knowledge Updates: Continuous learning from new research
Federated Learning: Privacy preserving distributed model training
Hypothesis Generation: AI assisted hypothesis formulation based on existing knowledge