
Hi there!

I've been meaning to write this article for more than a month now, but there hasn't been time or the right mood. Still, I gathered my thoughts and finally put something together. I hope you find it useful. Take a look at GitHub — there are ready-to-run examples you can launch. For a better understanding of what's happening, get acquainted with the structure and code of the MCP server.

Today we connect Cursor and VS Code to your APIs via MCP.

Sometimes you look at your own useful scripts and think: 'How do I neatly connect them to AI without unnecessary hassle? What do I need for this? Where do I even start?' In this article we will try to solve this problem and learn how to create our own MCP servers. You can also check out my previous AI articles.


MCP in a nutshell

MCP (Model Context Protocol) is a unified 'translator' between your tools and intelligent clients (chat assistants; for example, agents in Cursor). Instead of another fragmented API, you precisely describe what your tool can do and what its inputs/outputs are — then everything works according to the standard.

Why is MCP convenient?

The main idea of MCP is that your API can be connected to the model as a separate agent with transparent rules:

  • Standardization. A single language for tools and clients instead of a set of different protocols.
  • Governance. Tools connected to the model have explicit schemas, predictable behavior, clear rights.
  • Fast integration. Connect API/FS/DB/DevOps processes — and use them from the IDE or the chat.

Where MCP is particularly relevant

MCP is suitable for a range of tasks, most of which relate to development (for now it is mainly code editors that have been adapted to work with external tools via MCP). For example:

  • DevTools and ChatOps: CI/CD commands, diagnostics, and access to logs.
  • Data/BI: aggregated queries, insightful summaries.
  • Internal APIs: a single control point for the team.
  • RAG/automation: data collection and pre- and post-processing.
  • Working with documentation (for example, Confluence), and more. A list of proposed MCP servers for VS Code is available on GitHub.

How to build a simple MCP server with HTTP transport (Bun/Node)

Back to building our server. I have already prepared several examples of tools in the training repository; you can explore them at lesson8_mcp/mcp_server in the bel_geek repository. The code is compatible with both Bun and Node.js.

What we'll build

We'll create a simple server without unnecessary bells and whistles: a local HTTP server with /healthz and /mcp endpoints, running in stateless mode, with three demo tools (the same set as in the repository) so you can test MCP right away:

  • Routes:
    • GET /healthz — health check.
    • /mcp — MCP endpoint (GET, POST, DELETE).
  • Stateless mode (no sessions).
  • Three tools:
    • echo — returns the transmitted text.
    • get_proverb_by_topic — proverbs by topic (topic, random, limit).
    • get_weather — local weather from wttr.in.


🚀 Setting up the server and connecting it to Cursor/VS Code

The theory is done; time to act: clone, install, run — and MCP is up and running.

🔑 Prerequisites (what you need before starting)

No surprises here:

  • Node.js ≥ 18 or Bun ≥ 1.x (Bun starts faster — fewer unnecessary moves).
  • Two packages: @modelcontextprotocol/sdk (the MCP foundation) and zod (to describe input parameters precisely and reliably).
  • Docker — only if you want to package everything into a container right away. For local testing, it's not required.

⚡️ Running an example

No unnecessary philosophy—simply clone the repository and run:

text
git clone https://github.com/bel-frontend/RAG
cd RAG/lesson8_mcp/mcp_server

text
bun install
bun index.ts

If everything goes well, the console will display the message:

text
MCP Streamable HTTP Server on http://localhost:3002/mcp
Available endpoints: /healthz, /mcp

🎉 The server is alive! Let's check its heartbeat:

text
curl -s http://localhost:3002/healthz

The response should be in the format { ok: true, timestamp: ... }.

🧩 Architecture in simple terms

How the server works:

  1. An MCP server is created — it registers the tools (echo, get_proverb_by_topic, get_weather).
  2. An HTTP transport is added — MCP can receive and send requests via /mcp.
  3. Routes:
    • /healthz — returns a simple JSON object to verify that the server is alive.
    • /mcp — the main endpoint through which Cursor or VS Code connects to the tools.
  4. Context — the headers (apikey, applicationid) are kept in a per-request store so that the tools can use them (see the sketch below).
  5. Termination — on shutdown (SIGINT/SIGTERM) the server closes properly.
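The context point deserves one concrete illustration. Below is a minimal sketch of how per-request header storage can work — AsyncLocalStorage and the helper names here are assumptions for illustration; the repository may implement this differently:

typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical request-scoped context; tools read it instead of touching raw HTTP objects.
type RequestContext = { apikey?: string; applicationid?: string };
export const requestContext = new AsyncLocalStorage<RequestContext>();

// Wrap the /mcp handler so tools can later call requestContext.getStore().
export function withContext(
    headers: Record<string, string | string[] | undefined>,
    run: () => Promise<void>
) {
    const ctx: RequestContext = {
        apikey: String(headers['apikey'] ?? ''),               // Node lowercases header names
        applicationid: String(headers['applicationid'] ?? ''),
    };
    return requestContext.run(ctx, run);
}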

Key Implementation Details

The server architecture consists of three main components:

1. MCP Server & Transport

typescript
const mcp = new McpServer({ name: 'test-mcp', version: '0.1.0' });
const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined // Stateless mode
});
await mcp.connect(transport); // connect() returns a Promise, so await it

2. HTTP Routing

Two endpoints handle all traffic:

  • /healthz - health checks
  • /mcp - MCP protocol communication (POST for tool calls, GET/DELETE for sessions)

3. CORS & Error Handling

The server includes proper CORS headers and graceful shutdown on SIGINT/SIGTERM signals.
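As a rough sketch of what those two pieces usually look like (the header names and the exact shutdown sequence are assumptions — the repository version is authoritative):

typescript
import http from 'node:http';

// Permissive CORS for local experiments; tighten the origin for real deployments.
function applyCors(res: http.ServerResponse) {
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, DELETE, OPTIONS');
    res.setHeader('Access-Control-Allow-Headers', 'Content-Type, apikey, applicationid, mcp-session-id');
}

// Graceful shutdown: stop accepting new connections, then exit.
function setupShutdown(server: http.Server) {
    const stop = () => server.close(() => process.exit(0));
    process.on('SIGINT', stop);
    process.on('SIGTERM', stop);
}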

🛠 Tools — the main magic here

Three simple tools demonstrate MCP capabilities:

1. Echo Tool - Returns input text (useful for testing)

typescript
mcp.registerTool('echo', {
    title: 'Echo',
    description: 'Return the same text',
    inputSchema: { text: z.string() }
}, async ({ text }) => ({
    content: [{ type: 'text', text }]
}));

2. Proverbs Tool - Fetches Belarusian proverbs with filtering

typescript
mcp.registerTool('get_proverb_by_topic', {
    inputSchema: {
        topic: z.string().optional(),
        random: z.boolean().optional(),
        limit: z.number().int().positive().max(200).optional()
    }
}, async ({ topic, random, limit }) => {
    const data = await fetch(PROVERBS_URL).then(r => r.json());
    let items: string[] = data.map((d: { message: string }) => d.message);

    // Filter by topic
    if (topic) {
        items = items.filter(m => m.toLowerCase().includes(topic.toLowerCase()));
    }

    // Random selection with Fisher-Yates shuffle
    if (random) {
        for (let i = items.length - 1; i > 0; i--) {
            const j = Math.floor(Math.random() * (i + 1));
            [items[i], items[j]] = [items[j], items[i]];
        }
    }

    // Respect the optional limit
    if (limit) {
        items = items.slice(0, limit);
    }

    return { content: [{ type: 'text', text: items.join('\n') }] };
});

3. Weather Tool - Shows current weather via wttr.in

typescript
mcp.registerTool('get_weather', {
    inputSchema: { city: z.string() }
}, async ({ city }) => {
    const weather = await fetch(`https://wttr.in/${city}?format=3`).then(r => r.text());
    return { content: [{ type: 'text', text: weather }] };
});

Key Points:

  • Use Zod schemas for input validation
  • Return format: { content: [{ type: 'text', text: string }] }
  • Handle errors gracefully with try-catch
  • Add timeout handling for external APIs
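For the last point, a fetch wrapper with a timeout can be as small as the sketch below (AbortSignal.timeout requires a recent Node or Bun; the 5-second value is an arbitrary choice):

typescript
// Aborts slow external calls (e.g. wttr.in) instead of letting the tool hang.
async function fetchTextWithTimeout(url: string, ms = 5000): Promise<string> {
    const res = await fetch(url, { signal: AbortSignal.timeout(ms) });
    if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
    return res.text();
}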

🖇 Connecting to Cursor and VS Code

And now the most interesting part — integration. When MCP is running, we can use it through Cursor or GitHub Copilot in VS Code.

Cursor:

  1. Start the server (bun index.ts).
  2. Create the file in the project at ./.cursor/mcp.json with the following configuration:
json
{
  "mcpServers": {
    "test-mcp": {
      "type": "http",
      "url": "http://localhost:3002/mcp",
      "headers": {
        "apiKey": "API_KEY_1234567890",
        "applicationId": "APPLICATION_ID"
      }
    }
  }
}

Open Settings → Model Context Protocol and make sure that test-mcp is in the list. Then, in the Cursor chat (with the agent enabled), type: "Invoke the get_weather tool for Minsk" — and watch the response come back.

VS Code (Copilot Chat)

Here it's almost the same, except the file goes into .vscode/mcp.json. After that, your tools should appear in the Copilot Chat toolbar.

json
{
    "servers": {
        "test-mcp": {
            "type": "http",
            "url": "http://localhost:3002/mcp",
            "headers": {
                "apiKey": "API_KEY_1234567890",
                "applicationId": "APPLICATION_ID"
            }
        }
    }
}

Pro tip: You can configure multiple MCP servers or use environment variables for credentials.

🐳 Docker Deployment - Production Ready

Quick Start:

Would you like to package it right away? Build and run it in Docker:

bash
docker compose build --no-cache
docker compose up -d

MCP will be available at http://localhost:3002/mcp.

Understanding the Docker Setup

The repository includes a complete Docker configuration:

Dockerfile Example:

dockerfile
FROM oven/bun:1 as base
WORKDIR /app

COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile --production

COPY . .

EXPOSE 3002

HEALTHCHECK --interval=30s --timeout=3s CMD \
  curl -f http://localhost:3002/healthz || exit 1

CMD ["bun", "index.ts"]

docker-compose.yml:

yaml
version: '3.8'

services:
  mcp-server:
    build: .
    container_name: mcp-server
    ports:
      - "3002:3002"
    environment:
      - NODE_ENV=production
    restart: unless-stopped

Useful Docker Commands:

bash
# View logs
docker compose logs -f mcp-server

# Restart service
docker compose restart

# Stop and remove
docker compose down

# Rebuild and restart
docker compose up -d --build

Team-friendly collaboration: everyone runs the same environment and doesn't waste time on "works for me — doesn't work for you."

🤔 Typical pitfalls and how to bypass them

  • CORS — when connecting from a browser, you must allow the custom headers. We added a basic variant in the code.
  • Stateless/Stateful — in the example the server is stateless. If you need sessions, enable sessionIdGenerator.
  • HTTP headers — in Node.js they arrive in lowercase (apikey, not apiKey); it's easy to get confused.
  • External services — the proverbs and weather APIs can slow things down. Add timeouts and caching (see the sketch below).
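For the caching point, a tiny in-memory TTL cache is usually enough for demo tools like the proverbs list; this is only a sketch, and the repository may not cache at all:

typescript
// Naive TTL cache: fine for a local demo server, not meant for production.
const cache = new Map<string, { value: unknown; expiresAt: number }>();

async function cached<T>(key: string, ttlMs: number, load: () => Promise<T>): Promise<T> {
    const hit = cache.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value as T;
    const value = await load();
    cache.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
}

// Usage: const data = await cached('proverbs', 60_000, () => fetch(PROVERBS_URL).then(r => r.json()));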

✍️ How to build your own tool

Here is a simple example:

typescript
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

export function registerMyTools(mcp: McpServer) {
  mcp.registerTool(
    'my_tool',
    {
      title: 'my_tool',
      description: 'A simple tool that greets the user by name',
      inputSchema: { name: z.string() },
    },
    async ({ name }) => ({
      content: [{ type: 'text', text: `Hello, ${name}!` }],
    })
  );
}

Ready—now you can add your own 'tricks' to MCP and use them from the IDE.
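Wiring the new tool into the server is then just one call during startup. The sketch below assumes the same entry-point shape as the earlier examples; the file path is illustrative:

typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { registerMyTools } from './tools/my-tools'; // hypothetical path

const mcp = new McpServer({ name: 'test-mcp', version: '0.1.0' });
registerMyTools(mcp); // register tools before connecting the transport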

✍️ Building Custom Tools

Here's a practical example of a database query tool:

typescript
import { z } from 'zod';
import { Pool } from 'pg';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';

const pool = new Pool({ /* config */ });

export function registerDatabaseTools(mcp: McpServer) {
    mcp.registerTool('query_users', {
        description: 'Search users by name or email',
        inputSchema: {
            searchTerm: z.string().min(2),
            limit: z.number().int().max(100).default(10)
        }
    }, async ({ searchTerm, limit }) => {
        const result = await pool.query(
            'SELECT id, name, email FROM users WHERE name ILIKE $1 OR email ILIKE $1 LIMIT $2',
            [`%${searchTerm}%`, limit]
        );

        return {
            content: [{
                type: 'text',
                text: result.rows.map(r => `${r.name} <${r.email}>`).join('\n')
            }]
        };
    });
}

Best Practices:

  • Always validate inputs with Zod
  • Use parameterized queries to prevent SQL injection
  • Handle errors gracefully and return useful messages
  • Add timeouts for external API calls
  • Keep tool descriptions clear and concise

🎯 Results

We built an MCP server on Bun/Node, added three demo tools, connected it to Cursor and VS Code, ran it in Docker and discussed typical issues. The main thing is that MCP makes connecting your own tools to smart IDEs simple and standardized.

What you build next is limited only by your imagination: you can integrate DevOps processes, databases, BI queries, and internal APIs. And because MCP is a standard, other team members can simply pick your tools up and use them.

What's next

We built a simple MCP server with HTTP transport, added three practical tools, configured CORS, and demonstrated configurations for Cursor and GitHub Copilot (VS Code), as well as a Docker deployment. The next steps are to extend the toolset, implement authentication, and add logging and caching — and, if needed, stateful sessions and a production deployment. If this resonates with you, create your own tools and share them with the community! Let's build smart and useful things together.

Greetings! Today, I’m sharing a short guide on how to set up a project to work with GitHub Copilot.

Reliable AI workflow with GitHub Copilot: complete guide with examples (2025)

This guide shows how to build predictable and repeatable AI processes (workflows) in your repository and IDE/CLI using agentic primitives and context engineering. Here you will find the file structure, ready-made templates, security rules, and commands.

⚠️ Note: the functionality of prompt files and agent mode in IDE/CLI may change - adapt the guide to the specific versions of Copilot and VS Code you use.


1) Overview: what the workflow consists of

The main goal is to break the agent's work into transparent steps and make them controllable. For this there are the following tools:

  • Custom Instructions (.github/copilot-instructions.md) - global project rules (how to build, how to test, code style, PR policies).
  • Path-specific Instructions (.github/instructions/*.instructions.md) - domain rules targeted via applyTo (glob patterns).
  • Chat Modes (.github/chatmodes/*.chatmode.md) - specialized chat modes (for example, Plan/Frontend/DBA) with fixed tools and model.
  • Prompt Files (.github/prompts/*.prompt.md) - reusable scenarios/"programs" for typical tasks (reviews, refactoring, generation).
  • Context helpers (docs/*.spec.md, docs/*.context.md, docs/*.memory.md) - specifications, references, and project memory for precise context.
  • MCP servers (.vscode/mcp.json or via UI) - tools and external resources the agent can use.

2) Project file structure

The following structure corresponds to the tools described above and helps to compose a full workflow for agents.

text
.github/
  copilot-instructions.md
  instructions/
    backend.instructions.md
    frontend.instructions.md
    actions.instructions.md
  prompts/
    implement-from-spec.prompt.md
    security-review.prompt.md
    refactor-slice.prompt.md
    test-gen.prompt.md
  chatmodes/
    plan.chatmode.md
    frontend.chatmode.md
.vscode/
  mcp.json
docs/
  feature.spec.md
  project.context.md
  project.memory.md

3) Files and their purpose - technical explanation

Now let's look at how it's arranged under the hood: what these files are, why they exist, how they affect the agent's understanding of the task, and in what order they are merged and overridden. The code examples below match the specification.

| File/folder | What it is | Why | Where it applies |
| --- | --- | --- | --- |
| `.github/copilot-instructions.md` | Global project rules | Consistent standards for all responses | Entire repository |
| `.github/instructions/*.instructions.md` | Targeted instructions for specific paths | Different rules for frontend/backend/CI | Only for files matching the applyTo |
| `.github/chatmodes/*.chatmode.md` | A set of rules + allowed tools for a chat mode | Separate work phases (plan/refactor/DBA) | When that chat mode is selected |
| `.github/prompts/*.prompt.md` | Task "scenarios" (workflows) | Re-run typical processes | When invoked via /name or CLI |
| `docs/*.spec.md` | Specifications | Precise problem statements | When you @-mention them in dialogue |
| `docs/*.context.md` | Stable references | Reduce "noise" in chats | By link/@-mention |
| `docs/*.memory.md` | Project memory | Record decisions to avoid repeats | By link/@-mention |
| `.vscode/mcp.json` | MCP servers configuration | Access to GitHub/other tools | For this workspace |

Merge order of rules and settings: Prompt frontmatter → Chat mode → Repo/Path instructions → Defaults.


And now let's review each tool separately.

3.1. Global rules - .github/copilot-instructions.md

What it is: A Markdown file with short, verifiable rules: how to build, how to test, code style, and PR policies.

Why: So that all responses rely on a single set of standards (no duplication in each prompt).

How it works: The file automatically becomes part of the system context for all questions within the repository. It has no applyTo (more on that later), so it applies everywhere.

Minimal example:

md
# Repository coding standards
- Build: `npm ci && npm run build`
- Tests: `npm run test` (coverage ≥ 80%)
- Lint/Typecheck: `npm run lint && npm run typecheck`
- Commits: Conventional Commits; keep PRs small and focused
- Docs: update `CHANGELOG.md` in every release PR

Tips.

  1. Keep points short.
  2. Avoid generic phrases.
  3. Include only what can affect the outcome (build/test/lint/type/PR policy).

3.2. Path-specific instructions - .github/instructions/*.instructions.md

What it is: Modular rules with YAML frontmatter applyTo - glob patterns of files for which they are included.

Why: To differentiate standards for different areas (frontend/backend/CI). Allows controlling context based on the type of task.

How it works: When processing a task, Copilot finds all *.instructions.md whose applyTo matches the current context (files you are discussing/editing). Matching rules are added to the global ones.

Example:

md
---
applyTo: "apps/web/**/*.{ts,tsx},packages/ui/**/*.{ts,tsx}"
---
- React: function components and hooks
- State: Zustand; data fetching with TanStack Query
- Styling: Tailwind CSS; avoid inline styles except dynamic cases
- Testing: Vitest + Testing Library; avoid unstable snapshots

Note.

  1. Avoid duplicating existing global rules.
  2. Ensure the glob actually targets the intended paths.

3.3. Chat modes - .github/chatmodes/*.chatmode.md

What it is: Config files that set the agent’s operational mode for a dialogue: a short description, the model (if needed) and a list of allowed tools.

Why: To separate work phases (planning/frontend/DBA/security) and restrict tools in each phase. This makes outcomes more predictable.

File structure:

md
---
description: "Plan - analyze code/specs and propose a plan; read-only tools"
model: GPT-4o
tools:
  - "search/codebase"
---
In this mode:
- Produce a structured plan with risks and unknowns
- Do not edit files; output a concise task list instead

How it works:

  • The chat mode applies to the current chat in the IDE.
  • If you activate a prompt file, its frontmatter takes precedence over the chat mode (it can change the model and narrow tools).
  • Effective allowed tools: chat mode tools, limited by prompt tools and CLI --allow/--deny flags.

Management and switching:

  • In the IDE (VS Code):

    1. Open the Copilot Chat panel.
    2. In the top bar, choose the desired chat mode from the dropdown (the list is built from .github/chatmodes/*.chatmode.md + built-in modes).
    3. The mode applies only to this thread. To change - select another or create a new thread with the desired mode.
    4. Check the active mode in the header/panel of the conversation; the References will show the *.chatmode.md file.
  • In the CLI: (a bit hacky, better via prompts)

    • There is usually no dedicated CLI flag to switch modes; encode desired constraints in the prompt file frontmatter and/or via --allow-tool/--deny-tool flags.
    • You can instruct in the first line: “Use the i18n chat mode.” - if the version supports it, the agent may switch; if not, the prompt frontmatter will still enforce tools.
  • Without switching the mode: run a prompt that lists the required tools in its frontmatter — it will limit the tools regardless of the chat mode.

Diagnostics: if the agent uses "extra" tools or does not see needed ones - check: (1) which chat mode is selected; (2) tools in the prompt frontmatter; (3) CLI --allow/--deny flags; (4) References in the response (visible *.chatmode.md/*.prompt.md files).


3.4. Prompt files - .github/prompts/*.prompt.md

What it is: Scenario files for repeatable tasks. They consist of YAML frontmatter (config) and a body (instructions/steps/acceptance criteria). They are invoked in chat via /name or via CLI.

When to use: When you need a predictable, automatable process: PR review, test generation, implementing a feature from a spec, etc.

Frontmatter structure

  • description - short goal of the scenario.
  • mode - ask (Q&A, no file edits) · edit (local edits in open files) · agent (multistep process with tools).
  • model - desired model profile.
  • tools - list of allowed tools for the scenario (limits even what the chat mode allowed).

Execution algorithm (sequence)

  1. Where to run:

    • In chat: type /prompt-name and arguments in the message field.
    • In CLI: call copilot and pass the /prompt-name … line (interactive or via heredoc / -p flag).
  2. Context collection: Copilot builds the execution context in the following order: repo instructions → path instructions (applyTo) → chat mode → prompt frontmatter (the prompt frontmatter has the highest priority and can narrow tools/change the model).

  3. Parameter parsing (where and how):

    • In chat: parameters go in the same message after the name, for example: /security-review prNumber=123 target=apps/web.
    • In CLI: parameters go in the same /… line in stdin or after the -p flag.
    • Inside the prompt file they are available as ${input:name}. If a required parameter is missing, the prompt can ask for it textually in the dialog.
  4. Resolving tool permissions:

    • Effective allowed tools: chat mode tools, limited by prompt tools and CLI --allow/--deny flags.
    • If a tool is denied, the corresponding step is skipped or requires confirmation/change of policy.
  5. Executing steps from the prompt body: the agent strictly follows the Steps order, doing only what is permitted by policies/tools (searching the codebase, generating diffs, running tests, etc.). For potentially risky actions, it requests confirmation.

  6. Validation gates: at the end, the prompt runs checks (build/tests/lint/typecheck, output format checks). If a gate fails - the agent returns a list of issues and proposes next steps (without auto-merging/writing changes).

  7. Where the result appears (what and where you see it):

    • Main response - in the chat panel (IDE) or in stdout (CLI): tables, lists, textual reports, code blocks with diff.
    • File changes - in your working tree: in IDE you see a diff/suggested patches; in CLI files change locally (if allowed by tools).
    • Additional artifacts - e.g., a PR comment if GitHub tools are allowed and the prompt specifies it.

Output format and checks (recommended)

  • Always specify the output format (for example, table "issue | file | line | severity | fix").
  • Add validation gates: build/tests/lint/typecheck; require unified-diff for proposed changes; a TODO list for unresolved issues.

Example of a complete prompt file

md
---
mode: 'agent'
model: GPT-4o
tools: ['search/codebase']
description: 'Implement a feature from a spec'
---
Goal: Implement the feature described in @docs/feature.spec.md.

Steps:
1) Read @docs/feature.spec.md and produce a short implementation plan (bullets)
2) List files to add/modify with paths
3) Propose code patches as unified diff; ask before installing new deps
4) Generate minimal tests and run them (report results)

Validation gates:
- Build, tests, lint/typecheck must pass
- Output includes the final diff and a TODO list for anything deferred
- If any gate fails, return a remediation plan instead of "done"

Anti-patterns

  • Watered-down descriptions: keep description 1–2 lines.
  • Missing output format.
  • Too many tools: allow only what is needed (tools).

Quick start

  • Chat: /implement-from-spec
  • CLI: copilot <<<'/implement-from-spec' or copilot -p "Run /implement-from-spec"

3.5. Context files - specs/context/memory

What it is: Helper Markdown files (not special types) that you @-mention in dialogue/prompt. Typically stored as documentation.

  • docs/*.spec.md - precise problem statements (goal, acceptance, edge cases, non-goals).
  • docs/*.context.md - short references (API policies, security, UI styleguide, SLA).
  • docs/*.memory.md - "decision log" with dates and reasons so the agent does not return to old disputes.

Example:

md
# Feature: Export report to CSV
Goal: Users can export the filtered table to CSV.
Acceptance criteria:
- "Export CSV" button on /reports
- Server generates file ≤ 5s for 10k rows
- Column order/headers match UI; locale-independent values
Edge cases: empty values, large numbers, special characters
Non-goals: XLSX, multi-column simultaneous filters

3.6. MCP - .vscode/mcp.json

What it is: Configuration for Model Context Protocol servers (for example, GitHub MCP) which enable tools for the agent.

Why: So the agent can read PRs/issues, run tests, interact with DB/browser - within allowed permissions.

Example:

json
{
  "servers": {
    "github-mcp": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp"
    }
  }
}

Security. Connect only trusted servers; use allow/deny tool lists in prompts/chat modes/CLI.


3.7. General context merge order and priorities (rules & tools)

  1. Instructions: copilot-instructions + all *.instructions.md with applyTo that match current paths. A specific instruction is added to the common context.
  2. Chat mode: restricts the toolset and (if needed) the model for the session.
  3. Prompt frontmatter: has the highest priority; can limit tools and override the model.
  4. Context: anything you @-mention is guaranteed to be considered by the model.

Diagnostics. Check the References section in outputs - it shows which instruction files were considered and which prompt was run.

3.8. Example: full i18n cycle with Goman MCP (create/update/prune)

Below is the exact process and templates for ensuring that: (a) when UI components are created, localization keys are created/updated in Goman; (b) when components are removed, unused entries are detected and (after confirmation) deleted.

Code snippets and frontmatter are in English.

3.8.1. MCP config - connect Goman

/.vscode/mcp.json

json
{
  "servers": {
    "goman-mcp": {
      "type": "http",
      "url": "https://mcp.goman.live/mcp",
      "headers": {
        "apiKey": "<YOUR_API_KEY>",
        "applicationid": "<YOUR_APPLICATION_ID>"
      }
    }
  }
}

3.8.2. Repo/Path rules - enforce i18n by default

/.github/instructions/frontend.instructions.md (addition)

md
---
applyTo: "apps/web/**/*.{ts,tsx}"
---
- All user-facing strings **must** use i18n keys (no hardcoded text in JSX/TSX)
- Key naming: `<ui_component_area>.<name>` (e.g., `ui_button_primary.label`)
- When creating components, run `/i18n-component-scaffold` and commit both code and created keys
- When deleting components, run `/i18n-prune` and confirm removal of unused keys

3.8.3. Chat mode - limited i18n tools

/.github/chatmodes/i18n.chatmode.md

md
---
description: "i18n - manage localization keys via Goman MCP; enforce no hardcoded strings"
model: GPT-4o
tools:
  - "files"
  - "goman-mcp:*"
---
In this mode, prefer:
- Creating/updating keys in Goman before writing code
- Checking for existing keys and reusing them
- Producing a table of changes (created/updated/skipped)

3.8.4. Prompt - scaffold component + keys in Goman

/.github/prompts/i18n-component-scaffold.prompt.md

md
---
mode: 'agent'
model: GPT-4o
tools: ['files','goman-mcp:*']
description: 'Scaffold a React component with i18n keys synced to Goman'
---
Inputs: componentName, namespace (e.g., `ui.button`), path (e.g., `apps/web/src/components`)

Goal: Create a React component and ensure all user-visible strings use i18n keys stored in Goman.

Steps:
1) Plan the component structure and list all user-visible strings
2) For each string, propose a key under `${namespace}`; reuse if it exists
3) Using Goman MCP, create/update translations for languages: en, be, ru (values may be placeholders)
4) Generate the component using `t('<key>')` and export it; add a basic test
5) Output a Markdown table: key | en | be | ru | action(created/updated/reused)

Validation gates:
- No hardcoded literals in the produced .tsx
- Confirm Goman actions succeeded (report tool responses)
- Tests and typecheck pass

Example component code:

tsx
import { t } from '@/i18n';
import React from 'react';

type Props = { onClick?: () => void };

export function PrimaryButton({ onClick }: Props) {
  return (
    <button aria-label={t('ui.button.primary.aria')} onClick={onClick}>
      {t('ui.button.primary.label')}
    </button>
  );
}

3.8.5. Prompt - prune unused keys when removing components

/.github/prompts/i18n-prune.prompt.md

md
---
mode: 'agent'
model: GPT-4o
tools: ['files','goman-mcp:*']
description: 'Find and prune unused localization keys in Goman after code deletions'
---
Inputs: pathOrDiff (e.g., a deleted component path or a PR number)

Goal: Detect keys that are no longer referenced in the codebase and remove them from Goman after confirmation.

Steps:
1) Compute the set of removed/renamed UI elements (scan git diff or provided paths)
2) Infer candidate keys by namespace (e.g., `ui.<component>.*`) and check code references
3) For keys with **zero** references, ask for confirmation and delete them via Goman MCP
4) Produce a Markdown table: key | status(kept/deleted) | reason | notes

Validation gates:
- Never delete keys that still have references
- Require explicit confirmation before deletion
- Provide a rollback list of deleted keys

3.8.6. Prompt - sync and check missing translations (optional)

/.github/prompts/i18n-sync.prompt.md

md
---
mode: 'agent'
model: GPT-4o
tools: ['files','goman-mcp:*']
description: 'Sync new/changed i18n keys and check for missing translations'
---
Goal: Compare code references vs Goman and fill gaps.

Steps:
1) Scan code for `t('...')` keys under provided namespaces
2) For missing keys in Goman - create them (placeholder text ok)
3) For missing languages - create placeholders and report coverage
4) Output coverage table: key | en | be | ru | missing

4) How to use this (IDE and CLI)

4.1. In VS Code / other IDE

  • Open Copilot Chat - choose Agent/Edit/Ask in the dropdown.
  • For prompt files just type /file-name without extension (e.g. /security-review).
  • Add context using @-mentions of files and directories.
  • Switch chat mode (Plan/Frontend/DBA) when the task changes.

4.2. In Copilot CLI (terminal)

  • Example install: npm install -g @github/copilot → run copilot.
  • Interactively: “Run /implement-from-spec on @docs/feature.spec.md”.
  • Programmatically/in CI: copilot -p "Implement feature from @docs/feature.spec.md" --deny-tool 'shell(rm*)'.
  • Add/restrict tools with flags: --allow-all-tools, --allow-tool, --deny-tool (global or by pattern, e.g. shell(npm run test:*)).

4.3. Cookbook commands for CLI (chat modes and prompts)

Below are ready recipes. All commands should run from the repository root and respect your deny/allow lists.

A. Run a prompt file in an interactive session

bash
copilot
# inside the session (enter the line as-is)
/security-review prNumber=123

B. Run a prompt file non-interactively (heredoc)

bash
copilot <<'EOF'
/security-review prNumber=123
EOF

C. Pass prompt file parameters

bash
copilot <<'EOF'
/implement-from-spec path=@docs/feature.spec.md target=apps/web
EOF

Inside the prompt you can read values as ${input:target} and ${input:path}.

D. Run a prompt with safe tool permissions

bash
copilot --allow-tool "shell(npm run test:*)" \
        --deny-tool  "shell(rm*)" \
        <<'EOF'
/security-review prNumber=123
EOF

E. Use a chat mode (specialized mode) in the CLI

bash
copilot
# inside the session - ask to switch to the required mode and run the prompt
Use the i18n chat mode.
/i18n-component-scaffold componentName=PrimaryButton namespace=ui.button path=apps/web/src/components

If your client supports selecting the mode via a menu - choose i18n before running the prompt. If not - specify constraints in the prompt frontmatter (tools and rules in the prompt body).

F. Send file links/diffs as context

bash
copilot <<'EOF'
Please review these changes:
@apps/web/src/components/PrimaryButton.tsx
@docs/feature.spec.md
/security-review prNumber=123
EOF

G. Change the model for a specific run

We recommend specifying the model in the prompt frontmatter. If supported, you can also pass a model flag at runtime:

bash
copilot --model GPT-4o <<'EOF'
/implement-from-spec
EOF

H. i18n cycle with Goman MCP (CHAT)

Run sequentially in a chat thread:

text
/i18n-component-scaffold componentName=PrimaryButton namespace=ui.button path=apps/web/src/components
/i18n-prune pathOrDiff=@last-diff
/i18n-sync namespace=ui.button

What you get:

  • resulting tables/reports in the chat panel;
  • code changes in your working tree (IDE shows diffs);
  • no CLI commands for Goman MCP are required here.

5) Context engineering: how not to "dump" excess context

  1. Split sessions by phases: Plan → Implementation → Review/Tests. Each phase has its own Chat Mode.
  2. Attach only necessary instructions: use path-specific *.instructions.md instead of dumping everything.
  3. Project memory: record short ADRs in project.memory.md - this reduces agent "forgetting" between tasks.
  4. Context helpers: keep frequent references (API/security/UI) in *.context.md and link to them from prompt files.
  5. Focus on the task: in prompt files always state the goal, steps and output format (table, diff, checklist).

6) Security and tool management

  • Require explicit confirmation before running commands/tools. In CI use --deny-tool by default and add local allow lists.
  • Permission patterns: allow only what is necessary (shell(npm run test:*), playwright:*), deny dangerous patterns (shell(rm*)).
  • Secrets: never put keys in prompts or instructions; use GitHub Environments or local secret managers and .env with .gitignore.
  • Any MCP - only from trusted origins; review the code/config before enabling.
  • Patch checks: require unified-diff and explanations in prompt files - this makes review easier.

7) CI/CD recipe (optional example)

Ensure "everything builds": run Copilot CLI in a dry/safe mode to produce a comment for the PR.

yaml
# .github/workflows/ai-review.yml
name: AI Review (Copilot CLI)
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  ai_review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - name: Install Copilot CLI
        run: npm install -g @github/copilot
      - name: Run security review prompt (no dangerous tools)
        env:
          PR: ${{ github.event.pull_request.number }}
        run: |
          copilot -p "Run /security-review with prNumber=${PR}" \
            --deny-tool 'shell(rm*)' --deny-tool 'shell(curl*)' \
            --allow-tool 'shell(npm run test:*)' \
            --allow-tool "github:*" \
            > ai-review.txt || true
      - name: Comment PR with results
        if: always()
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh pr comment ${{ github.event.pull_request.number }} --body-file ai-review.txt

Tip: keep tight deny/allow lists; do not give the agent "full freedom" in CI.


8) Small scenarios and tips that might be useful

  • From idea to PR: /plan → discuss the plan → /implement-from-spec → local tests → PR → /security-review.
  • Maintenance: /refactor-slice for local improvements without behavior changes.
  • Tests: /test-gen for new modules + manual additions for edge cases.
  • Gradual rollout: start with 1–2 prompt files and one chat mode; expand later.

9) Quality checks (validation gates)

In each prompt file, define "what counts as done":

  • Output format: risk table, unified-diff, checklist.
  • Automated checks: build, unit/integration tests, lint/typecheck.
  • Manual check: "OK to merge?" with rationale and residual risks.

10) Anti-patterns and hacks

  • Anti-pattern: one huge instructions.md. Prefer multiple *.instructions.md with applyTo.
  • Anti-pattern: generic words instead of rules. Prefer concrete commands/steps.
  • Anti-pattern: running dangerous shell commands without a gate. Use deny/allow and manual confirmation.
  • Anti-pattern: forgetting specs/memory. Maintain feature.spec.md and project.memory.md.
  • Anti-pattern: mixing tasks in one session. Create a Chat Mode per phase.

11) Implementation checklist

  1. Add .github/copilot-instructions.md (at least 5–8 bullets about build/tests/style).
  2. Create 1–2 *.instructions.md with applyTo (frontend/backend or workflows).
  3. Add plan.chatmode.md and one prompt (for example, implement-from-spec.prompt.md).
  4. Create docs/feature.spec.md and docs/project.memory.md.
  5. Include MCP (GitHub MCP at minimum) via .vscode/mcp.json.
  6. Run the workflow in VS Code: /implement-from-spec - verify - PR.
  7. (Optional) Add a simple AI review in CI via Copilot CLI with strict deny/allow lists.

12) Questions and answers (FAQ)

Q: How to ensure Copilot "sees" my instructions? A: Check the response's summary/References; also keep rules short and concrete.

Q: Can I pass parameters dynamically into prompt files? A: Yes, typically via placeholder variables (like ${input:prNumber}) or simply via the text query when running /prompt in chat.

Q: Where to store secrets for MCP? A: In GitHub Environments or local secret managers; not in .prompt.md/.instructions.md.

Q: Which to choose: Chat Mode vs Prompt File? A: Chat Mode defines the "frame" (model/tools/role). Prompt File is a "scenario" within that frame.


13) Next steps

  • Add a second prompt for your most frequent manual process.
  • Make project.memory.md mandatory after all architecture decisions.
  • Gradually move collective knowledge into *.context.md and reference it from prompt files.

Appendix A - Quickstart templates

All keys, paths, and flags match the docs (Oct 28, 2025).

/.github/copilot-instructions.md - repository-wide rules

md
# Repository coding standards
- Build: `npm ci && npm run build`
- Tests: `npm run test` (coverage ≥ 80%)
- Lint/Typecheck: `npm run lint && npm run typecheck`
- Commits: Conventional Commits; keep PRs small and focused
- Docs: update `CHANGELOG.md` in every release PR

/.github/instructions/frontend.instructions.md - path-specific rules

md
---
applyTo: "apps/web/**/*.{ts,tsx},packages/ui/**/*.{ts,tsx}"
---
- React: function components and hooks
- State: Zustand; data fetching with TanStack Query
- Styling: Tailwind CSS; avoid inline styles except dynamic cases
- Testing: Vitest + Testing Library; avoid unstable snapshots

/.github/instructions/backend.instructions.md - path-specific rules

md
---
applyTo: "services/api/**/*.{ts,js},packages/server/**/*.{ts,js}"
---
- HTTP: Fastify; version APIs under `/v{N}`
- DB access: Prisma; migrations via `prisma migrate`
- Security: schema validation (Zod), rate limits, audit logs
- Testing: integration tests via `vitest --config vitest.integration.ts`

/.github/instructions/actions.instructions.md - GitHub Actions

md
---
applyTo: ".github/workflows/**/*.yml"
---
- Keep jobs small; reuse via composite actions
- Cache: `actions/setup-node` + built-in cache for npm/pnpm
- Secrets: only through GitHub Environments; never hardcode

/.github/chatmodes/plan.chatmode.md - custom chat mode

md
---
description: "Plan - analyze code/specs and propose a plan; read-only tools"
model: GPT-4o
tools:
  - "search/codebase"
---
In this mode:
- Produce a structured plan with risks and unknowns
- Do not edit files; output a concise task list instead

/.github/prompts/security-review.prompt.md - prompt file

md
---
mode: 'agent'
model: GPT-4o
tools: ['search/codebase']
description: 'Perform a security review of a pull request'
---
Goal: Review PR ${input:prNumber} for common security issues.

Checklist:
- Authentication/authorization coverage
- Input validation and output encoding (XSS/SQLi)
- Secret management and configuration
- Dependency versions and known CVEs

Output:
- A Markdown table: issue | file | line | severity | fix
- If trivial, include a unified diff suggestion

/.github/prompts/implement-from-spec.prompt.md - prompt file

md
---
mode: 'agent'
model: GPT-4o
tools: ['search/codebase']
description: 'Implement a feature from a spec'
---
Your task is to implement the feature described in @docs/feature.spec.md.

Steps:
1) Read @docs/feature.spec.md and summarize the plan
2) List files to add or modify
3) Propose code changes; ask before installing new dependencies
4) Generate minimal tests and run them

Validation gates:
- Build, tests, lint/typecheck must pass
- Provide a TODO list for anything deferred

/.github/prompts/refactor-slice.prompt.md - prompt file

md
---
mode: 'agent'
model: GPT-4o
description: 'Refactor a specific code slice without changing behavior'
---
Goal: Improve readability and reduce side effects in @src/feature/* while keeping behavior unchanged.
Criteria: fewer side effects, clearer structure, all tests pass.

/.github/prompts/test-gen.prompt.md - prompt file

md
---
mode: 'agent'
model: GPT-4o-mini
description: 'Generate tests for a given file/module'
---
Ask the user to @-mention the target file; generate unit/integration tests and edge cases.

/docs/feature.spec.md - spec skeleton

md
# Feature: Export report to CSV
Goal: Users can export the filtered table to CSV.
Acceptance criteria:
- "Export CSV" button on /reports
- Server generates file ≤ 5s for 10k rows
- Column order/headers match UI; locale-independent values
Edge cases: empty values, large numbers, special characters
Non-goals: XLSX, multi-column simultaneous filters

/.vscode/mcp.json - minimal MCP config

json
{
  "servers": {
    "github-mcp": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp"
    }
  }
}

Appendix B - Operational extras (CLI & CI examples)

These examples complement Appendix A; they cover runtime/automation usage and do not duplicate templates above.

Copilot CLI - safe tool permissions (interactive/CI)

bash
# Start an interactive session in your repo
copilot

# Allow/deny specific tools (exact flags per GitHub docs)
copilot --allow-tool "shell(npm run test:*)" --deny-tool "shell(rm*)"

# Run a prompt file non-interactively (example)
copilot <<'EOF'
/security-review prNumber=123
EOF

GitHub Actions - comment review results on a PR

yaml
name: AI Security Review (Copilot CLI)
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - name: Install Copilot CLI
        run: npm install -g @github/copilot
      - name: Run security review prompt
        env:
          PR: ${{ github.event.pull_request.number }}
        run: |
          copilot --allow-tool "shell(npm run test:*)" --deny-tool "shell(rm*)" <<EOF
          /security-review prNumber=${PR}
          EOF
      - name: Post results
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh pr comment ${{ github.event.pull_request.number }} --body "Copilot review completed. See artifacts/logs for details."

Sources

Custom chat modes in VS Code

Use MCP servers in VS Code

Adding repository custom instructions for GitHub Copilot

How to build reliable AI workflows with agentic primitives and context engineering

🙌 PS:

Thank you for reading to the end! If the material was useful, we would be very glad if you:

  • 💬 Leave a comment or question,
  • 📨 Suggest an idea for the next article,
  • 🚀 Or simply share it with friends!

Technology becomes more accessible when it is understood. And you have already taken the first important step 💪

See you in the next article! Thank you for your support!



1. First Encounter: Entering Vibe Coding

Not long ago, I accidentally stumbled upon a video where a developer created a whole application using artificial intelligence in just a few minutes. The service was called bolt.new. And, as you might guess, I decided to try this beast myself.

The first impressions were, to be honest, very inspiring. You give a command — you get code. Like in the dreams of a young developer: without meetings, reviews, and evening calls from the project manager. But, as in fairy tales, everything was going well… until a certain point.

When the need arose to change something in the code, problems began. Bolt quickly “eats” tokens, and when it comes to finding and fixing bugs it has issues, to put it mildly. As a result, it is a great tool for a quick MVP, but not for support or scaling. Thus began my vibe coding epic.

2. Plan B: Copilot and Heartache

After the first disappointment, I moved the development to a local computer. Turned on VS Code, connected Copilot — and back to the fight. By default, it works with the GPT-4.1 model. For simple tasks, it writes, comments, helps. Everything looks good.

But when I asked it to do something serious — like analyze several files and work out logic — everything started to fall apart. Errors, inconsistency, a code cocktail of glitches. It became clear: the problem is in the limited context. In "agent" mode, the model can't handle a large amount of data.

I switched to Claude Sonnet 3.7 — and this was better. Hope appeared. Then I found out that there was Sonnet 4 — and that's when the magic really started: the model wrote code, ran linters, corrected its own mistakes, ran scripts in the terminal… But — yes, you guessed it — the tokens ran out… And the world turned gray again.

3. Cursor: Second Attempt at Magic

After another fiasco, when the tokens ran out and the soul was empty, I didn't give up. I decided: “Enough of this mockery!” And went to download Cursor. Bought the PRO version and immediately — into the fray.

Cursor, in combination with Sonnet-4 + Max (extends context), worked fine. Not as glossy as Copilot, but effective enough. Started creating an app: one file, another, a third… Everything was going well. Until it became clear — even with Max, the model starts to “forget”: loses context, deletes important parts of the code, forgets the logic of the previous iteration. But for now — a worrying, but not critical signal.

It became critical… the next day. You won't believe it — the tokens ran out again! I used them up in two days on a 10-screen app. It felt like watching my salary just turn into emptiness. But I didn't give up! Bought Cursor Pro+ and continued (salary - goodbye). Because vibe coding — it's like that.

4. Moment of Truth: Architecture or Chaos

When the app began to look like something alive, I decided to conduct a code review myself. And — horror! Each screen was a separate “planet”: no templates, all elements duplicated, as if some mysterious force had decided to show me what a UI designer's hell looks like. Each screen, although similar to the previous one, was never quite the same.

Ordinary things — like offsets or margins — had to be corrected manually on each screen. Asking the model to do it automatically was pointless. It corrected in one place and simultaneously broke in another. At this moment, I realized my main mistake: I didn't work out the app's architecture from the very beginning.

In essence, I continued to work as a “senior training a junior model on the fly”: “No! This is a separate component! Here — we'll take out styles! Don't put four components in one file!” (there were files over 1000 lines). And this, of course, knocked me off track a bit.

5. Recommendations: How Not to Drown in Vibe

After all these twists and turns, I decided — I need to share my experience. Here are some tips for those who want to vibe-code but not get bogged down:

    1. Create a structure. Immediately determine what the folders and files will look like.
    2. Check styles. After each iteration, monitor consistency.
    3. Extract components. If something repeats — into a separate file! But specify exactly where.
    4. One task — one prompt. Otherwise, you'll get a mess.
    5. Patience is key. Sometimes the model sees a mistake and fixes it itself. Don't rush.
    6. Commit is your friend. If something works — fix it. The next step can break everything.
    7. Reset context. Starting a new feature? Create a new chat. And don't forget about git commit!

And most importantly, don't lose control over money. These beasts love to eat tokens like a hungry student — pizza. Before you know it — it's all over. Again.

6. Conclusions: Without Professionals — No Magic

After all that's been experienced, one thing became completely clear: at the moment, without professionals, artificial intelligence is not capable of creating truly high-quality products. Yes, AI significantly speeds up development, performs a lot of routine work, helps "splash out" an MVP in record time. But it all works only on the condition that there is a person nearby who knows what they are doing.

Without proper management, models lose context, repeat themselves, make "random" architectural decisions — and the result resembles code monkeys who might, with some probability, write a book. Therefore, as paradoxical as it may sound, to not write code — you still need to understand it well.

In the future, obviously, we will write less and less code manually. But to truly efficiently use AI, we must learn to think like engineers and manage the process — not just "talk to the bot."

And yes, damn, the tokens ran out again. But now at least you'll know what to do next.



Hello, folks!

We continue our series of articles about AI (Artificial Intelligence) and how to use it. Today's article will be more theoretical. We will try to figure out what models are, their types, how to use them, and what features they have.

I hope you find it interesting, so let's go!

So, as usual (just like back in school), let's start by understanding the concept of an AI model.


Artificial Intelligence — It's Not Magic, It's Math on Steroids

When you hear "AI model," it might seem like we're talking about something like a robot that thinks like a human. But in reality, it's much simpler (and more complex at the same time). An AI model is a mathematical construct that has been trained on data to later predict, classify, or even generate new texts, images, or codes.

For example, if you use a translator, get recommendations on Spotify, or filter spam in your mail, there's one or more of these models at work. But it's important to understand: a model is not the algorithm itself, but the result of its application to data.

The process usually looks like this: you take an algorithm (e.g., gradient boosting, neural network, or SVM), "feed" it data, and get a model. This model can then independently make decisions, such as determining whether there's a cat in a photo or if it's just a fluffy blanket.

In this article, we'll delve deeper: how different models differ, why foundational models (like GPT, Claude, etc.) are needed, and why you can't make even a simple chatbot today without powerful GPUs.


How an AI Model "Remembers" Information: Vectors, Spaces, and Mathematical Magic

Okay, we already know that a model is the result of training on data. But a logical question arises: how does it store all this? Does it not store it like Google Docs?

No. An AI model remembers nothing "human-like." It doesn't know that a "cat" meows or that "pizza" is tasty. Instead, it envisions the world through vectors—mathematical objects consisting of sets of numbers. Roughly speaking, each word, image, request, or even concept is converted into a digital code—a set of values in a space with hundreds or thousands of dimensions.


🧠 Example:

text
"cat" = [0.12, -0.98, 3.45, …]

When a model "thinks" about the word "cat" (usually it creates vectors for entire expressions), it doesn't work with text but with its vector representation. The same goes for the words "dog," "fluffy," "purrs," etc.



(Vector visualization. Top left - vectors of texts about prompting. Bottom right - vectors of texts about the Belarusian language.)


Interestingly, vectors close in this multidimensional space represent concepts that are close in meaning. For example, if the vector "cat" is near "dog," it means the model has learned to see something common between them (animals, domestic, fluffy).

How Are Vectors Compared?

The model measures cosine similarity or Euclidean distance between vectors. Simply put, it calculates how "parallel" or "close" vectors are to each other. The smaller the distance or the larger the cosine, the stronger the connection between concepts.

For example, in large language models (LLM) like GPT:

  • "Minsk" + "Belarus" ≈ "Paris" + "France"
  • "Book" + "read" - "paper" ≈ "electronic"

And Why Is This Needed?

Everything, from understanding user questions to generating answers, happens through manipulating these vectors. If you write in chat: "suggest a movie like Inception," the model searches for the vector "Inception," finds its neighbors in the space (e.g., "Interstellar," "Tenet"), and based on this, generates a recommendation.
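A toy illustration of that search (the vectors below are made up and only 3-dimensional — real embeddings have hundreds or thousands of dimensions — but the arithmetic is the same):

typescript
// Cosine similarity: close to 1 means "same direction", close to 0 means "unrelated".
function cosine(a: number[], b: number[]): number {
    const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
    const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
    return dot / (norm(a) * norm(b));
}

// Made-up "embeddings" just to show the ranking step.
const vectors: Record<string, number[]> = {
    Inception:    [0.90, 0.10, 0.30],
    Interstellar: [0.80, 0.20, 0.40],
    Tenet:        [0.85, 0.15, 0.35],
    Cookbook:     [0.10, 0.90, 0.20],
};

const query = vectors['Inception'];
const neighbours = Object.entries(vectors)
    .filter(([name]) => name !== 'Inception')
    .map(([name, vec]) => ({ name, score: cosine(query, vec) }))
    .sort((a, b) => b.score - a.score);

console.log(neighbours); // Tenet and Interstellar rank far above Cookbook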

Not All AI Models Are the Same: How They Are Divided

The variety of models is impressive. You can find models that analyze maps or detect diseases from images.

Let's try to typify AI models.

Starting with the fact that models can be divided by the types of data they work with:

🧠 Language Models (LLM)

Work with text: understand, continue, analyze, generate. Modern examples:

  • Qwen 3 — supports both dense and Mixture-of-Experts architectures.
  • Gemma 3 — compact, efficient, works even on 1 GPU.
  • DeepSeek-R1 — focuses on reasoning and logic.
  • LLaMA 3.3 / 4 — new open-source LLM.

👁 Visual Models (Computer Vision / Multimodal Vision)

Work with images: recognize, analyze, generate.

  • LLaMA 4 Vision — understands images and can answer questions based on visual context.
  • Gemma Vision — scalable visual model from Google.
  • DALL-E

🎤 Audio and Speech Models

For speech recognition, voice generation, emotions, etc. (Currently, there are few open-source competitors to Whisper or VALL-E, but they are expected.)

🔀 Multimodal

Capable of processing multiple types of data: text + image, text + audio, etc. For example, LLaMA 4 — combines a language model with the ability to analyze images and supports agents (more on them in future articles).


🧰 Models Can Also Be Divided by Functionality:
  1. Task-specific models — fine-tuned for a single task (e.g., SQL generation, automatic medical analysis).
  2. General-purpose models — perform multiple tasks without specific adaptation. These include all modern flagship models: Qwen 3, LLaMA 3.3, Gemma 3, DeepSeek-R1.
  3. Tool-augmented models — can use tools like calculators, search, databases, even other AIs. Examples: GPT-4 Turbo with tools, LLaMA 4 Agents, DeepSeek Agent (based on R1).

🏋️ Models Can Also Be Divided by Size, Based on the Number of Parameters (B = billions):
  • Tiny / Small (0.6B – 4B parameters) — works on local devices.
  • Medium (7B – 14B) — requires GPU, works stably.
  • Large (30B – 70B) — for data centers or enthusiasts with clusters.
  • Ultra-large (100B – 700B+) — requires special equipment.

Let's Discuss Some Features of Models You Might Not Know About.

🧠 A Model Is Not Human. It Remembers Nothing

One might think that if a model answers your questions taking into account previous ones, it "remembers" the conversation. But that's an illusion. In reality, a model is a mathematical function that has no memory in the human sense.


Context Is Where All Memory "Lives"

Every time you send a request to a model, a so-called context is transmitted with it—the text of previous conversations, documents, instructions. It's like giving a person a cheat sheet before asking something. And if the next request doesn't contain the previous text, the model "forgets" everything.

📌 The model doesn't store any information after responding. Everything it "knows" is what you provided at the current moment.


Why Is This Important?

Because it means the model cannot remember users, the context of the conversation, or events. The model only remembers the data it was trained on during training or fine-tuning. If it seems like it "remembers," it's not the model's merit but the system around it that:

  • stores context,
  • dynamically loads it,
  • or uses vector databases or other tools to retrieve needed information.

🧊 The Model Is "Frozen Math"

To simplify: a model is a function that converts input (request + context) into output (response). And that's it. There are no internal dynamics that change between calls. (In fact, if you turn off the randomness in sampling, the same prompt will yield the same answer.)

It's like a calculator: you enter 2 + 2 and get 4. If you want to get 4 again, you need to enter 2 + 2 again.

Everything related to "memory," "personal history," and "recall" is an architectural add-on. For example, agents with "memory" work like this:

  1. The entire dialogue is stored externally (in a database, file, or vector system).
  2. For each new request, the agent retrieves relevant fragments from "memory."
  3. It adds them to the context and only then passes everything to the model.

The model itself doesn't even "know" this text is from memory—for it, it's just another part of the input.
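In rough TypeScript, such an agent looks something like this. searchVectorDb and chatCompletion are hypothetical placeholders for whatever vector store and model API you actually use:

```ts
// Hypothetical helpers: stand-ins for a real vector store and a real model API.
declare function searchVectorDb(query: string, topK: number): Promise<string[]>;
declare function chatCompletion(prompt: string): Promise<string>;

async function askAgentWithMemory(userQuestion: string): Promise<string> {
  // 1. The dialogue is stored externally; fetch only the fragments relevant to this question.
  const memories = await searchVectorDb(userQuestion, 3);

  // 2. Add them to the context...
  const prompt = [
    "Relevant notes from previous conversations:",
    ...memories,
    `Current question: ${userQuestion}`,
  ].join("\n");

  // 3. ...and only then pass everything to the model.
  // For the model this is just more input text, not "memory".
  return chatCompletion(prompt);
}
```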

🙅‍♂️ Can a Model Learn During a Conversation?

No. A typical AI model (including GPT, Claude, LLaMA) doesn't change itself during operation. For it to "learn" something, retraining or fine-tuning must occur, and that's a separate process that doesn't happen during a chat.

Even if the model answers incorrectly 100 times, it will continue to do the same until you create a new model or change the context.

📏 Context Isn't Infinite: Why a Model Can't "Read Everything"

One of the most common user misconceptions about AI is the illusion that a model can work with "the whole book," "the whole database," or "a lot of documents at once." But that's not true. Models have a strict limit on the context size they can process at one time.


What Is Context in a Technical Sense?

Context is the entire set of information you provide to the model in one call: your request, instructions, documents, dialogue history, etc. It's not just "text" but a set of tokens—special units into which the text is broken down for processing.

Example:

The word "cat" is 1 token. The word "automobile" might contain 2-3 tokens. The English "The quick brown fox jumps over the lazy dog." is 9 tokens.

How Many Tokens Can Modern Models "Hold"?
  1. GPT-3.5 - 4,096 tokens
  2. GPT-4 - 8,192 – 32,768 tokens
  3. GPT-4o - up to 128,000 tokens
  4. Claude 3 - up to 200,000 tokens
  5. LLaMA 3 - 8k (up to 128k starting with LLaMA 3.1)

128,000 tokens is about 300 pages of text. Seems like a lot? But it quickly runs out when you add, for example, technical documentation or code.

🧨 What Happens If You Provide Too Much?

  1. The prompt simply won't fit: the model (or the client) will refuse to process it or cut off part of it (usually the beginning), and that data is lost.
  2. If you supply overly long texts, important parts may be "pushed" out of the visible window.
  3. Lower accuracy: even if all the information fits, the model can "get lost" in the volume and miss important parts.
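A typical defense is to trim the history before every call so it fits the budget, dropping (or, in production systems, summarizing) the oldest messages first. A naive sketch, reusing the rough estimateTokens helper from above:

```ts
type Message = { role: string; content: string };

// Keep the newest messages that fit into the token budget; drop the oldest first.
function trimToBudget(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const msg of [...history].reverse()) {
    const cost = estimateTokens(msg.content);
    if (used + cost > maxTokens) break;
    kept.unshift(msg); // put it back in chronological order
    used += cost;
  }
  return kept;
}
```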

Why Not Just Make "Infinite Context"?

The problem is that all tokens are processed together—and the cost of attention grows roughly quadratically with their number. So the more tokens there are, the:

  • more memory is required on the GPU (the graphics processors doing the computation),
  • more time the computation takes,
  • worse attention works: the model "spreads out" and doesn't understand what to focus on.

How to Work with Large Data?

If the text doesn't fit into the context, there are solutions:

  1. 🔍 RAG (Retrieval-Augmented Generation) — selecting the most relevant pieces before each request
  2. 📚 Vector search — finding text similar in meaning
  3. 🪓 Splitting — you provide information in parts
  4. 🧠 Agent with memory — uses an external database to "recall" previous material
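All four approaches boil down to the same move: don't send everything, send only the relevant pieces. Here is a toy sketch combining splitting and vector search into a minimal RAG pipeline; embed and chatCompletion are placeholders for a real embedding model and LLM client, and cosine is the same helper as in the movie example earlier:

```ts
// Hypothetical helpers: replace with a real embedding model and LLM client.
declare function embed(text: string): Promise<number[]>;
declare function chatCompletion(prompt: string): Promise<string>;
declare function cosine(a: number[], b: number[]): number; // as in the movie example above

// 1. Splitting: cut a large document into overlapping chunks.
function splitIntoChunks(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// 2-3. Vector search + RAG: embed the chunks, pick the closest ones, build the prompt.
async function answerFromDocument(doc: string, question: string): Promise<string> {
  const chunks = splitIntoChunks(doc);
  const chunkVectors = await Promise.all(chunks.map(embed));
  const queryVector = await embed(question);

  const topChunks = chunks
    .map((chunk, i) => ({ chunk, score: cosine(queryVector, chunkVectors[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3) // only the 3 most relevant pieces go into the context
    .map((c) => c.chunk);

  const prompt = `Answer using only these excerpts:\n${topChunks.join("\n---\n")}\n\nQuestion: ${question}`;
  return chatCompletion(prompt);
}
```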

✨ Let's Summarize…

Phew! If you've read this far—you're almost an expert 😎 Let's briefly go over the main points again:

  • ✅ An AI model isn't magic; it's math that has learned from data
  • ✅ All a model's "knowledge" is not stored like human memory but in vectors
  • ✅ The model doesn't remember you—all "memory" lives in context
  • ✅ There are many different models: by data type, functionality, and size
  • ✅ Context is limited—and that's not a bug, but a feature you need to work with
  • ✅ "Teaching a model in conversation"—is still not quite reality

So, if it seems like AI "understands" or "remembers," remember that you're actually facing a very smart calculator, not a virtual Jan Bayan with amnesia 🙂


AI
context
RAG

goman

Hello everyone!

As you may have noticed, I've been writing fewer articles lately, and my overall activity has slightly decreased.

The reason is that I've been working on my own project, designed for managing localizations and translations. Today, I'd like to tell you a bit about it.

What is this service and why is it needed

For a long time, I've been involved in development and often encountered the need to support multiple localizations and languages. Usually, this looked like manually writing keys in the code and adding translations to localization files. This always took a lot of time and effort.

Therefore, I decided to create my own service that lets you manage localizations and content directly from VS Code or Cursor. At the moment, I've managed to make it so that agents (Copilot and others) can connect to the service via MCP and add, delete, and edit localizations.

The service itself is not limited to MCP: it also supports AI translation, editing, and other functions.
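Connecting it looks the same as with any other MCP server over HTTP: you add an entry to the editor's mcp.json. The URL and header below are purely illustrative placeholders, not the service's real endpoint, and the exact field names may differ between editors and versions, so check your editor's MCP documentation:

```json
{
  "mcpServers": {
    "goman-localizations": {
      "url": "https://example.com/mcp",
      "headers": {
        "Authorization": "Bearer <your-api-key>"
      }
    }
  }
}
```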

AI Translator

The AI Translator generates translations using pre-prepared prompts, which significantly improves the quality of the results. In addition, the service can take previous translations into account when creating new ones, which can be very useful. Prompts support versioning and can be switched before use.

Export and Collaboration

All created translations can be exported as JSON in various structures (you choose the structure before downloading). The service also lets multiple users work on localizations simultaneously by sharing access with other team members.
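For example, the same keys could hypothetically be exported flat:

```json
{ "home.title": "Welcome", "home.subtitle": "Glad to see you" }
```

or nested:

```json
{ "home": { "title": "Welcome", "subtitle": "Glad to see you" } }
```

(the keys here are made up for illustration).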

Additional Features

Besides localizations, the service has a simple prompt manager and an SDK that lets you run different models and services remotely and observe the results. But that will be covered in a separate article.

About backups

The ability to version localizations, so they can be restored in case of problems, is currently in development. But even now the server makes database backups every 6 hours, which are kept for one week.

About beta testing and further development

The service is currently in beta testing. We invite everyone interested to join, try it in action, and share their impressions.

Your feedback is especially important to us: all sensible advice and suggestions will be carefully considered and taken into account in further development.

In the future, there will be different tariff plans, but for now, the main goal is to get feedback, make the service convenient and useful for the community.

Plans

In the near future, I plan to focus on the AI translation capabilities. Support for editors other than Cursor and VS Code will be added, and a notes feature will become available to everyone working on translations.

For the Belarusian-speaking community

I also plan to offer good discounts for the Belarusian-speaking community. You can always request a discount or an increase in limits.

📩 You can contact us via email goman.live.service@gmail.com

💬 Or join our Discord server

And here's a first small promo showing how MCP works with the service: