chore(skills): sync qmd skill to upstream tobi/qmd@main

author Jérôme Benoit <jerome.benoit@sap.com>

Wed, 10 Jun 2026 22:24:31 +0000 (00:24 +0200)

committer Jérôme Benoit <jerome.benoit@sap.com>

Wed, 10 Jun 2026 22:24:31 +0000 (00:24 +0200)
diff --git a/.agents/skills/qmd/SKILL.md b/.agents/skills/qmd/SKILL.md

index 1f5e3e68e9834bc3a4155017e022b01e06f7bf6f..6a889dfb0ddfced7f0b35c700f7732d77af0121c 100644
--- a/.agents/skills/qmd/SKILL.md
+++ b/.agents/skills/qmd/SKILL.md
@@ -1,146 +1,298 @@
  ---
  name: qmd
-description: Search markdown knowledge bases, notes, and documentation using QMD. Use when users ask to search notes, find documents, or look up information.
+description: Search local markdown knowledge bases, notes, docs, and wikis with QMD. Use when users ask to find notes, retrieve documents, inspect a wiki, answer from indexed markdown, or set up QMD access.
  license: MIT
  compatibility: Requires qmd CLI or MCP server. Install via `npm install -g @tobilu/qmd`.
  metadata:
    author: tobi
-  version: '2.0.0'
+  version: '2.2.0'
  allowed-tools: Bash(qmd:*), mcp__qmd__*
  ---
  
-# QMD - Quick Markdown Search
+# QMD - Query Markdown Documents
  
-Local search engine for markdown content.
+## How search works
  
-## Status
+QMD searches local markdown collections: notes, docs, wikis, transcripts, and
+project knowledge bases. Use it before web search when the answer may already be
+in indexed local files.
  
-!`qmd status 2>/dev/null || echo "Not installed: npm install -g @tobilu/qmd"`
+The workflow is always:
  
-## MCP: `query`
+1. Search for candidate documents.
+2. Retrieve the full source with `qmd get` or `qmd multi-get`.
+3. Answer from retrieved text, citing paths or docids.
  
-```json
-{
-  "searches": [
-    { "type": "lex", "query": "CAP theorem consistency" },
-    { "type": "vec", "query": "tradeoff between consistency and availability" }
-  ],
-  "collections": ["docs"],
-  "limit": 10
-}
+Do not answer from snippets alone when the user needs facts, decisions, quotes,
+or nuance. Snippets are only leads.
+
+Typical loop:
+
+```bash
+qmd search "merchant reality support interviews" -n 5
+# leads: #abc123 concepts/customer-proximity.md; #def432 sources/merchant-call.md
+qmd multi-get "#abc123,#def432" --format md
  ```
  
-### Query Types
+**Default to structured `qmd query` with `intent:`, `lex:`, `vec:`, and `hyde:`
+fields that you write yourself.** You are a better query expander than the
+built-in model: you know the user's actual goal, the domain vocabulary, and the
+nearby-but-wrong concepts to avoid. Do not just paste the user's words into
+`qmd query "..."` and hope the expansion model guesses right — supply the
+`intent:` and craft the lexical and semantic terms deliberately (see
+[Pick the right search mode](#pick-the-right-search-mode)).
+
+When reporting what you retrieved, a compact note is enough; do not paste whole
+files unless needed:
+
+```text
+Retrieved:
+- #abc123 concepts/customer-proximity.md
+- #def432 sources/merchant-call.md
+```
  
-| Type   | Method | Input                                       |
-| ------ | ------ | ------------------------------------------- |
-| `lex`  | BM25   | Keywords — exact terms, names, code         |
-| `vec`  | Vector | Question — natural language                 |
-| `hyde` | Vector | Answer — hypothetical result (50-100 words) |
+## Pick the right search mode
  
-### Writing Good Queries
+Use **BM25 lexical search** when you know exact words, titles, names, code
+symbols, or rare phrases:
  
-**lex (keyword)**
+```bash
+qmd search "cockpit OKR Goodhart" -n 10
+qmd search '"AI Before Headcount"' -c concepts -n 5
+```
  
-- 2-5 terms, no filler words
-- Exact phrase: `"connection pool"` (quoted)
-- Exclude terms: `performance -sports` (minus prefix)
-- Code identifiers work: `handleError async`
+Use **`qmd query` with structured fields** when the user describes an idea
+indirectly, uses different wording than the source, or needs conceptual recall.
+**This is the default mode — write the fields yourself rather than leaning on
+query expansion.** Combine exact anchors with semantic recall:
  
-**vec (semantic)**
+```bash
+qmd query $'intent: Find the concept note about metrics as instruments without letting OKRs replace judgment.\nlex: cockpit instruments OKR Goodhart metrics judgment\nvec: data informed not metric driven product judgment\nhyde: A concept note says metrics are useful like cockpit instruments, but leaders should remain data-informed rather than metric-driven because OKRs and dashboards can Goodhart product judgment.'
+```
  
-- Full natural language question
-- Be specific: `"how does the rate limiter handle burst traffic"`
-- Include context: `"in the payment service, how are refunds processed"`
+Structured query fields (you author each one — do not delegate this to the
+expansion model):
  
-**hyde (hypothetical document)**
+- `intent:` states what you are trying to find **and what to avoid**. Always
+  supply this. It steers ranking away from nearby-but-wrong concepts.
+- `lex:` exact terms, aliases, titles, code symbols, and rare words you expect
+  in the source. This is your own keyword expansion.
+- `vec:` paraphrases the idea in natural language, in source-like wording.
+- `hyde:` describes the document or answer that would satisfy the request.
  
-- Write 50-100 words of what the _answer_ looks like
-- Use the vocabulary you expect in the result
+You do not need all four every time, but you should almost always write at least
+`intent:` plus one of `lex:`/`vec:`. A bare `qmd query "the user's sentence"`
+throws away the context only you have and relies on the built-in expander to
+reconstruct it — prefer the structured form.
  
-**expand (auto-expand)**
+If you genuinely have nothing to expand (a single rare token, a verbatim phrase),
+use `qmd search`, not bare `qmd query`. To inspect how a structured query ranks:
  
-- Use a single-line query (implicit) or `expand: question` on its own line
-- Lets the local LLM generate lex/vec/hyde variations
-- Do not mix `expand:` with other typed lines — it's either a standalone expand query or a full query document
+```bash
+qmd query --format json --explain $'intent: ...\nlex: ...\nvec: ...'  # inspect ranking
+```
  
-### Intent (Disambiguation)
+If `qmd query` is slow or model/GPU setup fails, fall back to `qmd search` with
+better lexical terms.
  
-When a query term is ambiguous, add `intent` to steer results:
+## Retrieve sources
  
-```json
-{
-  "searches": [{ "type": "lex", "query": "performance" }],
-  "intent": "web page load times and Core Web Vitals"
-}
+Search results include docids like `#abc123` and `qmd://...` paths. Fetch them:
+
+```bash
+qmd get "#abc123"
+qmd get qmd://concepts/ai-before-headcount.md
+qmd multi-get "#abc123,#def432" --format md
+qmd multi-get 'concepts/{ai-before-headcount.md,data-informed-not-metric-driven.md}' --format md
+qmd multi-get 'sources/podcast-2025-*.md' -l 80
  ```
  
-Intent affects expansion, reranking, chunk selection, and snippet extraction. It does not search on its own — it's a steering signal that disambiguates queries like "performance" (web-perf vs team health vs fitness).
+Use `multi-get` when comparing several hits or gathering context across pages.
  
-### Combining Types
+### Output is line-numbered and carries the docid — cite both
  
-| Goal                  | Approach                                              |
-| --------------------- | ----------------------------------------------------- |
-| Know exact terms      | `lex` only                                            |
-| Don't know vocabulary | Use a single-line query (implicit `expand:`) or `vec` |
-| Best recall           | `lex` + `vec`                                         |
-| Complex topic         | `lex` + `vec` + `hyde`                                |
-| Ambiguous query       | Add `intent` to any combination above                 |
+`get` and `multi-get` are **line-numbered by default** and always print the
+document's `#docid` and `qmd://` path. So `get` output looks like:
  
-First query gets 2x weight in fusion — put your best guess first.
+```text
+qmd://concepts/note.md  #abc123
+---
  
-### Lex Query Syntax
+1: # Metrics as instruments
+2:
+3: Treat dashboards like cockpit instruments...
+```
  
-| Syntax     | Meaning      | Example                      |
-| ---------- | ------------ | ---------------------------- |
-| `term`     | Prefix match | `perf` matches "performance" |
-| `"phrase"` | Exact phrase | `"rate limiter"`             |
-| `-term`    | Exclude      | `performance -sports`        |
+Cite the docid and exact line numbers in your answer, and use the numbers to ask
+for the next slice. Pass `--no-line-numbers` only when you need raw content to
+copy verbatim (e.g. reproducing a code block).
  
-Note: `-term` only works in lex queries, not vec/hyde.
+When you need to open or edit the underlying file (e.g. hand a path to `Read`,
+`Edit`, or an editor), add `--full-path`. It replaces the `qmd://` URL + docid
+header with the document's on-disk path, falling back to the canonical header if
+the file no longer exists on disk:
  
-### Collection Filtering
+```text
+$ qmd get "#abc123" --full-path
+/Users/you/notes/concepts/note.md
+---
  
-```json
-{ "collections": ["docs"] }              // Single
-{ "collections": ["docs", "notes"] }     // Multiple (OR)
+1: # Metrics as instruments
  ```
  
-Omit to search all collections.
+`--full-path` works the same way on `qmd search` and `qmd query`: result paths
+become the file's on-disk path — `./`-prefixed relative path when the file is
+inside `$PWD`, absolute realpath otherwise — and the per-result `#docid` is
+dropped because the path is the identifier. The leading `./` is intentional so
+the output is unambiguously a filesystem path and cannot be mistaken for a bare
+collection-relative string. Default search/query output still uses `qmd://`
+URIs; only opt into `--full-path` when you specifically need a path you can hand
+to a non-QMD tool.
  
-## Other MCP Tools
+### Read line ranges with the `:from:count` suffix — never pipe through `sed`/`head`/`tail`
  
-| Tool        | Use                              |
-| ----------- | -------------------------------- |
-| `get`       | Retrieve doc by path or `#docid` |
-| `multi_get` | Retrieve multiple by glob/list   |
-| `status`    | Collections and health           |
+`qmd get` slices files itself. Use the suffix or flags; do **not** shell out to
+`sed -n`, `head`, `tail`, or `awk` to pull a line range. Piping defeats docid
+resolution, virtual-path lookups, line numbering, and the header, and it is
+slower and more error-prone.
  
-## CLI
+The most compact form is a `:from:count` suffix right on the path or docid —
+prefer it:
  
  ```bash
-qmd query "question"              # Auto-expand + rerank
-qmd query $'lex: X\nvec: Y'       # Structured
-qmd query $'expand: question'     # Explicit expand
-qmd query --json --explain "q"    # Show score traces (RRF + rerank blend)
-qmd search "keywords"             # BM25 only (no LLM)
-qmd get "#abc123"                 # By docid
-qmd multi-get "journals/2026-*.md" -l 40  # Batch pull snippets by glob
-qmd multi-get notes/foo.md,notes/bar.md   # Comma-separated list, preserves order
+qmd get "#abc123:120:40"                  # 40 lines starting at line 120
+qmd get qmd://concepts/note.md:200:60     # lines 200–259
+qmd get "#abc123:120"                      # from line 120 to end of file
+qmd get "#abc123" --from 120 -l 40         # equivalent, using flags
  ```
  
-## HTTP API
+Suffix and flags:
+
+- `<path>:<from>:<count>` — start at line `<from>`, read `<count>` lines. **Best
+  for reading around a search hit.**
+- `<path>:<from>` — start at `<from>`, read to end of file.
+- `--from <line>` / `-l <lines>` — flag equivalents. Explicit flags override the
+  suffix, so `... :5:2 -l 1` reads 1 line.
+- `--no-line-numbers` — drop the `N:` prefixes (line numbers are on by default).
+
+Wrong: `qmd get "#abc123" | sed -n '120,159p'`
+Right: `qmd get "#abc123:120:40"`
+
+Search results include a `:line` anchor on each hit — feed it straight into
+`qmd get path:line:<n>` to read a window around the match (line numbers in the
+output will start at `line`).
+
+## Discover what is indexed
  
  ```bash
-curl -X POST http://localhost:8181/query \
-  -H "Content-Type: application/json" \
-  -d '{"searches": [{"type": "lex", "query": "test"}]}'
+qmd collection list
+qmd ls
+qmd status
  ```
  
-## Setup
+Add collection filters when broad searches drift into the wrong corpus:
+
+```bash
+qmd search "headcount autonomous agents" -c concepts -n 10
+qmd query "merchant support product reality" -c concepts -c sources -n 10
+```
+
+Omit `-c` to search everything.
+
+## MCP Tool: `query`
+
+When using the MCP server, prefer structured searches:
+
+```json
+{
+  "searches": [
+    { "type": "lex", "query": "cockpit OKR Goodhart" },
+    { "type": "vec", "query": "data informed not metric driven product judgment" },
+    {
+      "type": "hyde",
+      "query": "A concept note explains that metrics are useful as instruments, but leaders should not let OKRs or dashboards replace judgment."
+    }
+  ],
+  "intent": "Find the concept note about using metrics as instruments without becoming metric-driven.",
+  "collections": ["concepts"],
+  "limit": 10
+}
+```
+
+Query types:
+
+- `lex` — BM25 keyword search. Best for exact terms, names, titles, and code.
+- `vec` — vector semantic search. Best for natural-language concepts.
+- `hyde` — vector search using a hypothetical answer/document passage.
+
+## Query craft
+
+Good QMD searches mix three things:
+
+1. **Title/alias anchors:** exact page titles, named entities, phrases.
+2. **Semantic paraphrase:** how a human would describe the idea.
+3. **Negative space:** enough intent to avoid nearby-but-wrong concepts.
+
+Examples:
+
+```bash
+# Exact-ish title lookup
+qmd search '"arm the rebels" merchants tools big companies' -c concepts
+
+# Semantic concept lookup
+qmd query $'intent: Find the customer proximity concept, not generic customer delight.\nlex: support pseudonymous merchant customer interviews\nvec: founder stays close to merchant reality through support and product use'
+
+# Source lookup
+qmd search "six-week cadence WhatsApp merchant relationships Shawn Ryan" -c sources -n 10
+```
+
+## Setup and maintenance
+
+Only mutate indexes when the user asked for setup or maintenance. Searching and
+retrieving are safe; collection/index mutation is not a casual first step.
  
  ```bash
  npm install -g @tobilu/qmd
  qmd collection add ~/notes --name notes
+qmd update
  qmd embed
  ```
+
+Health and diagnostics:
+
+```bash
+qmd doctor
+qmd status
+qmd pull
+```
+
+`qmd doctor` checks config, model cache, device/GPU setup, vector fingerprints,
+and common environment overrides. If a model-backed command fails, run it before
+changing configuration.
+
+## MCP setup
+
+See `references/mcp-setup.md` for Claude Code, Claude Desktop, OpenClaw, and HTTP
+server configuration.
+
+## Pitfalls
+
+- **Do not stop at snippets.** Fetch documents before making claims.
+- **Do not slice files with `sed`/`head`/`tail`.** Use the `path:from:count`
+  suffix (e.g. `qmd get "#abc123:120:40"`) or `--from`/`-l`. Output is already
+  line-numbered; piping breaks docid resolution, the header, and virtual paths.
+- **Do not lean on query expansion.** Write `intent:`/`lex:`/`vec:`/`hyde:`
+  yourself. A bare `qmd query "user sentence"` discards the context only you
+  have. You expand the query; the model just ranks.
+- **Do not overuse semantic search.** If you know exact titles or terms, BM25 is
+  faster and often better.
+- **Do not mutate indexes casually.** `qmd collection add`, `qmd update`, and
+  `qmd embed` change local state and can be expensive.
+- **Model-backed commands can be environment-sensitive.** If `qmd query`,
+  `qmd vsearch`, or reranking fails because local models/GPU are unavailable,
+  use `qmd search` and stronger lexical/structured terms.
+- **Ambiguous user wording needs intent.** Add `intent:` rather than hoping query
+  expansion guesses the right domain.
+- **Collection names matter.** Search `concepts` for synthesized wiki pages,
+  `sources` for transcripts/raw source pages, and docs collections for code or
+  project documentation.
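
Reviewer's sketch, not part of the commit: the structured query form and the `:from:count` arithmetic that the updated SKILL.md describes can be illustrated with two small bash helpers. `build_qmd_query` and `range_end` are hypothetical names introduced here for illustration; neither belongs to QMD. The first only formats the `intent:`/`lex:`/`vec:`/`hyde:` lines shown in the diff and skips empty fields; the second assumes the suffix counts lines inclusively from `<from>`, as the "lines 200–259" comment in the diff implies.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of qmd): join the non-empty fields into the
# newline-separated "field: value" document passed to `qmd query` in the diff.
# Arguments, in order: intent, lex, vec, hyde. Any of them may be empty.
build_qmd_query() {
  local intent="${1:-}" lex="${2:-}" vec="${3:-}" hyde="${4:-}"
  local out=""
  if [ -n "$intent" ]; then out+="intent: ${intent}"$'\n'; fi
  if [ -n "$lex" ];    then out+="lex: ${lex}"$'\n'; fi
  if [ -n "$vec" ];    then out+="vec: ${vec}"$'\n'; fi
  if [ -n "$hyde" ];   then out+="hyde: ${hyde}"$'\n'; fi
  # Drop the trailing newline so the result is exactly one line per field.
  printf '%s' "${out%$'\n'}"
}

# Hypothetical helper: inclusive last line covered by a `:from:count` suffix.
# Arguments: from, count.
range_end() {
  echo $(( $1 + $2 - 1 ))
}
```

Under those assumptions, `qmd query "$(build_qmd_query "Find the customer proximity concept" "support merchant interviews" "founder stays close to merchant reality")"` would pass a three-line structured query, and `range_end 200 60` gives the last line that `qmd get qmd://concepts/note.md:200:60` is described as reading.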