feat(search): wiki +resolve-node shortcut + search methodology skills #346

Open
dingding0418 wants to merge 1 commit into larksuite:main from dingding0418:feat/search_harness

@dingding0418 dingding0418 commented Apr 8, 2026

Validated by a 13-case search eval harness (cli-evals): v1 baseline 89/195 (45.6%) → v2 132/195 (67.7%), +48% relative, 8 wins / 0 regressions.

Source code:

  • New shortcut lark-cli wiki +resolve-node --token <url|token> that resolves a wiki node to its underlying obj_token, obj_type, and title. It replaces the agent workaround lark-cli api GET /open-apis/wiki/v2/spaces/get_node, which caused high LLM friction: the agent had to know the path, pass nested JSON params, and parse a nested response shape.
  • Auto-extracts token from full wiki URLs.
  • Returns flat output {node_token, obj_token, obj_type, title, space_id}.
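
The flattening step described in the bullets above can be sketched roughly as follows. This is an illustration rather than the code in the PR: the `flattenNode` helper and the sample payload values are hypothetical, and only the five output field names and the nested `node` wrapper come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatKeys are the fields the PR description lists for the flat output.
var flatKeys = []string{"node_token", "obj_token", "obj_type", "title", "space_id"}

// flattenNode is a hypothetical helper: it lifts the listed fields out of the
// nested "node" object of a get_node-style response into a flat map.
func flattenNode(data map[string]interface{}) (map[string]interface{}, error) {
	node, ok := data["node"].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("response has no usable \"node\" object")
	}
	flat := make(map[string]interface{}, len(flatKeys))
	for _, k := range flatKeys {
		if v, present := node[k]; present {
			flat[k] = v
		}
	}
	return flat, nil
}

func main() {
	// Sample payload with made-up values, shaped like {"node": {...}}.
	raw := `{"node":{"node_token":"wikTOKEN","obj_token":"docTOKEN","obj_type":"docx","title":"Demo","space_id":"123","extra":"ignored"}}`
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &data); err != nil {
		panic(err)
	}
	flat, err := flattenNode(data)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(flat)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Fields outside the listed five (here `extra`) are dropped, which is what keeps the output shape predictable for an agent consuming it.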

Skill files (where most of the eval gain came from):

  • skills/lark-doc/SKILL.md: new "AI Usage Guidance: 多轮关键词改写" (multi-round keyword rewriting) section with a rewrite matrix, candidate-evaluation order, and best-effort fallback rules for open-ended questions.
  • skills/lark-doc/references/lark-doc-search.md: add multi-round retry guidance, synonym list, "when to stop rewriting" rules.
  • skills/lark-doc/references/lark-doc-fetch.md: add "大文档处理 ⚠️" (large-document handling) section documenting docs +fetch 504 limits and the search-summary → --limit → raw blocks API fallback ladder.
  • skills/lark-doc/references/lark-doc-search-recipes.md (new): four-step enterprise knowledge search methodology + synonym dictionary + failure case library + decision tree.
  • skills/lark-wiki/SKILL.md: new "wiki 节点是壳" (a wiki node is a wrapper) section + Shortcuts table pointing at the new +resolve-node.
  • skills/lark-wiki/references/lark-wiki-resolve-node.md (new): full reference for the new shortcut with examples and historical context.
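
The fallback ladder mentioned for large documents (search-summary → --limit → raw blocks API) amounts to trying each strategy in order and stopping at the first one that works. A minimal sketch of that control flow, assuming stand-in functions in place of real lark-cli calls; `step`, `firstSuccess`, and `errTimeout` are hypothetical names, and the skill files describe this as guidance for an agent rather than as code:

```go
package main

import (
	"errors"
	"fmt"
)

// step is one rung of the ladder: a label plus a function that tries to
// retrieve content. The functions used in main are stand-ins, not CLI calls.
type step struct {
	name string
	run  func() (string, error)
}

// errTimeout models the 504 that docs +fetch can return on large documents.
var errTimeout = errors.New("504 gateway timeout")

// firstSuccess walks the ladder in order and returns the first result that
// does not fail, along with the name of the rung that produced it.
func firstSuccess(ladder []step) (name, content string, err error) {
	var failures []string
	for _, s := range ladder {
		out, runErr := s.run()
		if runErr == nil {
			return s.name, out, nil
		}
		failures = append(failures, fmt.Sprintf("%s: %v", s.name, runErr))
	}
	return "", "", fmt.Errorf("all rungs failed: %v", failures)
}

func main() {
	ladder := []step{
		{name: "search-summary", run: func() (string, error) { return "", errors.New("summary lacks the answer") }},
		{name: "--limit", run: func() (string, error) { return "", errTimeout }},
		{name: "raw blocks API", run: func() (string, error) { return "block text", nil }},
	}
	name, content, err := firstSuccess(ladder)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s -> %s\n", name, content)
}
```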

Per-case verified wins (13-case eval harness):
case_001: 6 → 9 (+3, completeness 1→4, used wiki +resolve-node)
case_002: 5 → 12 (+7, recall 0→4, multi-keyword rewriting)
case_005: 5 → 13 (+8, best-effort fallback rule)
case_008: 13 → 14 (+1, multi-keyword found supplementary doc)
case_009: 13 → 15 (+2, fetched both expected sources)
case_010: 5 → 9 (+4, split-question rewriting matched 3/6 expected)
case_011: 6 → 13 (+7, "媒体沟通" (media communication) rewrite hit expected token)
case_013: 0 → 11 (+11, search-summary hit on specific number query)
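
As a cross-check, the per-case gains listed above can be summed and compared with the headline totals. The sketch below copies the scores from the list; the struct and function names are made up for illustration. The 195-point maximum is consistent with 13 cases at 15 points each, which is an inference from the numbers rather than something the description states.

```go
package main

import "fmt"

// caseScore holds one per-case result copied from the PR description.
type caseScore struct {
	id     string
	v1, v2 int
}

// wins are the eight improved cases reported in the PR description.
var wins = []caseScore{
	{"case_001", 6, 9}, {"case_002", 5, 12}, {"case_005", 5, 13}, {"case_008", 13, 14},
	{"case_009", 13, 15}, {"case_010", 5, 9}, {"case_011", 6, 13}, {"case_013", 0, 11},
}

// totalGain sums v2-v1 over the given cases.
func totalGain(cs []caseScore) int {
	g := 0
	for _, c := range cs {
		g += c.v2 - c.v1
	}
	return g
}

func main() {
	const v1Total, v2Total, maxTotal = 89, 132, 195
	gain := totalGain(wins)
	fmt.Println("sum of per-case gains:", gain)
	fmt.Println("matches headline delta:", gain == v2Total-v1Total)
	fmt.Printf("v1 %.1f%%, v2 %.1f%%, relative %+.0f%%\n",
		100*float64(v1Total)/maxTotal,
		100*float64(v2Total)/maxTotal,
		100*float64(v2Total-v1Total)/float64(v1Total))
}
```

The eight gains add up to 43, which equals 132 − 89, so the listed wins account for the whole improvement and the remaining cases contribute no net change.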

Unchanged cases reflect tool-capability gaps that skill changes cannot fix and are documented for future work:
case_004: docs +fetch 504 timeout on large docs
case_007: cross-tenant search not supported
case_006: target doc title misaligned with query keywords

Summary

Changes

  • Change 1
  • Change 2

Test Plan

  • Unit tests pass
  • Manual local verification confirms the lark xxx command works as expected

Related Issues

  • None

Summary by CodeRabbit

  • New Features

    • Added wiki +resolve-node command to resolve wiki URLs and tokens into underlying object metadata.
  • Documentation

    • Added comprehensive search methodology guidance covering multi-round keyword rewriting and candidate evaluation.
    • Added large document handling and retrieval strategies with fallback approaches.
    • Added wiki resolution workflow documentation with downstream control flow guidance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the labels domain/ccm (PR touches the ccm domain) and size/L (Large or sensitive change across domains or core paths) on Apr 8, 2026

coderabbitai bot commented Apr 8, 2026

Walkthrough

This PR introduces a new wiki +resolve-node shortcut that resolves wiki wrapper node URLs or tokens into underlying object metadata (obj_token, obj_type, title, etc.), registers it in the shortcut system, and adds comprehensive documentation covering wiki resolution workflows and document search methodologies.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Wiki shortcut implementation: shortcuts/register.go, shortcuts/wiki/shortcuts.go, shortcuts/wiki/wiki_resolve_node.go | Registers wiki shortcut package in the aggregation system. Implements WikiResolveNode shortcut that normalizes wiki tokens/URLs, validates --token parameter, calls /open-apis/wiki/v2/spaces/get_node API, handles responses, and outputs flattened node metadata (node_token, obj_token, obj_type, title, space_id, etc.) in tabular format. |
| Wiki documentation: skills/lark-wiki/SKILL.md, skills/lark-wiki/references/lark-wiki-resolve-node.md | Documents wiki +resolve-node usage, clarifying that wiki URLs are wrappers around underlying objects. Defines standard resolution flow, documents command syntax with output formats and parameters, provides downstream branching guidance based on obj_type, records historical failure modes, and enumerates required scopes. |
| Document search guidance: skills/lark-doc/SKILL.md, skills/lark-doc/references/lark-doc-search.md, skills/lark-doc/references/lark-doc-search-recipes.md, skills/lark-doc/references/lark-doc-fetch.md | Expands AI usage guidance for docs +search workflows with multi-round keyword rewriting strategies (2–3 rounds), candidate evaluation criteria (title keywords, scenario match, ownership/recency), handling of large documents via pagination and raw blocks API fallback, and wiki integration using +resolve-node for wrapper documents. Includes anti-patterns and decision trees for search termination. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as lark-cli
    participant API as Lark API
    
    User->>CLI: wiki +resolve-node --token <URL/token>
    activate CLI
    CLI->>CLI: Extract & normalize token<br/>(trim, extract from /wiki/*, strip query)
    
    alt Dry-run mode
        CLI->>CLI: Configure GET request
        Note over CLI: /open-apis/wiki/v2/spaces/get_node<br/>token=<normalized>, obj_type="wiki"
        CLI-->>User: Display dry-run request
    else Execute mode
        CLI->>API: GET /open-apis/wiki/v2/spaces/get_node
        activate API
        API-->>CLI: Return {node: {...}} or error
        deactivate API
        
        alt Node exists
            CLI->>CLI: Flatten node fields<br/>(obj_token, obj_type, title, etc.)
            CLI->>CLI: Format as table
            CLI-->>User: Display resolved metadata
        else Node missing
            CLI-->>User: Return formatted API error<br/>(raw + normalized tokens)
        end
    end
    deactivate CLI
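
The request shown in the dry-run branch of the diagram can be approximated with the standard library. This is a sketch under assumptions: `buildGetNodeURL` is a hypothetical helper, the base URL is a placeholder, and the real shortcut goes through the CLI's own runtime rather than building URLs by hand. Only the path and the two query parameters (`token`, `obj_type=wiki`) come from the walkthrough.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildGetNodeURL assembles the GET URL shown in the diagram. baseURL is an
// assumption supplied by the caller; the path and parameter names come from
// the walkthrough.
func buildGetNodeURL(baseURL, token string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/open-apis/wiki/v2/spaces/get_node"
	q := url.Values{}
	q.Set("token", token)
	q.Set("obj_type", "wiki")
	u.RawQuery = q.Encode() // Encode sorts keys, so obj_type precedes token
	return u.String(), nil
}

func main() {
	// open.example.com is a placeholder host, not a real endpoint.
	got, err := buildGetNodeURL("https://open.example.com", "wikTOKEN")
	if err != nil {
		panic(err)
	}
	fmt.Println("GET", got)
}
```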

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • larksuite/cli#101 — Implements a similar new domain shortcut (minutes +download) with parallel registration pattern in shortcuts/register.go and shortcut package structure.

Suggested labels

documentation, size/L

Suggested reviewers

  • liangshuo-1
  • fangshuyu-768
  • zhouyue-bytedance

Poem

🐰 A wiki node unwrapped with care,
Token parsed through the morning air,
Nested objects now laid bare,
Resolve-node hops—data everywhere! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description is extensive and comprehensive, covering motivation, source code changes, skill file updates, and per-case verification results. However, it does not follow the required template structure with clearly marked Summary, Changes, Test Plan, and Related Issues sections. | Reorganize the description to match the template: move eval metrics to Summary, list file changes under Changes section, and explicitly check or document test completion status and any related issues. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main changes: a new wiki +resolve-node shortcut and search methodology enhancements, directly reflecting the core code and documentation additions. |



greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds a lark-cli wiki +resolve-node shortcut that resolves wiki node URLs/tokens to their underlying obj_token + obj_type, replacing a verbose raw-API workaround. It also ships several skill/reference documents that add multi-round keyword rewriting guidance, large-document fallback strategies, and a four-step enterprise search methodology — which together account for the reported eval gain.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/best-practice suggestions with no correctness impact.

The core logic (token extraction, API call, response flattening) is correct and follows existing shortcut patterns. All three inline comments are P2: deprecated API usage, a semantically odd error code on a defensive nil-check, and missing unit tests. None of these block the feature from working correctly in production.

shortcuts/wiki/wiki_resolve_node.go — minor style improvements recommended but not blocking.

Vulnerabilities

No security concerns identified. The shortcut is read-only (Risk: "read", wiki:wiki:readonly scope), performs no mutation, and the token extraction logic uses a fixed regex without any external input being executed or interpolated into shell commands.

Important Files Changed

| Filename | Overview |
|---|---|
| shortcuts/wiki/wiki_resolve_node.go | New shortcut implementing wiki node resolution; logic is correct but uses deprecated CallAPI, passes code 0 to ErrAPI on unexpected nil, and lacks unit tests for extractWikiToken. |
| shortcuts/wiki/shortcuts.go | Thin aggregator returning the single WikiResolveNode shortcut; straightforward and correct. |
| shortcuts/register.go | Adds wiki import and appends wiki.Shortcuts() to allShortcuts; follows existing registration pattern exactly. |
| skills/lark-wiki/references/lark-wiki-resolve-node.md | Comprehensive reference for the new shortcut including usage examples, output schema, downstream decision tree, and historical context; no issues found. |
| skills/lark-doc/references/lark-doc-search-recipes.md | New file with a four-step enterprise search methodology, synonym dictionary, failure case library, and decision tree; documentation-only addition with no code concerns. |
| skills/lark-wiki/SKILL.md | Adds 'wiki 节点是壳' (a wiki node is a wrapper) section and a Shortcuts table pointing to +resolve-node; content is accurate and well-structured. |
| skills/lark-doc/SKILL.md | Adds multi-round keyword rewrite matrix and best-effort fallback rules for open-ended queries; documentation-only, no issues. |
| skills/lark-doc/references/lark-doc-fetch.md | Adds 504 timeout handling guidance and a search-summary → --limit → raw blocks API fallback ladder for large documents; documentation-only, no issues. |
| skills/lark-doc/references/lark-doc-search.md | Adds multi-round retry guidance, synonym list, and 'when to stop rewriting' rules; documentation-only, no issues. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as lark-cli wiki +resolve-node
    participant Extract as extractWikiToken()
    participant LarkAPI as Lark API /wiki/v2/spaces/get_node
    participant NextStep as Next shortcut (docs/base/sheets)

    User->>CLI: --token URL or bare node token
    CLI->>Extract: raw input string
    Extract-->>CLI: normalized node token
    CLI->>LarkAPI: GET get_node with token and obj_type=wiki
    LarkAPI-->>CLI: response with node object containing obj_token and obj_type
    CLI-->>User: flat JSON with node_token, obj_token, obj_type, title, space_id
    User->>NextStep: use obj_token and obj_type to call docs +fetch or base or sheets +read
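
The last arrow in the diagram, choosing the next shortcut from obj_type, could look like the sketch below. The command families (docs +fetch, base, sheets +read) are taken from the diagram; the specific obj_type strings ("doc", "docx", "sheet", "bitable") and the `nextCommand` helper are assumptions made for illustration and should be checked against the reference document added in this PR.

```go
package main

import "fmt"

// nextCommand maps a resolved obj_type to the follow-up shortcut family named
// in the diagram. The obj_type values handled here are assumptions.
func nextCommand(objType string) (string, bool) {
	switch objType {
	case "doc", "docx":
		return "docs +fetch", true
	case "sheet":
		return "sheets +read", true
	case "bitable":
		return "base", true
	default:
		// Unknown or unmapped types are reported rather than guessed.
		return "", false
	}
}

func main() {
	for _, t := range []string{"docx", "sheet", "bitable", "mindnote"} {
		if cmd, ok := nextCommand(t); ok {
			fmt.Printf("%s -> %s\n", t, cmd)
		} else {
			fmt.Printf("%s -> no shortcut mapped in this sketch\n", t)
		}
	}
}
```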

Reviews (1): Last reviewed commit: "feat(search): wiki +resolve-node shortcu..." | Re-trigger Greptile

Comment on lines +107 to +109
    node, _ := data["node"].(map[string]interface{})
    if node == nil {
        return output.ErrAPI(0, "wiki node not found or not accessible (input="+rawInput+", normalized="+token+")", nil)

P2 Semantically misleading error code 0 on unexpected-nil guard

output.ErrAPI(0, ...) passes Lark success code 0 to ClassifyLarkError, which doesn't match any known error case and falls through to a generic api_error. Because 0 is the standard Lark "OK" code, the intent (unexpected missing node field in an otherwise-successful response) would be clearer with output.Errorf:

Suggested change
 node, _ := data["node"].(map[string]interface{})
 if node == nil {
-    return output.ErrAPI(0, "wiki node not found or not accessible (input="+rawInput+", normalized="+token+")", nil)
+    return output.Errorf(output.ExitAPI, "api_error", "wiki node not found or not accessible (input=%s, normalized=%s)", rawInput, token)

Comment on lines +22 to +38
    var wikiURLPattern = regexp.MustCompile(`/wiki/([A-Za-z0-9]+)`)

    // extractWikiToken returns the bare wiki token from either a URL or a token string.
    // If the input doesn't look like a URL, it's assumed to already be a token.
    func extractWikiToken(input string) string {
        input = strings.TrimSpace(input)
        if input == "" {
            return ""
        }
        if matches := wikiURLPattern.FindStringSubmatch(input); len(matches) > 1 {
            return matches[1]
        }
        // Strip any trailing query string or fragment if present
        if idx := strings.IndexAny(input, "?#"); idx >= 0 {
            input = input[:idx]
        }
        return input

P2 No unit tests for extractWikiToken

extractWikiToken has three distinct branches (URL match, query-string strip, plain token passthrough) and is called from both Validate and Execute. Every other shortcut package that has non-trivial helpers includes a *_test.go; this package has none. Edge cases worth covering: scheme-less URL (bytedance.larkoffice.com/wiki/TOKEN), URL with query string, bare token followed by a ? query string or # fragment, and empty input.
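
A table-driven check along the lines the reviewer suggests might look like this. The function body is copied from the excerpt above so the block runs on its own; the case table, the sample tokens, and the `example.larkoffice.com` host are illustrative assumptions. In the repository this would more naturally be a `TestExtractWikiToken(t *testing.T)` in a `_test.go` file rather than a `main` function.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var wikiURLPattern = regexp.MustCompile(`/wiki/([A-Za-z0-9]+)`)

// extractWikiToken is reproduced from the PR excerpt for self-containment.
func extractWikiToken(input string) string {
	input = strings.TrimSpace(input)
	if input == "" {
		return ""
	}
	if matches := wikiURLPattern.FindStringSubmatch(input); len(matches) > 1 {
		return matches[1]
	}
	// Strip any trailing query string or fragment if present
	if idx := strings.IndexAny(input, "?#"); idx >= 0 {
		input = input[:idx]
	}
	return input
}

// tokenCase is one row of the table: an input and the token expected back.
type tokenCase struct{ name, in, want string }

// tokenCases covers the three branches plus the edge cases from the review.
var tokenCases = []tokenCase{
	{"full URL", "https://example.larkoffice.com/wiki/AbC123", "AbC123"},
	{"URL with query string", "https://example.larkoffice.com/wiki/AbC123?from=share", "AbC123"},
	{"scheme-less URL", "bytedance.larkoffice.com/wiki/TOKEN", "TOKEN"},
	{"bare token with query", "AbC123?x=1", "AbC123"},
	{"bare token with fragment", "AbC123#section", "AbC123"},
	{"padded bare token", "  AbC123  ", "AbC123"},
	{"empty input", "", ""},
}

func main() {
	for _, c := range tokenCases {
		got := extractWikiToken(c.in)
		status := "ok"
		if got != c.want {
			status = "FAIL"
		}
		fmt.Printf("%-4s %s: got %q\n", status, c.name, got)
	}
}
```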

Comment on lines +94 to +102
    data, err := runtime.CallAPI(
        "GET",
        "/open-apis/wiki/v2/spaces/get_node",
        map[string]interface{}{
            "token":    token,
            "obj_type": "wiki",
        },
        nil,
    )

P2 Prefer DoAPIJSON over the deprecated CallAPI for new shortcuts

runner.go documents: "Prefer DoAPI for new code — it calls the Lark SDK directly and supports file upload/download options." DoAPIJSON provides the same JSON-envelope unwrapping as CallAPI but goes through the SDK path. Since this is a net-new shortcut it's a good opportunity to follow the recommended pattern:

data, err := runtime.DoAPIJSON("GET", "/open-apis/wiki/v2/spaces/get_node",
    larkcore.QueryParams{
        "token":    []string{token},
        "obj_type": []string{"wiki"},
    }, nil)
if err != nil {
    return err
}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
skills/lark-wiki/references/lark-wiki-resolve-node.md (1)

76-99: Consider noting the case statement is bash-specific.

The downstream flow example uses bash case syntax. While line 101 clarifies LLM agents don't need scripts, users referencing this doc for shell automation might benefit from knowing this is bash-specific (won't work in sh or zsh without modification for some edge cases).

📝 Suggested clarification
 ## 标准下游流程
 
+> 以下是 bash 脚本示例,供自动化参考。
+
 ```bash
 # 第一步:解析
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/lark-wiki/references/lark-wiki-resolve-node.md` around lines 76 - 99,
The example uses bash-specific syntax (see RESULT=$(lark-cli wiki +resolve-node
...), OBJ_TYPE and the case "$OBJ_TYPE" in construct) but the doc doesn't state
the shell; update the markdown to explicitly label the code block as bash and
add one brief sentence above the block: "This example is written for bash;
behavior may differ in sh/zsh—adjust quoting/command substitutions accordingly."
Ensure the code fence is annotated as bash (```bash) and keep the existing
example unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/wiki/wiki_resolve_node.go`:
- Around line 107-110: The current silent type assertion for data["node"] can
hide malformed responses; change the logic in the function handling data (the
code around the node variable in wiki_resolve_node.go) to first check whether
data contains "node" (v, ok := data["node"]) and return ErrAPI indicating the
node is missing when !ok or v == nil, and if present perform a type assertion to
map[string]interface{} (nodeMap, ok := v.(map[string]interface{})); when that
assertion fails return a different ErrAPI that reports an unexpected type
(include fmt.Sprintf("%T", v) or other diagnostic info) rather than claiming the
node was not found, so callers get a clear diagnostic for malformed API
responses.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e98d0b54-3e2f-44e0-a7b2-5aba30cba6f1

📥 Commits

Reviewing files that changed from the base of the PR and between db9ca5c and c166956.

📒 Files selected for processing (9)
  • shortcuts/register.go
  • shortcuts/wiki/shortcuts.go
  • shortcuts/wiki/wiki_resolve_node.go
  • skills/lark-doc/SKILL.md
  • skills/lark-doc/references/lark-doc-fetch.md
  • skills/lark-doc/references/lark-doc-search-recipes.md
  • skills/lark-doc/references/lark-doc-search.md
  • skills/lark-wiki/SKILL.md
  • skills/lark-wiki/references/lark-wiki-resolve-node.md

Comment on lines +107 to +110
    node, _ := data["node"].(map[string]interface{})
    if node == nil {
        return output.ErrAPI(0, "wiki node not found or not accessible (input="+rawInput+", normalized="+token+")", nil)
    }

⚠️ Potential issue | 🟡 Minor

Type assertion failure produces a misleading error message.

If data["node"] exists but is not a map[string]interface{} (e.g., malformed API response), the silent type assertion sets node to nil, and the error message claims "wiki node not found" when the actual issue is an unexpected response format. Consider distinguishing these cases.

🔧 Suggested improvement for clearer diagnostics
-		node, _ := data["node"].(map[string]interface{})
-		if node == nil {
-			return output.ErrAPI(0, "wiki node not found or not accessible (input="+rawInput+", normalized="+token+")", nil)
+		nodeRaw, exists := data["node"]
+		if !exists || nodeRaw == nil {
+			return output.ErrAPI(0, "wiki node not found or not accessible (input="+rawInput+", normalized="+token+")", nil)
+		}
+		node, ok := nodeRaw.(map[string]interface{})
+		if !ok {
+			return output.ErrAPI(0, "unexpected API response format for wiki node", data)
 		}

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
