Skip to content

Comments

FEAT: Scientific Translation Converter#1379

Open
jbolor21 wants to merge 4 commits intoAzure:mainfrom
jbolor21:users/bjagdagdorj/science_converter
Open

FEAT: Scientific Translation Converter#1379
jbolor21 wants to merge 4 commits intoAzure:mainfrom
jbolor21:users/bjagdagdorj/science_converter

Conversation

@jbolor21
Copy link
Contributor

Description

Adding scientific translation converter to translate queries into various "scientific" modes

Tests and Documentation

Added unit tests and added converter into converters notebook for text->text using LLMs

@jbolor21 jbolor21 changed the title [DRAFT]: FEAT: Scientific Translation Converter FEAT: Scientific Translation Converter Feb 19, 2026
@@ -0,0 +1,87 @@
name: scientific_obfuscation_converter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr title says "translation" - can we standardize to one of them?


## Mode-specific guidelines:

{% if mode == "academic" %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused. This includes only a specific section depending on the mode BUT at the end there's a combined mode. How will it know all the modes if we exclude most of them? Examples below also include all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode

Raises:
ValueError: If an invalid mode is provided.
"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an easier way to check for this given that it's a literal that's defined above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could have both.


from typing import Literal, get_args

ObfuscationMode = Literal[
    "academic", "technical", "smiles", "research", "reaction", "combined"
]

OBFUSCATION_MODES = set(get_args(ObfuscationMode))

def is_valid_mode(value: str) -> bool:
    return value in OBFUSCATION_MODES

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new LLM-based prompt converter that rewrites prompts into “scientific/technical” phrasing across multiple modes, along with the seed prompt template, documentation wiring, and unit tests.

Changes:

  • Introduces ScientificObfuscationConverter (mode-driven) backed by a new YAML seed prompt template.
  • Exposes the converter via pyrit.prompt_converter exports and API docs.
  • Adds unit tests and an example usage snippet in the converters documentation notebook.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyrit/prompt_converter/scientific_obfuscation_converter.py Implements the new LLM-based converter and identifier construction.
pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml Adds the mode-parameterized system prompt template used by the converter.
pyrit/prompt_converter/__init__.py Exports the new converter from the prompt_converter package.
tests/unit/converter/test_scientific_obfuscation_converter.py Adds unit tests validating initialization, mode validation, and conversion behavior.
doc/code/converters/1_text_to_text_converters.py Documents example usage of the new converter in the text-to-text converters notebook source.
doc/code/converters/1_text_to_text_converters.ipynb Adds the corresponding notebook cell content for the new converter example.
doc/api.rst Adds the converter to the API reference list.
Comments suppressed due to low confidence (2)

pyrit/prompt_converter/scientific_obfuscation_converter.py:23

  • The PR title/description refer to a "Scientific Translation Converter", but the implementation and dataset are named "ScientificObfuscationConverter" / "scientific_obfuscation_converter". If this is intended to be a translation-style converter, consider aligning the naming (or update the PR description) to avoid confusion for API consumers and documentation readers.
class ScientificObfuscationConverter(LLMGenericTextConverter):
    """
    Uses an LLM to transform simple or direct prompts into

pyrit/prompt_converter/scientific_obfuscation_converter.py:67

  • valid_modes duplicates the allowed values already defined in ObfuscationMode. To avoid the tuple and the type alias drifting out of sync, derive the runtime list from the type (e.g., typing.get_args(ObfuscationMode)) or centralize the allowed modes as a single constant reused for both validation and typing.
        valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
        if mode not in valid_modes:
            raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")


system_arg = mock_target.set_system_prompt.call_args[1]["system_prompt"]
assert isinstance(system_arg, str)
assert len(system_arg) > 0
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test only asserts the system prompt is non-empty, which won't catch regressions where the wrong mode/template branch is rendered. Consider asserting for a technical-mode specific marker from the template (e.g., the rendered "Technical Mode" heading or another unique phrase).

Suggested change
assert len(system_arg) > 0
assert "technical" in system_arg.lower()

Copilot uses AI. Check for mistakes.

system_arg = mock_target.set_system_prompt.call_args[1]["system_prompt"]
assert isinstance(system_arg, str)
assert len(system_arg) > 0
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test only asserts the system prompt is non-empty, which won't catch regressions where the converter renders the wrong mode branch. Consider asserting for a combined-mode specific marker from the template (e.g., the rendered "Combined Mode" heading or another unique phrase).

Suggested change
assert len(system_arg) > 0
assert "combined" in system_arg.lower()

Copilot uses AI. Check for mistakes.

system_arg = mock_target.set_system_prompt.call_args[1]["system_prompt"]
assert isinstance(system_arg, str)
assert "academic" in system_arg.lower() or len(system_arg) > 0
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is effectively only checking that the system prompt is non-empty: "academic" in system_arg.lower() or len(system_arg) > 0 will pass for any non-empty prompt even if the mode-specific rendering is wrong. Consider asserting for an academic-mode specific marker from the template (e.g., the rendered "Academic Mode" section) so the test will fail if the wrong mode is used.

Suggested change
assert "academic" in system_arg.lower() or len(system_arg) > 0
assert "academic mode" in system_arg.lower()

Copilot uses AI. Check for mistakes.
- ``combined``: Use all techniques together

"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MODES: Tuple[str] = ("academic", "technical", "smiles", "research", "reaction", "combined")

ValueError: If an invalid mode is provided.
"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
if mode not in valid_modes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if mode not in valid_modes:
if mode not in self.MODES:


import logging
import pathlib
from typing import Literal, Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from typing import Literal, Optional
from typing import Optional, Tuple

logger = logging.getLogger(__name__)

# Supported obfuscation modes
ObfuscationMode = Literal["academic", "technical", "smiles", "research", "reaction", "combined"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ObfuscationMode = Literal["academic", "technical", "smiles", "research", "reaction", "combined"]

self,
*,
converter_target: PromptChatTarget = REQUIRED_VALUE, # type: ignore[assignment]
mode: ObfuscationMode = "combined",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mode: ObfuscationMode = "combined",
mode: str = "combined",

Args:
converter_target (PromptChatTarget): The LLM target to perform the conversion.
Can be omitted if a default has been configured via PyRIT initialization.
mode (ObfuscationMode): The obfuscation mode to use. Options are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mode (ObfuscationMode): The obfuscation mode to use. Options are:
mode (str): The obfuscation mode to use. Options are:

Raises:
ValueError: If an invalid mode is provided.
"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")

"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
if mode not in valid_modes:
raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")
raise ValueError(f"Invalid mode '{mode}'. Must be one of: {self.MODES}")

Raises:
ValueError: If an invalid mode is provided.
"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard


## Mode-specific guidelines:

{% if mode == "academic" %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode

Supports multiple modes:
- ``academic``: Frame as scholarly, homework style questions
- ``technical``: Use precise technical jargon and nomenclature
- ``smiles``: Use SMILES chemical notation and IUPAC names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have links / more information on what SMILES and IUPAC are ? maybe it's more common knowledge than I think but it might be helpful to have this listed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants