Hi! Nice project!
When I run `example_python.ipynb` with Qwen2.5 7B Base as follows:
```python
import sys
sys.path.append('..')

import os
from dotenv import load_dotenv  # import the library

# 1. Load the .env file (reads .env from the current directory by default)
load_dotenv()

# Assuming we are in the root directory
from syncode import Syncode

import warnings
warnings.filterwarnings('ignore')

model_name = os.getenv('QWEN2_5_7B_PATH')

# Load the unconstrained original model
llm = Syncode(model=model_name, mode='original', max_new_tokens=200)

# Load the SynCode-augmented model
syn_llm = Syncode(
    model=model_name,
    mode='grammar_mask',
    grammar='python',
    parse_output_only=False,
    indent=True,
    opp=False,
)
```
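For reference, the `.env` file only maps that variable to a local checkpoint path (the path below is illustrative, not my actual setup):

```
# .env (illustrative; point this at your local Qwen2.5-7B checkpoint)
QWEN2_5_7B_PATH=/models/Qwen2.5-7B
```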
Standard LLM generation looks like this:

```python
partial_code = "def is_prime(n):\n '''Return if prime'''\n "
output = partial_code + llm.infer(partial_code)[0]
print(output)
```
with output:

```
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
def is_prime(n):
    '''Return if prime'''
    if n == 1:
        return False
    if n == 2:
        return True
    if n > 2 and n % 2 == 0:
        return False
    max_divisor = int(n**0.5) + 1
    for d in range(3, max_divisor, 2):
        if n % d == 0:
            return False
    return True
def is_palindrome(n):
    '''Return if palindrome'''
    return str(n) == str(n)[::-1]
def is_pandigital(n):
    '''Return if pandigital'''
    return set(str(n)) == set('123456789')[:len(str(n))]
def is_pandigital_0(n):
    '''Return if pandigital'''
    return set(str(n)) == set('0123456789')[:len(str(n))]
def is_pandigital
```
SynCode generation, by contrast, is:

```python
output = partial_code + syn_llm.infer(partial_code)[0]
print(output)
```
with output:

```
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
def is_prime(n):
    '''Return if prime'''
```
It seems the grammar constraints mask out all probable next tokens, forcing the model into early termination. Debugging with `skip_special_tokens=False` at link reveals that the output degenerates into repeated special tokens (e.g., `<|im_start|>`). This suggests the grammar mask leaves the model no valid continuation tokens to choose from.
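To make the suspected failure mode concrete, here is a minimal toy sketch (independent of SynCode's actual implementation, with an illustrative vocab size): if the grammar mask sets every logit to `-inf`, greedy decoding reduces to an argmax over a constant tensor, so generation degenerates into arbitrary/special tokens instead of valid code.

```python
# Toy illustration of an over-restrictive grammar mask (NOT SynCode's code)
import torch
from transformers import LogitsProcessor

class OverRestrictiveMask(LogitsProcessor):
    """Stand-in for a grammar mask that (incorrectly) rejects every token."""
    def __call__(self, input_ids, scores):
        # A correct mask would leave grammar-valid token ids untouched;
        # masking everything mimics the hypothesized bug.
        return torch.full_like(scores, float("-inf"))

scores = torch.randn(1, 151_936)              # illustrative vocab size
masked = OverRestrictiveMask()(None, scores)
print(masked.argmax(dim=-1))                  # tensor([0]): no meaningful choice left
```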