FileNotFoundError: [Errno 2] No such file or directory: 'serve'

Hello,

We are trying to deploy vLLM using the aws-neuron container on AWS SageMaker. On startup, the container fails at the `ENTRYPOINT`:

```
ENTRYPOINT ["python", "/usr/local/bin/vllm_entrypoint.py"]
```
I can see that the file is copied on line [107](https://github.com/aws-neuron/deep-learning-containers/blob/2.26.1/vllm/inference/0.9.1/Dockerfile.neuronx#L107) of the Dockerfile, but we still get an error related to this entrypoint.

The exact error we get from the logs on startup is the following (the traceback ends with the `FileNotFoundError` shown in the title):

```
Traceback (most recent call last):
  File "/usr/local/bin/vllm_entrypoint.py", line 4, in <module>
    subprocess.check_call(sys.argv[1:])
  File "/opt/conda/lib/python3.11/subprocess.py", line 408, in check_call
    retcode = call(*popenargs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 389, in call
    with Popen(*popenargs, **kwargs) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/conda/lib/python3.11/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
```
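
To illustrate what we think is happening: the traceback shows the entrypoint script calling `subprocess.check_call(sys.argv[1:])`, so whatever arguments the container receives are executed as a command. Our assumption (not confirmed) is that SageMaker starts the container with the single argument `serve`, and that no executable with that name is on `PATH` inside the image. The minimal sketch below reproduces the same exception class outside the container. The helper `run_like_entrypoint` and the placeholder command name are hypothetical and only for illustration.

```python
import subprocess


def run_like_entrypoint(argv):
    """Mimic the call from the traceback: subprocess.check_call(sys.argv[1:]).

    Returns the FileNotFoundError instead of raising it, or None if the
    command was found and exited successfully.
    """
    try:
        subprocess.check_call(argv[1:])
    except FileNotFoundError as exc:
        return exc
    return None


# Hypothetical argv. In the real container we assume argv[1:] == ["serve"];
# a placeholder name is used here so the sketch does not accidentally run
# an unrelated `serve` binary that may exist on the local machine.
fake_argv = ["/usr/local/bin/vllm_entrypoint.py", "serve-placeholder-not-on-path"]
err = run_like_entrypoint(fake_argv)

print(type(err).__name__, err)
```

If that assumption holds, the error comes from how the entrypoint handles its arguments rather than from the script file itself being missing.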


For your information, we are using the following to create the endpoint:

SageMaker model:
```yaml
custom_image: "public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.1-ubuntu22.04"
mode: "SingleModel"
model_data_url: "<custom_model_data_vllm>"
environment:
- name: "SM_VLLM_MAX_MODEL_LEN"
  value: "12000"
- name: "SM_VLLM_LIMIT_MM_PER_PROMPT"
  value: '{"image":6, "video":0}'
- name: "SM_VLLM_MODEL"
  value: "/opt/ml/model/qwen2_W4A16"
- name: "SM_VLLM_MM_PROCESSOR_CACHE_GB"
  value: "0"
- name: "SM_VLLM_NO_ENABLE_PREFIX_CACHING"
  value: "true"
- name: "SM_VLLM_ADDITIONAL_CONFIG"
  value: "{\"override_neuron_config\":{\"enable_bucketing\":false}}"
```

SageMaker endpoint configuration:
```yaml
  instance_type: "ml.inf2.8xlarge"
  routing_config:
    routing_strategy: "LEAST_OUTSTANDING_REQUESTS"
```




