Hello,
We are trying to deploy vLLM using the aws-neuron container on AWS SageMaker. On startup, the container fails at the ENTRYPOINT:
ENTRYPOINT ["python", "/usr/local/bin/vllm_entrypoint.py"]
I can see that the file is copied on line 107 of the Dockerfile, but the error we get appears to be related to this entrypoint.
The exact error in the startup logs is the following:
Traceback (most recent call last):
  File "/usr/local/bin/vllm_entrypoint.py", line 4, in <module>
    subprocess.check_call(sys.argv[1:])
  File "/opt/conda/lib/python3.11/subprocess.py", line 408, in check_call
    retcode = call(*popenargs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 389, in call
    with Popen(*popenargs, **kwargs) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/conda/lib/python3.11/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
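The final frame (`raise child_exception_type(...)` in `_execute_child`) is where `subprocess` raises an `OSError` subclass, such as `FileNotFoundError` or `PermissionError`, when the forwarded command cannot be launched. The line naming the actual exception is missing from the log excerpt. As a minimal sketch for narrowing this down, assuming the entrypoint only forwards `sys.argv[1:]` to `subprocess.check_call` as line 4 of the traceback suggests, the following reproduces that class of failure locally. The helper `run_forwarded` and the missing binary name are hypothetical and not part of the container.

```python
import subprocess
import sys


def run_forwarded(argv):
    """Run argv the way the entrypoint appears to, and classify launch failures."""
    try:
        subprocess.check_call(argv)
    except FileNotFoundError as exc:
        # Raised while launching the child when the executable is not found.
        return f"not found: {exc.filename}"
    except PermissionError as exc:
        # Raised when the file exists but cannot be executed.
        return f"not executable: {exc.filename}"
    except subprocess.CalledProcessError as exc:
        # The command started but returned a non-zero exit code.
        return f"exited with code {exc.returncode}"
    return "ok"


# Hypothetical command name that is not expected to exist on PATH.
print(run_forwarded(["serve-hypothetical-missing-binary"]))
# A command that is known to exist: the current Python interpreter.
print(run_forwarded([sys.executable, "-c", "pass"]))
```

Logging `sys.argv[1:]` inside the container before the `check_call` would show which command the entrypoint is actually asked to run.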
For reference, we are using the following configuration to create the endpoint:
SageMaker model:
  custom_image: "public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.1-ubuntu22.04"
  mode: "SingleModel"
  model_data_url: "<custom_model_data_vllm>"
  environment:
    - name: "SM_VLLM_MAX_MODEL_LEN"
      value: "12000"
    - name: "SM_VLLM_LIMIT_MM_PER_PROMPT"
      value: '{"image":6, "video":0}'
    - name: "SM_VLLM_MODEL"
      value: "/opt/ml/model/qwen2_W4A16"
    - name: "SM_VLLM_MM_PROCESSOR_CACHE_GB"
      value: "0"
    - name: "SM_VLLM_NO_ENABLE_PREFIX_CACHING"
      value: "true"
    - name: "SM_VLLM_ADDITIONAL_CONFIG"
      value: "{\"override_neuron_config\":{\"enable_bucketing\":false}}"
SageMaker endpoint configuration:
  instance_type: "ml.inf2.8xlarge"
  routing_config:
    routing_strategy: "LEAST_OUTSTANDING_REQUESTS"