#azure-aks #azure-databricks #azure-machine-learning-service
Question:
I tried to deploy a new model from an Azure Databricks notebook. It worked this morning, and now I get the following error.
After running
service.wait_for_deployment(show_output=True)
print(service.state)
print(service.get_logs())
I get:
"message": "Timed out waiting for AKS deployment to complete. pollTimeout : 00:20:00 serviceName: simdev serviceId: ...",
"details": [
{
"code": "DeploymentTimedOut",
"message": "Your container endpoint is not available. Please follow the steps to debug:
1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
2. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
3. View the diagnostic events to check status of container, it may help you to debug the issue.
{"InvolvedObject":"simdev-757df4f999-rbcws","InvolvedKind":"Pod","Type":"Warning","Reason":"FailedScheduling","Message":"0/2 nodes are available: 2 Insufficient nvidia.com/gpu.","LastTimestamp":null}
{"InvolvedObject":"simdev-757df4f999-rbcws","InvolvedKind":"Pod","Type":"Warning","Reason":"FailedScheduling","Message":"0/2 nodes are available: 2 Insufficient nvidia.com/gpu.","LastTimestamp":null}
{"InvolvedObject":"simdev-757df4f999-rbcws","InvolvedKind":"Pod","Type":"Normal","Reason":"Scheduled","Message":"Successfully assigned azureml-train-aml-001-dev/simdev-757df4f999-rbcws to aks-agentpool-34690879-vmss000000","LastTimestamp":null}
It did not work yesterday. This morning it did, and now it does not again.
Here is the AKS configuration:
aks_config = AksWebservice.deploy_configuration(
    cpu_cores=0.7,
    memory_gb=0.7,
    gpu_cores=1,
    period_seconds=1800,
    failure_threshold=10,
    timeout_seconds=60,
    max_request_wait_time=300000,
    scoring_timeout_ms=300000,
)
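The diagnostic events in the error output are one JSON object per line, so they can be tallied to see what the scheduler reported. Below is a minimal sketch that uses only the standard library. The names EVENT_LINES, summarize_events and gpu_shortage_events are hypothetical helpers, not part of the AML SDK, and the event strings are copied from the output above. It only summarizes the events and does not diagnose why the GPUs were unavailable.

```python
import json
from collections import Counter

# Diagnostic event lines copied verbatim from the error output above.
EVENT_LINES = [
    '{"InvolvedObject":"simdev-757df4f999-rbcws","InvolvedKind":"Pod","Type":"Warning","Reason":"FailedScheduling","Message":"0/2 nodes are available: 2 Insufficient nvidia.com/gpu.","LastTimestamp":null}',
    '{"InvolvedObject":"simdev-757df4f999-rbcws","InvolvedKind":"Pod","Type":"Warning","Reason":"FailedScheduling","Message":"0/2 nodes are available: 2 Insufficient nvidia.com/gpu.","LastTimestamp":null}',
    '{"InvolvedObject":"simdev-757df4f999-rbcws","InvolvedKind":"Pod","Type":"Normal","Reason":"Scheduled","Message":"Successfully assigned azureml-train-aml-001-dev/simdev-757df4f999-rbcws to aks-agentpool-34690879-vmss000000","LastTimestamp":null}',
]


def parse_events(lines):
    """Return the lines that parse as JSON objects, skipping everything else."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict):
            events.append(event)
    return events


def summarize_events(lines):
    """Count how often each (Type, Reason, Message) combination occurs."""
    return Counter(
        (e.get("Type"), e.get("Reason"), e.get("Message"))
        for e in parse_events(lines)
    )


def gpu_shortage_events(lines):
    """Return Warning events whose message mentions insufficient nvidia.com/gpu."""
    return [
        e
        for e in parse_events(lines)
        if e.get("Type") == "Warning"
        and "Insufficient nvidia.com/gpu" in (e.get("Message") or "")
    ]


summary = summarize_events(EVENT_LINES)
gpu_warnings = gpu_shortage_events(EVENT_LINES)

for (event_type, reason, message), count in summary.items():
    print(f"{count}x [{event_type}] {reason}: {message}")
```

For the events shown here, the pod was rejected twice because neither of the two nodes had a free nvidia.com/gpu resource for the gpu_cores=1 request, and it was later scheduled on aks-agentpool-34690879-vmss000000. With a live service object, the same helpers could be applied to the event lines in the deployment error text.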
Comments:
1. Can you try going to AzureML -> Endpoints -> <Model-Endpoint> -> Deployment logs and paste the output here?