Deploy ML Model Serving
Deploy machine learning models to production serving infrastructure using MLflow, BentoML, or Seldon Core with REST/gRPC endpoints, and implement autoscaling, monitoring, and A/B testing for high-performance model inference at scale. Use when deploying trained models for real-time inference, setting up REST or gRPC prediction APIs, implementing autoscaling for variable load, running A/B tests between model versions, or migrating from batch to real-time inference.
Purpose
To enable users to deploy and manage machine learning models in production environments with robust, scalable, and observable serving infrastructure.
Features
- Deploy ML models with MLflow, BentoML, Seldon Core
- Implement REST/gRPC endpoints for real-time inference (see the serving sketch after this list)
- Configure autoscaling for variable load
- Set up monitoring and observability with Prometheus/Grafana
- Implement A/B testing and canary deployments
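To make the REST-endpoint feature concrete, here is a minimal serving sketch that wraps an MLflow-registered model in a FastAPI app. The model name, registry URI, and input schema are placeholder assumptions, not values defined by this skill.

```python
# Minimal REST serving sketch (not the skill's own code). Assumes an MLflow
# tracking/registry server is reachable and a model is registered as
# "churn-classifier" -- both placeholder names echoing the skill's examples.
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup; "models:/churn-classifier/1" is an
# assumed model-registry URI (name/version).
model = mlflow.pyfunc.load_model("models:/churn-classifier/1")

class PredictRequest(BaseModel):
    features: dict  # feature-name -> value; the schema depends on your model

@app.post("/predict")
def predict(req: PredictRequest):
    frame = pd.DataFrame([req.features])
    prediction = model.predict(frame)  # sklearn-backed pyfunc returns an ndarray
    return {"prediction": prediction.tolist()}
```

For a quick test without custom code, MLflow can also serve a registered model directly, e.g. `mlflow models serve -m models:/churn-classifier/1 -p 8080`.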
Use Cases
- Deploying trained models for real-time inference
- Setting up prediction APIs
- Implementing autoscaling for fluctuating demand (see the autoscaler sketch after this list)
- Running A/B tests between model versions (a traffic-split sketch appears under Portability below)
- Migrating from batch to real-time inference
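For the autoscaling use case, a standard Kubernetes HorizontalPodAutoscaler is one common approach to fluctuating demand. A minimal sketch, assuming the model is served by a Deployment named churn-classifier (a placeholder):

```yaml
# HPA sketch (autoscaling/v2); the target Deployment name and the
# thresholds are illustrative assumptions, not skill-prescribed values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: churn-classifier-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: churn-classifier   # assumed name of the model-serving Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% average CPU
```

Scaling on request rate or latency instead of CPU requires a custom-metrics adapter (e.g., for Prometheus), which is outside this sketch.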
Non-Goals
- Training or fine-tuning ML models
- Managing MLflow tracking server or Kubernetes cluster infrastructure
- Directly interacting with cloud provider deployment services (e.g., SageMaker, Vertex AI)
- Performing deep model performance analysis or drift detection (beyond basic monitoring)
Documentation
- info (Configuration & parameter reference): While the SKILL.md provides good procedural steps and examples, it does not explicitly document all configuration parameters, defaults, or precedence orders for external tools like MLflow, BentoML, or Kubernetes.
Portability
- warning (Structural assumption): The example Kubernetes YAMLs and Dockerfiles assume a certain project structure and environment setup (e.g., 'your-registry/churn-classifier:v1.0', 'http://mlflow-server:5000') that might require significant adaptation by the user (see the annotated sketch after this list).
- warning (Runtime stability): The skill relies heavily on specific tooling (MLflow, BentoML, Seldon Core, Kubernetes) and assumes their presence and proper configuration, which may lead to instability if the user's environment differs significantly.
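To make those adaptation points concrete, here is a hedged Seldon Core sketch that also serves as the A/B/canary example referenced from the use cases above: a SeldonDeployment splitting traffic between two model versions. Every name, URI, and weight below is a placeholder the user must replace, which is precisely the structural assumption flagged above.

```yaml
# SeldonDeployment sketch (Seldon Core v1 schema). All names, model URIs,
# and traffic weights are placeholders to adapt, per the warning above.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: churn-classifier
spec:
  predictors:
    - name: stable
      traffic: 90                       # 90% of requests to the current model
      replicas: 2
      graph:
        name: classifier
        implementation: SKLEARN_SERVER  # prepackaged server; swap as needed
        modelUri: gs://your-bucket/churn-classifier/v1   # placeholder
    - name: canary
      traffic: 10                       # 10% canary slice for the new version
      replicas: 1
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://your-bucket/churn-classifier/v2   # placeholder
```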
Errors
- info (Actionable error messages): While the SKILL.md outlines potential failures and recovery steps for specific deployment stages, it does not provide universally actionable error messages for all potential runtime issues within the deployed infrastructure (a structured error-response sketch follows).
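One common way to make runtime errors more actionable is to return structured, machine-parseable error bodies from the serving endpoint. A minimal FastAPI sketch, assuming a hypothetical ModelNotReadyError raised by the application (the error taxonomy is an illustration, not something the skill defines):

```python
# Sketch: structured, actionable error responses. The error class and
# payload fields are illustrative, not part of the skill itself.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class ModelNotReadyError(Exception):
    """Raised when the model artifact has not finished loading."""

@app.exception_handler(ModelNotReadyError)
async def model_not_ready(request: Request, exc: ModelNotReadyError):
    # 503 signals clients and load balancers to retry with backoff
    # instead of treating the failure as permanent.
    return JSONResponse(
        status_code=503,
        content={
            "error": "model_not_ready",
            "detail": "Model artifact is still loading; retry with backoff.",
            "retryable": True,
        },
    )
```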
Execution
- warning (Pinned dependencies): The SKILL.md mentions dependencies for MLflow, BentoML, and Seldon Core, and its Dockerfiles include `pip install` commands, but specific versions are not always pinned and no Python lockfiles are referenced, which could lead to dependency conflicts (a pinning sketch follows).
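A minimal mitigation is to pin exact versions in the serving image. The Dockerfile sketch below is illustrative; the package versions shown are examples, not tested recommendations:

```dockerfile
# Sketch of dependency pinning in the serving image. Versions are
# illustrative examples only -- verify against your own environment.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
# requirements.txt pins exact versions, e.g.:
#   mlflow==2.9.2
#   scikit-learn==1.3.2
#   fastapi==0.104.1
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```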
Practical Utility
- warning (Edge cases): The SKILL.md lists common pitfalls and provides some recovery steps, but it does not systematically document all potential failure modes (e.g., dependency conflicts, network issues, specific K8s errors) with clear symptoms and recovery paths.
Safety
- info (Halt on unexpected state): The skill outlines potential failure points and recovery steps, but it does not explicitly mandate halting the workflow or reporting unexpected pre-state in a machine-readable checklist format (a pre-flight check sketch follows).
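As a sketch of what halting on unexpected state could look like, the script below probes assumed health endpoints (the hostnames mirror the placeholders used elsewhere on this page), prints a machine-readable checklist, and exits nonzero so a deployment pipeline stops early:

```python
# Pre-deployment state-check sketch. Endpoint URLs are placeholder
# assumptions; adapt them to your environment.
import json
import sys
import urllib.request

CHECKS = {
    # check name -> URL that must return HTTP 200 before deployment proceeds
    "mlflow_tracking_server": "http://mlflow-server:5000/health",
    "serving_endpoint": "http://churn-classifier:8080/healthz",
}

def run_checks() -> dict:
    results = {}
    for name, url in CHECKS.items():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                results[name] = (resp.status == 200)
        except OSError:  # covers URLError, timeouts, refused connections
            results[name] = False
    return results

if __name__ == "__main__":
    results = run_checks()
    print(json.dumps(results, indent=2))  # machine-readable checklist
    if not all(results.values()):
        sys.exit(1)  # halt the workflow on any unexpected pre-state
```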
Installation
/plugin install agent-almanac@pjt222-agent-almanac
Similar Extensions
Orchestrate ML Pipeline (score: 99)
Orchestrate end-to-end machine learning pipelines using Prefect or Airflow with DAG construction, task dependencies, retry logic, scheduling, monitoring, and integration with MLflow, DVC, and feature stores for production ML workflows. Use when automating multi-step ML workflows from data ingestion to deployment, scheduling periodic model retraining, coordinating distributed training tasks, or managing retry logic and failure recovery across pipeline stages.
Monitor Model Drift (score: 99)
Implement comprehensive model drift monitoring using Evidently AI, statistical tests (PSI, KS), and custom metrics to detect data drift and concept drift in production ML systems. Set up automated alerting and reporting workflows to catch degradation before it impacts business metrics. Use when production models show unexplained performance degradation, when new data distributions differ from training data, when seasonal shifts affect input features, or when regulatory requirements mandate model monitoring.
K8s Manifest Generator (score: 100)
Create production-ready Kubernetes manifests for Deployments, Services, ConfigMaps, and Secrets following best practices and security standards. Use when generating Kubernetes YAML manifests, creating K8s resources, or implementing production-grade Kubernetes configurations.
HF CLI (score: 100)
Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing models, datasets, spaces, buckets, repos, papers, jobs, and more on the Hugging Face Hub. Use when: handling authentication; managing local cache; managing Hugging Face Buckets; running or scheduling jobs on Hugging Face infrastructure; managing Hugging Face repos; discussions and pull requests; browsing models, datasets and spaces; reading, searching, or browsing academic papers; managing collections; querying datasets; configuring spaces; setting up webhooks; or deploying and managing HF Inference Endpoints. Make sure to use this skill whenever the user mentions 'hf', 'huggingface', 'Hugging Face', 'huggingface-cli', or 'hugging face cli', or wants to do anything related to the Hugging Face ecosystem and to AI and ML in general. Also use for cloud storage needs like training checkpoints, data pipelines, or agent traces. Use even if the user doesn't explicitly ask for a CLI command. Replaces the deprecated `huggingface-cli`.
Arize Experiment (score: 100)
Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.
Arize Evaluator (score: 100)
Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt.