(integrations)=

# Integrations

Flyte is designed to be highly extensible and can be customized in multiple ways.

```{note}
Want to contribute an integration example? Check out the {ref}`Tutorials and integration examples contribution guide `.
```

## Flytekit plugins

Flytekit plugins are implemented purely in Python, can be unit tested locally, and extend Flytekit's functionality. They are comparable to [Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html).

```{list-table}
:header-rows: 0
:widths: 20 30

* - {doc}`Comet `
  - `comet-ml`: Comet’s machine learning platform.
* - {doc}`DBT `
  - Run and test your `dbt` pipelines in Flyte.
* - {doc}`Dolt `
  - Version your SQL database with `dolt`.
* - {doc}`DuckDB `
  - Run analytical queries using DuckDB.
* - {doc}`Great Expectations `
  - Validate data with `great_expectations`.
* - {doc}`Memray `
  - `memray`: Memory profiling with memray.
* - {doc}`MLFlow `
  - `mlflow`: the open standard for model tracking.
* - {doc}`Modin `
  - Scale pandas workflows with `modin`.
* - {doc}`Neptune `
  - `neptune`: Neptune is the MLOps stack component for experiment tracking.
* - {doc}`NIM `
  - Serve optimized model containers with NIM.
* - {doc}`Ollama `
  - Serve fine-tuned LLMs with Ollama in a Flyte workflow.
* - {doc}`ONNX `
  - Convert ML models to ONNX models seamlessly.
* - {doc}`Pandera `
  - Validate pandas dataframes with `pandera`.
* - {doc}`Papermill `
  - Execute Jupyter Notebooks with `papermill`.
* - {doc}`SQL `
  - Execute SQL queries as tasks.
* - {doc}`Weights and Biases `
  - `wandb`: Machine learning platform to build better models faster.
* - {doc}`WhyLogs `
  - `whylogs`: the open standard for data logging.
```

:::{dropdown} {fa}`info-circle` Using Flytekit plugins
:animate: fade-in-slide-down

Data is automatically marshalled and unmarshalled in and out of the plugin.
Users should mostly implement the {py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit.

Flytekit plugins are lazily loaded and can be released independently like libraries. The naming convention is `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example, `flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/).

You can find the plugins maintained by the core Flyte team [here](https://github.com/flyteorg/flytekit/tree/master/plugins).
:::

## Native backend plugins

Native backend plugins can be executed without any external service dependencies because the compute is orchestrated by Flyte itself, within its provisioned Kubernetes clusters.

```{list-table}
:header-rows: 0
:widths: 20 30

* - {doc}`Kubeflow PyTorch `
  - Run distributed PyTorch training jobs using `Kubeflow`.
* - {doc}`Kubeflow TensorFlow `
  - Run distributed TensorFlow training jobs using `Kubeflow`.
* - {doc}`Kubernetes pods `
  - Execute Kubernetes pods for arbitrary workloads.
* - {doc}`Kubernetes cluster Dask jobs `
  - Run Dask jobs on a Kubernetes Cluster.
* - {doc}`Kubernetes cluster Spark jobs `
  - Run Spark jobs on a Kubernetes Cluster.
* - {doc}`MPI Operator `
  - Run distributed deep learning training jobs using Horovod and MPI.
* - {doc}`Ray `
  - Run Ray jobs on a K8s Cluster.
```

(flyte_agents)=

## Flyte agents

[Flyte agents](https://docs.flyte.org/en/latest/flyte_agents/index.html) are long-running, stateless services that receive execution requests via gRPC and initiate jobs with appropriate external or internal services. Each agent service is a Kubernetes deployment that receives gRPC requests from FlytePropeller when users trigger a particular type of task. (For example, the BigQuery agent handles BigQuery tasks.) The agent service then initiates a job with the appropriate service.
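
The routing described above can be pictured with a minimal, self-contained sketch in plain Python. It is a conceptual model only, not Flyte's agent API: it uses neither gRPC nor Flytekit, and every name in it (`JobHandle`, `bigquery_agent_sketch`, `AGENTS`, `dispatch`, and the `"bigquery"` task-type string) is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class JobHandle:
    """Identifies a job started on an external service (hypothetical type)."""

    task_type: str
    job_id: str


def bigquery_agent_sketch(task_spec: dict) -> JobHandle:
    # A real agent would call the external service here. This sketch only
    # derives a placeholder job ID from the request so the flow is visible.
    return JobHandle(task_type="bigquery", job_id=f"bq-{task_spec['name']}")


# Maps each task type to the agent that handles it. The agent keeps no state
# between requests: everything it needs arrives in the request itself.
AGENTS: Dict[str, Callable[[dict], JobHandle]] = {
    "bigquery": bigquery_agent_sketch,
}


def dispatch(task_type: str, task_spec: dict) -> JobHandle:
    """Route a request to the agent registered for its task type."""
    try:
        agent = AGENTS[task_type]
    except KeyError:
        raise LookupError(f"no agent registered for task type {task_type!r}") from None
    return agent(task_spec)


handle = dispatch("bigquery", {"name": "daily_report"})
print(handle)
```

Because the lookup is keyed on task type alone, supporting another service in this model means registering one more entry in `AGENTS`; the dispatching side does not change.
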
If you don't see the agent you need below, see "[Developing agents](https://docs.flyte.org/en/latest/flyte_agents/developing_agents.html)" to learn how to develop a new agent.

```{list-table}
:header-rows: 0
:widths: 20 30

* - {doc}`AWS SageMaker Inference agent `
  - Deploy models, and create and trigger inference endpoints, on AWS SageMaker.
* - {doc}`Airflow agent `
  - Run Airflow jobs in your workflows with the Airflow agent.
* - {doc}`BigQuery agent `
  - Run BigQuery jobs in your workflows with the BigQuery agent.
* - {doc}`ChatGPT agent `
  - Run ChatGPT jobs in your workflows with the ChatGPT agent.
* - {doc}`Databricks agent `
  - Run Databricks jobs in your workflows with the Databricks agent.
* - {doc}`Memory Machine Cloud agent `
  - Execute tasks using the MemVerge Memory Machine Cloud agent.
* - {doc}`OpenAI Batch `
  - Submit requests for asynchronous batch processing on OpenAI.
* - {doc}`PERIAN Job Platform agent `
  - Execute tasks on PERIAN Job Platform.
* - {doc}`Sensor agent `
  - Run sensor jobs in your workflows with the sensor agent.
* - {doc}`Slurm agent `
  - Run Slurm jobs in your workflows with the Slurm agent.
* - {doc}`Snowflake agent `
  - Run Snowflake jobs in your workflows with the Snowflake agent.
```

(external_service_backend_plugins)=

## External service backend plugins

As the term suggests, these plugins rely on external services to handle the workload defined in the Flyte task that uses the plugin.

```{list-table}
:header-rows: 0
:widths: 20 30

* - {doc}`AWS Athena `
  - Execute queries using AWS Athena.
* - {doc}`AWS Batch `
  - Run tasks and workflows on AWS Batch.
* - {doc}`Flyte Interactive `
  - Execute tasks using Flyte Interactive to debug.
* - {doc}`Hive `
  - Run Hive jobs in your workflows.
```

(enable-backend-plugins)=

::::{dropdown} {fa}`info-circle` Enabling backend plugins
:animate: fade-in-slide-down

To enable a backend plugin, you must add the `ID` of the plugin to the enabled plugins list.
The `enabled-plugins` list is available under the `tasks > task-plugins` section of FlytePropeller's configuration.
The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/flytepropeller@v0.6.1/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows:

```yaml
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
    default-for-task-types:
      container: container
      sidecar: sidecar
      container_array: k8s-array
```

**Finding the `ID` of the backend plugin**

To find the `ID` of the backend plugin, look at the source code of the plugin. For example, in the case of Spark, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424), defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41).
::::

## SDKs for writing tasks and workflows

The {ref}`community ` would love to help you build new SDKs. Currently, the available SDKs are:

```{list-table}
:header-rows: 0
:widths: 20 30

* - [flytekit](https://github.com/flyteorg/flytekit)
  - The Python SDK for Flyte.
* - [flytekit-java](https://github.com/flyteorg/flytekit-java)
  - The Java/Scala SDK for Flyte.
```

## Flyte operators

Flyte can be integrated with other orchestrators to help you leverage Flyte's constructs natively within other orchestration tools.

```{list-table}
:header-rows: 0
:widths: 20 30

* - {doc}`Airflow `
  - Trigger Flyte executions from Airflow.
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Flytekit plugins

Comet
DBT
Dolt
DuckDB
Great Expectations
Memray
MLFlow
Modin
Neptune
NIM
Ollama
ONNX
Pandera
Papermill
SQL
Weights & Biases
WhyLogs
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Native backend plugins

Kubeflow PyTorch
Kubeflow TensorFlow
Kubernetes cluster Dask jobs
Kubernetes cluster Spark jobs
MPI Operator
Ray
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Flyte agents

Airflow agent
AWS Sagemaker inference agent
BigQuery agent
ChatGPT agent
Databricks agent
Memory Machine Cloud agent
OpenAI batch agent
PERIAN Job Platform agent
Sensor agent
Slurm agent
Snowflake agent
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: External service backend plugins

AWS Athena
AWS Batch
Flyte Interactive
Hive
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: SDKs for writing tasks and workflows

flytekit
flytekit-java
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Flyte operators

Airflow
```

```{toctree}
:maxdepth: -1
:hidden:
:caption: Deprecated integrations

BigQuery plugin
Databricks plugin
Kubernetes pods
Snowflake plugin
```