Client

class arangoml.main.ArangoML(client: ArangoClient | None = None, hosts: str | List[str] | None = None, username: str | None = None, password: str | None = None, user_token: str | None = None, ca_cert_file: str | None = None, api_endpoint: str | None = None, settings_files: list[str] = ['settings.toml', '.secrets.toml'], version: str | None = None)

The Python Interface for Machine Learning in ArangoDB.

Parameters:
  • client (Optional[arango.client.ArangoClient]) – An instance of arango.client.ArangoClient. Defaults to None. If not provided, the hosts argument must be provided.

  • hosts (Optional[Union[str, List[str]]]) – The ArangoDB hosts. Not used if client is provided.

  • username (Optional[str]) – The ArangoDB user name. If not provided, it is read from the ARANGODB_USER environment variable. Passing this argument also sets the ARANGODB_USER environment variable.

  • password (Optional[str]) – The ArangoDB password. If not provided, it is read from the ARANGODB_PW environment variable. Passing this argument also sets the ARANGODB_PW environment variable.

  • user_token (Optional[str]) – An ArangoDB user JWT token. Used as an alternative authentication method to username and password. Passing this argument also sets the ARANGODB_USER_TOKEN environment variable.

  • ca_cert_file (Optional[str]) – Path to a local CA certificate file needed for TLS (HTTPS) connections to ArangoDB. Passed as the verify_override argument to arango.client.ArangoClient. Not used if client is provided.

  • api_endpoint (Optional[str]) – The ArangoML API endpoint. Defaults to None. Passing this argument also sets the ML_API_SERVICES_ENDPOINT environment variable.

  • settings_files (list[str]) – A list of secrets files to be loaded as settings. Parameters provided as arguments will override those in the settings files. Defaults to ["settings.toml", ".secrets.toml"].

  • version (Optional[str]) – The ArangoML API date version. Defaults to None. Currently supported versions: 2024-03-18 (v1).

Raises:
  • TypeError – If client is not an instance of arango.client.ArangoClient.

  • ValueError – If username or password are not provided as arguments or environment variables.
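The precedence described above for username and password can be illustrated with a minimal sketch. The helper name resolve_credentials and its env parameter are hypothetical and are not part of arangoml; the sketch ignores user_token and the settings files, and only mirrors the documented rules (arguments win and are written to the environment, otherwise the environment is read, and a ValueError is raised if either value is missing).

```python
import os
from typing import MutableMapping, Optional, Tuple


def resolve_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[MutableMapping[str, str]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented precedence rules."""
    # Use the real process environment unless a mapping is injected (handy for tests).
    env = os.environ if env is None else env

    # Explicit arguments take precedence and are written back to the environment.
    if username is not None:
        env["ARANGODB_USER"] = username
    if password is not None:
        env["ARANGODB_PW"] = password

    # Whatever is in the environment now is the effective value.
    user = env.get("ARANGODB_USER")
    pw = env.get("ARANGODB_PW")
    if not user or not pw:
        raise ValueError(
            "username and password must be provided as arguments "
            "or environment variables"
        )
    return user, pw
```

Injecting a plain dict as env keeps the example from touching the real process environment.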

property version: str

The ArangoML API version.

Returns:

The ArangoML API version.

Return type:

str

property versions: dict[str, Any]

The supported ArangoML API versions.

Returns:

The supported ArangoML API versions.

Return type:

dict

property projects: ProjectsApiWrapper

The ArangoML Projects API client.

Returns:

The ArangoML Projects API client.

Return type:

arangoml.wrappers.projects.ProjectsApiWrapper

property jobs: JobsApiWrapper

The ArangoML Jobs API client.

Returns:

The ArangoML Jobs API client.

Return type:

arangoml.wrappers.jobs.JobsApiWrapper

property metadata: MetadataApiWrapper

The ArangoML Metadata API client.

Returns:

The ArangoML Metadata API client.

Return type:

arangoml.wrappers.metadata.MetadataApiWrapper

property downstream_tasks: DownstreamTasks

The ArangoML Downstream Tasks API client.

Returns:

The ArangoML Downstream Tasks API client.

Return type:

arangoml.wrappers.downstream.DownstreamTasks

set_token(token: str) → None

Set the JWT token.

Parameters:

token (str) – JWT token.

Raises:

ValueError – If the token is expired.

set_version(version: str) → None

Set the ArangoML API version.

Parameters:

version (str) – The ArangoML API version.

get_or_create_project(project_name: str) → BaseModel

Get or create an ArangoML project with the given name.

Parameters:

project_name (str) – The name of the ArangoML project.

Returns:

The project information as a pydantic model.

Return type:

pydantic.BaseModel

list_models(project_name: str | None = None, training_job_id: str | None = None, model_tasks: List[str] | None = None) → List[BaseModel]

List the models in a project. The result can be narrowed to the models of a given training job.

Parameters:
  • project_name (Optional[str]) – The name of the project.

  • training_job_id (Optional[str]) – The training job ID.

  • model_tasks (list[str] | None) – The model tasks to filter on. Can be “CLASSIFICATION” or “GRAPH_EMBEDDINGS”.

Returns:

The models of the training job.

Return type:

List[pydantic.BaseModel]

get_best_model(project_name: str, training_job_id: str, sort_parent_key: str = 'test', sort_child_key: str | None = 'accuracy', model_tasks: List[str] | None = None) → BaseModel

Get the best model from a training job.

Parameters:
  • project_name (str) – The name of the project.

  • training_job_id (str) – The training job ID.

  • sort_parent_key (str) – Sort on this parent key, defaults to “test”. Other options are “train” and “validation” (Node Classification), or “loss” (Embeddings).

  • sort_child_key (Optional[str]) – The child key to sort on, defaults to “accuracy”. Other options are “f1”, “precision”, “recall”, “roc_auc” and “pr_auc”. Must be set to None if sort_parent_key is “loss”.

  • model_tasks (list[str] | None) – The model tasks to filter on. Can be “CLASSIFICATION” or “GRAPH_EMBEDDINGS”.

Returns:

The best model according to the given sort keys.

Return type:

pydantic.BaseModel
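How sort_parent_key and sort_child_key interact can be sketched on plain dictionaries. Everything about the data shape here is an assumption made for illustration: the "metrics" field, the nesting of metrics under "train", "validation", and "test", and the choice that higher metric values and lower loss values count as better. The actual model objects returned by the API may look different.

```python
from typing import Any, Dict, List, Optional


def pick_best(
    models: List[Dict[str, Any]],
    sort_parent_key: str = "test",
    sort_child_key: Optional[str] = "accuracy",
) -> Dict[str, Any]:
    """Illustrative selection over a hypothetical metrics layout."""
    if not models:
        raise ValueError("no models to choose from")

    if sort_parent_key == "loss":
        # Embedding models: a single scalar, so no child key applies.
        if sort_child_key is not None:
            raise ValueError("sort_child_key must be None when sort_parent_key is 'loss'")
        # Assumption: a lower loss is better.
        return min(models, key=lambda m: m["metrics"]["loss"])

    if sort_child_key is None:
        raise ValueError("sort_child_key is required unless sort_parent_key is 'loss'")
    # Assumption: higher accuracy, f1, precision, etc. is better.
    return max(models, key=lambda m: m["metrics"][sort_parent_key][sort_child_key])
```

Calling pick_best(models, "validation", "f1") selects by validation F1, while pick_best(models, "loss", None) selects the lowest loss.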

wait_for_featurization(featurization_job_id: str, clear_output: bool = True, sleep_interval: int = 3) → BaseModel

Wait for an ArangoML Featurization Job to complete.

Parameters:
  • featurization_job_id (str) – The featurization job ID.

  • clear_output (bool) – Whether to clear the output while waiting. Defaults to True.

  • sleep_interval (int) – The interval in seconds to check the job status. Defaults to 3.

Returns:

The featurization job, either completed or failed.

Return type:

pydantic.BaseModel

wait_for_training(training_job_id: str, clear_output: bool = True, sleep_interval: int = 3) → BaseModel

Wait for an ArangoML Training Job to complete.

Parameters:
  • training_job_id (str) – The training job ID.

  • clear_output (bool) – Whether to clear the output while waiting. Defaults to True.

  • sleep_interval (int) – The interval in seconds to check the job status. Defaults to 3.

Returns:

The training job, either completed or failed.

Return type:

pydantic.BaseModel
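A typical follow-up to wait_for_training() is a call to get_best_model(). The sketch below only uses the two method signatures documented in this section; the function name train_then_pick_best is hypothetical, and the client argument is duck-typed so that any object exposing those two methods (normally an ArangoML instance) can be passed in. The chosen sort keys and the 5-second interval are arbitrary example values.

```python
from typing import Any, List, Optional, Tuple


def train_then_pick_best(
    client: Any,
    project_name: str,
    training_job_id: str,
    model_tasks: Optional[List[str]] = None,
) -> Tuple[Any, Any]:
    """Block until a training job finishes, then fetch its best model."""
    if model_tasks is None:
        # "validation" / "f1" are node classification keys, so filter accordingly.
        model_tasks = ["CLASSIFICATION"]

    # Returns the job once it has either completed or failed.
    job = client.wait_for_training(
        training_job_id,
        clear_output=False,  # keep earlier output visible, e.g. outside notebooks
        sleep_interval=5,    # poll every 5 seconds instead of the default 3
    )

    best = client.get_best_model(
        project_name,
        training_job_id,
        sort_parent_key="validation",
        sort_child_key="f1",
        model_tasks=model_tasks,
    )
    return job, best
```

The sketch does not inspect the returned job, because the field that carries its status is not documented here; in practice you would check it before asking for a model.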

wait_for_prediction(prediction_job_id: str, clear_output: bool = True, sleep_interval: int = 3) → BaseModel

Wait for an ArangoML Prediction Job to complete.

Parameters:
  • prediction_job_id (str) – The prediction job ID.

  • clear_output (bool) – Whether to clear the output while waiting. Defaults to True.

  • sleep_interval (int) – The interval in seconds to check the job status. Defaults to 3.

Returns:

The prediction job, either completed or failed.

Return type:

pydantic.BaseModel

wait_for_generation(generation_job_id: str, clear_output: bool = True, sleep_interval: int = 3) → BaseModel

Wait for an ArangoML Generation Job to complete.

Parameters:
  • generation_job_id (str) – The generation job ID.

  • clear_output (bool) – Whether to clear the output while waiting. Defaults to True.

  • sleep_interval (int) – The interval in seconds to check the job status. Defaults to 3.

Returns:

The generation job, either completed or failed.

Return type:

pydantic.BaseModel
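The four wait_for_* methods share the same parameters and the same polling behavior. A generic version of such a loop might look like the sketch below. It is not the library's implementation: fetch_job, the "status" field, and the status names "COMPLETED" and "FAILED" are hypothetical, and clear_output is left out because it only affects how progress is displayed.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; the real API may use different names.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    sleep_interval: float = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_job(job_id) until the job completes or fails."""
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            # Return the job in either terminal state; the caller decides what to do.
            return job
        # Not finished yet: wait before asking again.
        sleep(sleep_interval)
```

Passing sleep as a parameter lets tests substitute a no-op instead of actually waiting.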

class arangoml.wrappers.projects.ProjectsApiWrapper(settings: LazySettings, VERSION: str, SUPPORTED_VERSIONS: dict[str, dict[str, str]])

Wrapper for the Projects API.

property v1: ProjectsApi

Returns the v1 ProjectsApi object.

class arangoml.wrappers.jobs.JobsApiWrapper(settings: LazySettings, VERSION: str, SUPPORTED_VERSIONS: dict[str, dict[str, str]])

Wrapper for the Jobs API.

property v1: JobsApi

Returns the v1 JobsApi object.

class arangoml.wrappers.metadata.MetadataApiWrapper(settings: LazySettings, VERSION: str, SUPPORTED_VERSIONS: dict[str, dict[str, str]])

Wrapper for the Metadata API.

property v1: MetadataApi

Returns the v1 MetadataApi object.

class arangoml.wrappers.downstream.DownstreamTasks(settings: LazySettings, client: ArangoClient)

Wrapper for the Downstream Tasks API.

visualize_embeddings(db_name: str | None = None, collection_name: str | None = None, label_attribute: str | None = None, embeddings_attribute_name: str = 'embeddings', graph_emb: ndarray[Any, dtype[float64]] = None, labels: ndarray[Any, dtype[int64]] = None, label_mapping: Dict[str, int] | None = None, emb_percent: float = 0.1, fig_size: Tuple[int, int] = (8, 8), dpi: int = 120, img_path: str = './umap_embd_visualization.png') → None

Performs Dimensionality Reduction (2D) and visualization of Graph Embeddings using UMAP. Stores the visualization as an image.

This method can be used in two ways:

  1. If graph_emb and labels are provided, then the method directly visualizes the embeddings.

  2. If db_name, collection_name, and label_attribute are provided (optionally embeddings_attribute_name), then the method retrieves the embeddings and labels from the database to visualize them.

Parameters:
  • db_name (Optional[str]) – Target database. Not required if graph_emb is specified.

  • collection_name (Optional[str]) – Target collection. Not required if graph_emb is specified.

  • label_attribute (Optional[str]) – Name of the attribute containing labels. Defaults to db_name + ‘_y’ if not specified.

  • embeddings_attribute_name (str) – Target attribute containing the embeddings. Not required if graph_emb is specified.

  • graph_emb (numpy.typing.NDArray[np.float64]) – Graph embeddings of size (num_nodes, embedding_size).

  • labels (numpy.typing.NDArray[np.int64]) – 1D NumPy array of labels.

  • label_mapping (Optional[Dict[str, int]]) – A dictionary mapping class names to integer labels (e.g. {'Desktops': 0, 'Data Storage': 1, 'Laptops': 2}). A default mapping is applied if this parameter is omitted. See agml_api.metadata.v1.api.metadata_api.MetadataApi.get_label_mappings()

  • emb_percent (float) – Percentage of embeddings to visualize. Defaults to 0.1.

  • fig_size (Tuple[int, int]) – Set width and height of image.

  • dpi (int) – Set the resolution of the figure in dots-per-inch.

  • img_path (str) – Path to save the image.
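To make label_mapping and emb_percent concrete, the following sketch selects a subset of embeddings and translates their integer labels into class names for a plot legend. It is an illustration only: the helper name sample_for_plot is hypothetical, plain lists stand in for NumPy arrays, and it assumes that emb_percent is a fraction between 0 and 1 and that sampling is uniform at random, neither of which is stated in this reference.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_for_plot(
    graph_emb: Sequence[Sequence[float]],
    labels: Sequence[int],
    label_mapping: Dict[str, int],
    emb_percent: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[str]]:
    """Pick a fraction of the embeddings and name their labels."""
    if not graph_emb or len(graph_emb) != len(labels):
        raise ValueError("graph_emb and labels must be non-empty and equally long")
    if not 0 < emb_percent <= 1:
        raise ValueError("emb_percent is treated as a fraction in (0, 1]")

    # label_mapping goes from class name to integer; invert it for lookups.
    ind_to_name = {ind: name for name, ind in label_mapping.items()}

    # Keep at least one point, and sample without replacement.
    count = max(1, round(len(graph_emb) * emb_percent))
    chosen = sorted(random.Random(seed).sample(range(len(graph_emb)), count))

    points = [graph_emb[i] for i in chosen]
    names = [ind_to_name[labels[i]] for i in chosen]
    return points, names
```

With 20 embeddings and the default emb_percent of 0.1, two points are selected for plotting.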

Performs Similarity Search on Graph Embeddings using FAISS.

This method can be used in two ways:

  1. If graph_emb is provided, then the method directly performs similarity search on the embeddings.

  2. If db_name, collection_name, and embeddings_attribute_name are provided, then the method retrieves the embeddings from the database to perform similarity search.

Parameters:
  • db_name (Optional[str]) – Target database. Required if graph_emb is not specified, or if persist_in_db is set to True.

  • collection_name (Optional[str]) – Target collection. Required if graph_emb is not specified, or if persist_in_db is set to True.

  • label_attribute (Optional[str]) – Name of the attribute containing labels.

  • embeddings_attribute_name (str) – Target attribute containing the embeddings. Not required if graph_emb is specified.

  • graph_emb (numpy.typing.NDArray[np.float64]) – Graph embeddings of size (num_nodes, embedding_size). If not specified, then embeddings can automatically be retrieved by specifying db_name, collection_name, and (optionally) embeddings_attribute_name.

  • adb_key_to_ind (Dict[str, int]) – Mapping of ArangoDB document keys to array indices. Array indices are from graph embeddings (graph_emb). Required if graph_emb is specified, and persist_in_db is set to True.

  • ind_to_adb_key (Dict[int, str]) – Mapping of array indices to ArangoDB document keys. Required if graph_emb is specified, and persist_in_db is set to True.

  • top_k_neighbors (int) – Return top-k nearest neighbors of all embeddings.

  • num_clusters (int) – Number of clusters to partition the search space.

  • search_type (str) – Type of search to perform, either exact or approx. Use exact for precise similarity search, but at the cost of scalability. Use approx for scalable similarity search, but at the cost of some precision loss. Defaults to exact.

  • persist_in_db (bool) – If True, persist the search results in the database. In that case, db_name, collection_name, and adb_key_to_ind must be specified.

Returns:

Tuple of cosine distances and nearest neighbors (of top k nodes).

Return type:

(ndarray[float32], ndarray[int64])
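The shape of the returned tuple can be illustrated with a brute-force search in pure Python. This is a stand-in for the "exact" search type, not the FAISS-based implementation: it compares every pair of embeddings, so it only suits small inputs. The function name exact_cosine_search is hypothetical, and the sketch assumes that distance means 1 minus cosine similarity, that a node is not listed as its own neighbor, and that no embedding is the zero vector.

```python
import math
from typing import List, Sequence, Tuple


def exact_cosine_search(
    graph_emb: Sequence[Sequence[float]],
    top_k_neighbors: int = 2,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (cosine_distances, neighbors), one row per embedding."""
    # Pre-compute vector lengths once (assumes no zero vectors).
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in graph_emb]

    distances: List[List[float]] = []
    neighbors: List[List[int]] = []
    for i, vec in enumerate(graph_emb):
        scored = []
        for j, other in enumerate(graph_emb):
            if i == j:
                continue  # skip the node itself
            dot = sum(a * b for a, b in zip(vec, other))
            cosine = dot / (norms[i] * norms[j])
            scored.append((1.0 - cosine, j))
        # Smallest distance first, then keep the top k.
        scored.sort()
        top = scored[:top_k_neighbors]
        distances.append([dist for dist, _ in top])
        neighbors.append([ind for _, ind in top])
    return distances, neighbors
```

Row i of neighbors holds array indices, which is why the adb_key_to_ind and ind_to_adb_key mappings are needed to translate results back to ArangoDB document keys.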

persist_similarity_results(db_name: str, collection_name: str, cosine_distances: ndarray[Any, dtype[float32]], neighbors: ndarray[Any, dtype[int64]], adb_key_to_ind: Dict[str, int], ind_to_adb_key: Dict[int, str] | None = None, similarity_attr: str = 'similarity', batch_size: int = 1000) → None

Persist similarity search results in the database. Invoked by similarity_search() if persist_in_db is set to True.

NOTE: This will overwrite the existing similarity results in the collection. See similarity_attr for the attribute name.

Parameters:
  • db_name (str) – Name of the database to persist search results.

  • collection_name (str) – Name of the collection to persist search results.

  • cosine_distances (numpy.typing.NDArray[np.float32]) – Similarity distances of top k nodes.

  • neighbors (numpy.typing.NDArray[np.int64]) – Top k node ids with highest similarity.

  • adb_key_to_ind (Dict[str, int]) – Mapping of ArangoDB document keys to array indices.

  • ind_to_adb_key (Dict[int, str] | None) – Mapping of array indices to ArangoDB document keys. If omitted, it will be automatically generated.

  • batch_size (int) – Number of documents to update in a single batch.

  • similarity_attr (str) – Attribute name for the similarity results. Defaults to ‘similarity’.

  • similarity_distances_attr (str) – Name of the attribute containing similarity distances. Defaults to ‘similarity_distances’.
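The role of the two key mappings and of batch_size can be sketched without a database. The helper below, build_similarity_updates, is hypothetical: it derives ind_to_adb_key when it is omitted, converts neighbor indices back to document keys, and groups the resulting update documents into batches. The document layout (a list of neighbor keys plus a parallel list of distances) is an assumption for illustration and may not match what the library writes.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_similarity_updates(
    cosine_distances: Sequence[Sequence[float]],
    neighbors: Sequence[Sequence[int]],
    adb_key_to_ind: Dict[str, int],
    ind_to_adb_key: Optional[Dict[int, str]] = None,
    similarity_attr: str = "similarity",
    similarity_distances_attr: str = "similarity_distances",
    batch_size: int = 1000,
) -> List[List[Dict[str, Any]]]:
    """Build batched update documents from similarity search results."""
    if ind_to_adb_key is None:
        # Generate the reverse mapping automatically when it is omitted.
        ind_to_adb_key = {ind: key for key, ind in adb_key_to_ind.items()}

    docs: List[Dict[str, Any]] = []
    for key, ind in adb_key_to_ind.items():
        docs.append(
            {
                "_key": key,
                # Translate array indices back to ArangoDB document keys.
                similarity_attr: [ind_to_adb_key[n] for n in neighbors[ind]],
                similarity_distances_attr: list(cosine_distances[ind]),
            }
        )

    # Split into chunks of at most batch_size documents.
    return [docs[i : i + batch_size] for i in range(0, len(docs), batch_size)]
```

Each inner list could then be sent to the database as one bulk update.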

get_embeddings(db_name: str, collection_name: str, label_attribute: str | None = None, embeddings_attribute_name: str = 'embeddings') → Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], Dict[str, int], Dict[int, str]]

Convenience method to get embeddings and label mapping from ArangoDB.

Parameters:
  • db_name (str) – Name of database to get embeddings from.

  • collection_name (str) – Name of collection to get embeddings from.

  • label_attribute (Union[str, None]) – Name of the attribute containing labels. Defaults to db_name + ‘_y’ if not specified.

  • embeddings_attribute_name (str) – Name of the attribute containing embeddings.

Returns:

Tuple of embeddings, labels, a mapping of ArangoDB key to an integer index (for collection_name), and a mapping of integer index to ArangoDB key (for collection_name).

Return type:

(ndarray[float64], ndarray[float64], Dict[str, int], Dict[int, str])
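The four returned values can be pictured by building them from documents that are already in memory. The sketch is illustrative only: collect_embeddings is a hypothetical helper, plain lists replace NumPy arrays, and the documents are assumed to be dictionaries with a "_key" field, which is how ArangoDB identifies documents within a collection. The default label attribute follows the rule stated above (db_name + '_y').

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def collect_embeddings(
    docs: Sequence[Dict[str, Any]],
    db_name: str,
    label_attribute: Optional[str] = None,
    embeddings_attribute_name: str = "embeddings",
) -> Tuple[List[List[float]], List[Any], Dict[str, int], Dict[int, str]]:
    """Build embeddings, labels, and both key/index mappings from documents."""
    # Documented default: db_name + '_y' when no label attribute is given.
    if label_attribute is None:
        label_attribute = f"{db_name}_y"

    embeddings: List[List[float]] = []
    labels: List[Any] = []
    adb_key_to_ind: Dict[str, int] = {}
    ind_to_adb_key: Dict[int, str] = {}

    for ind, doc in enumerate(docs):
        embeddings.append(doc[embeddings_attribute_name])
        labels.append(doc[label_attribute])
        # The array index of a document is its position in the input.
        adb_key_to_ind[doc["_key"]] = ind
        ind_to_adb_key[ind] = doc["_key"]

    return embeddings, labels, adb_key_to_ind, ind_to_adb_key
```

The two mappings are inverses of each other, so row i of the embeddings always belongs to the document with key ind_to_adb_key[i].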