Schema

An Arize class that organizes the column names in your pandas DataFrame that contain model data and maps them to Arize.

Import and initialize the Arize `Schema` from `arize.utils.types`:

from arize.utils.types import Schema

class Schema(
    prediction_id_column_name: Optional[str] = None
    feature_column_names: Optional[Union[List[str], TypedColumns]] = None
    tag_column_names: Optional[Union[List[str], TypedColumns]] = None
    timestamp_column_name: Optional[str] = None
    prediction_label_column_name: Optional[str] = None
    prediction_score_column_name: Optional[str] = None
    actual_label_column_name: Optional[str] = None
    actual_score_column_name: Optional[str] = None
    shap_values_column_names: Optional[Dict[str, str]] = None
    embedding_feature_column_names: Optional[Dict[str, EmbeddingColumnNames]] = None  # type:ignore
    prediction_group_id_column_name: Optional[str] = None
    rank_column_name: Optional[str] = None
    attributions_column_name: Optional[str] = None
    relevance_score_column_name: Optional[str] = None
    relevance_labels_column_name: Optional[str] = None
    object_detection_prediction_column_names: Optional[ObjectDetectionColumnNames] = None
    object_detection_actual_column_names: Optional[ObjectDetectionColumnNames] = None
    prompt_column_names: Optional[Union[str, EmbeddingColumnNames]] = None
    response_column_names: Optional[Union[str, EmbeddingColumnNames]] = None
    prompt_template_column_names: Optional[PromptTemplateColumnNames] = None
    llm_config_column_names: Optional[LLMConfigColumnNames] = None
    llm_run_metadata_column_names: Optional[LLMRunMetadataColumnNames] = None
    retrieved_document_ids_column_name: Optional[List[str]] = None
    multi_class_threshold_scores_column_name: Optional[str] = None
)

Parameter	Data Type	Expected Type In Column	Description
`prediction_id_column_name`	str	Contents must be a string limited to 128 characters	(Optional) A unique string to identify a prediction event. Required to match a prediction to delayed actuals or feature importances in Arize. If the column is not provided, Arize will generate a random prediction id.
`feature_column_names`	List[str] or TypedColumns	Feature values can be int, float, string, list of strings	(Optional) Column names for features. If TypedColumns is used, the columns will be cast to the provided types prior to logging.
`embedding_feature_column_names`	Dict[str, EmbeddingColumnNames]	Learn more here	(Optional) Dictionary mapping embedding display names to `EmbeddingColumnNames` objects
`timestamp_column_name`	str	The content of this column must be integer Unix timestamps in seconds	(Optional) Column name for timestamps
`prediction_label_column_name`	str	The content of this column must be convertible to string	(Optional) Column name for categorical prediction values
`prediction_score_column_name`	str	The content of this column must be int/float. For Multi-Class models, the content of this column must be a dictionary mapping class names to int/float prediction scores.	(Optional) Column name for numeric prediction values
`actual_label_column_name`	str	The content of this column must be convertible to string	(Optional) Column name for categorical ground truth values
`actual_score_column_name`	str	The content of this column must be int/float. For Multi-Class models, the content of this column must be a dictionary mapping class names to int/float actual scores.	(Optional) Column name for numeric ground truth values
`tag_column_names`	List[str] or TypedColumns	Tag values can be int, float, or string. Limited to 1k values	(Optional) Column names for tags. If TypedColumns is used, the columns will be cast to the provided types prior to logging.
`shap_values_column_names`	Dict[str, str]	The content of these columns must be int/float	(Optional) Dictionary of key-value pairs where each key is a feature column name and each value is the name of the column holding that feature's SHAP values. For example, if your dataframe contains feature columns `feat1, feat2, feat3, ...` and corresponding SHAP value columns `feat1_shap, feat2_shap, feat3_shap, ...`, set shap_values_column_names = `{"feat1": "feat1_shap", "feat2": "feat2_shap", "feat3": "feat3_shap"}`
`prediction_group_id_column_name`	str	The content of this column must be a string limited to 128 characters	(Required for ranking models only) Column name for the ranking groups or lists in ranking models
`rank_column_name`	str	The content of this column must be an integer between 1 and 100	(Required for ranking models only) Column name for the rank of each element within its group or list
`relevance_score_column_name`	str	The content of this column must be int/float	(Required for ranking models only) Column name for numeric ground truth values
`relevance_labels_column_name`	str	The content of this column must be a string	(Required for ranking models only) Column name for categorical ground truth values
`object_detection_prediction_column_names`	ObjectDetectionColumnNames	Learn more here	ObjectDetectionColumnNames object containing information defining the predicted bounding boxes' coordinates, categories, and scores.
`object_detection_actual_column_names`	ObjectDetectionColumnNames	Learn more here	ObjectDetectionColumnNames object containing information defining the actual bounding boxes' coordinates, categories, and scores.
`prompt_column_names`	EmbeddingColumnNames	Learn more here	EmbeddingColumnNames object containing the embedding vector data (required) and raw text (optional) for the input text your model acts on
`response_column_names`	EmbeddingColumnNames	Learn more here	EmbeddingColumnNames object containing the embedding vector data (required) and raw text (optional) for the text your model generates
`prompt_template_column_names`	PromptTemplateColumnNames	Learn more here	PromptTemplateColumnNames object containing the prompt template and prompt template version, both optional
`llm_config_column_names`	LLMConfigColumnNames	Learn more here	LLMConfigColumnNames object containing the LLM model name (optional) and its hyper-parameters (optional) used at inference time
`llm_run_metadata_column_names`	LLMRunMetadataColumnNames	Learn more here	LLMRunMetadataColumnNames object containing metadata about the LLM inference, i.e., token counts and response latency
`retrieved_document_ids_column_name`	str	The content of this column must be a list of entries convertible to strings	Column name for retrieved document IDs
`multi_class_threshold_scores_column_name`	str	The content of this column must be a dictionary mapping string class names to float scores. Learn more here	(Optional) Column name for per-class thresholds, used only for Multi-Label Multi-Class models. Each threshold is the minimum prediction value for that class to be considered a positive prediction.
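To illustrate how the per-class thresholds described in the last row might be applied, the sketch below compares one row's multi-class score dictionary against its threshold dictionary. It is a plain-Python approximation under two assumptions: a class counts as positive when its score is greater than or equal to its threshold, and classes without a threshold are skipped. `positive_classes` is a hypothetical helper, not part of the Arize SDK, and Arize's own evaluation logic may differ.

```python
from typing import Dict, List


def positive_classes(
    scores: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Return, sorted, the classes whose score meets that class's threshold.

    Assumptions: the comparison is ">=", and classes missing from
    `thresholds` are ignored.
    """
    return sorted(
        cls
        for cls, score in scores.items()
        if cls in thresholds and score >= thresholds[cls]
    )


# One row: the multi-class prediction score dictionary and the
# per-class threshold dictionary for the same prediction.
row_scores = {"cat": 0.82, "dog": 0.40, "bird": 0.10}
row_thresholds = {"cat": 0.5, "dog": 0.5, "bird": 0.05}

print(positive_classes(row_scores, row_thresholds))  # ['bird', 'cat']
```

Here `dog` is the only class whose score falls below its threshold, so it is the only class not reported as positive.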

Code Example

prediction id	feature_1	feature_2	tag_1	tag_2	prediction_ts	prediction_label	actual_label	embedding
1fcd50f4689	ca	[ca, ak]	female	25	1637538845	No Claims	No Claims	[1.27346, -0.2138, ...]
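Before building the schema, it can help to check a row against the expected column contents listed in the parameter table. The sketch below does this for the sample row above, represented as a plain dictionary. `check_row` is a hypothetical helper that covers only two rules from the table: the prediction ID must be a string of at most 128 characters, and the timestamp must be an integer Unix timestamp in seconds. The Arize SDK performs its own validation, which this does not replicate.

```python
from typing import Any, Dict, List

MAX_ID_LEN = 128  # prediction ID length limit from the parameter table


def check_row(row: Dict[str, Any], id_col: str, ts_col: str) -> List[str]:
    """Return the problems found in one row (an empty list if none)."""
    problems = []
    pred_id = row.get(id_col)
    if not isinstance(pred_id, str) or len(pred_id) > MAX_ID_LEN:
        problems.append(f"{id_col!r} must be a string of at most {MAX_ID_LEN} chars")
    ts = row.get(ts_col)
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(ts, int) or isinstance(ts, bool):
        problems.append(f"{ts_col!r} must be an int Unix timestamp in seconds")
    return problems


# The sample row from the table above (embedding vector truncated)
sample_row = {
    "prediction id": "1fcd50f4689",
    "feature_1": "ca",
    "feature_2": ["ca", "ak"],
    "tag_1": "female",
    "tag_2": 25,
    "prediction_ts": 1637538845,
    "prediction_label": "No Claims",
    "actual_label": "No Claims",
    "embedding": [1.27346, -0.2138],
}

print(check_row(sample_row, "prediction id", "prediction_ts"))  # []
```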

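from arize.utils.types import EmbeddingColumnNames, Schema, TypedColumns

feature_cols = ["feature_1", "feature_2"]
# Hypothetical SHAP value column names, one per feature ("feature_1_shap", ...)
shap_cols = [f"{col}_shap" for col in feature_cols]

schema = Schema(
    prediction_id_column_name="prediction id",
    feature_column_names=feature_cols,
    # Cast both tag columns to strings before logging
    tag_column_names=TypedColumns(
        to_str=["tag_1", "tag_2"],
    ),
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    actual_score_column_name="actual_score",
    # Map each feature column to the column holding its SHAP values
    shap_values_column_names=dict(zip(feature_cols, shap_cols)),
    # Map an embedding display name to an EmbeddingColumnNames object
    embedding_feature_column_names={
        "embedding": EmbeddingColumnNames(vector_column_name="embedding"),
    },
    # The following four parameters apply to ranking models only
    prediction_group_id_column_name="group_example_name",
    rank_column_name="example_rank",
    relevance_score_column_name="relevance_score",
    relevance_labels_column_name="actual_relevancy",
)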
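The schema in the code example refers to more columns than the sample table contains (for example `prediction_score` and the ranking columns). One way to catch such mismatches before logging is to collect every column name the schema refers to and compare it with the DataFrame's columns. The sketch below does this in plain Python over a dictionary of keyword arguments rather than a `Schema` object, so it runs without the Arize SDK. `referenced_columns` and `missing_columns` are hypothetical helpers, and nested dictionaries stand in for objects such as `EmbeddingColumnNames`.

```python
from typing import Any, Iterable, List, Set


def referenced_columns(value: Any) -> Set[str]:
    """Recursively collect column-name strings from a schema-like value.

    Accepts None, str, list, or dict. Strings are column names, lists are
    searched element-wise, and for dicts only the values are searched,
    because keys are display names (embeddings) or feature names that are
    already listed elsewhere (SHAP mappings).
    """
    if value is None:
        return set()
    if isinstance(value, str):
        return {value}
    if isinstance(value, dict):
        value = list(value.values())
    found: Set[str] = set()
    for item in value:
        found |= referenced_columns(item)
    return found


def missing_columns(schema_kwargs: dict, df_columns: Iterable[str]) -> List[str]:
    """Return, sorted, the column names referenced but absent from the data."""
    return sorted(referenced_columns(schema_kwargs) - set(df_columns))


# Column names from the sample table
sample_columns = [
    "prediction id", "feature_1", "feature_2", "tag_1", "tag_2",
    "prediction_ts", "prediction_label", "actual_label", "embedding",
]

# A subset of the Schema arguments from the example, as plain data
schema_kwargs = {
    "prediction_id_column_name": "prediction id",
    "feature_column_names": ["feature_1", "feature_2"],
    "tag_column_names": ["tag_1", "tag_2"],
    "timestamp_column_name": "prediction_ts",
    "prediction_label_column_name": "prediction_label",
    "prediction_score_column_name": "prediction_score",
    "actual_label_column_name": "actual_label",
    "embedding_feature_column_names": {"embedding": {"vector_column_name": "embedding"}},
    "rank_column_name": "example_rank",
}

print(missing_columns(schema_kwargs, sample_columns))
# ['example_rank', 'prediction_score']
```

Searching only dictionary values keeps embedding display names from being reported as missing columns.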
