
DataModel

The DataModel class is the base class for defining data models in the languru package. Its class methods create instances of a data model from content generated by OpenAI's language model, validate the generated data against the model's schema, and extract the desired information.

Introduction

The DataModel class is built on the BaseModel class from the pydantic library, which handles the definition and validation of data models. It uses OpenAI's language model to generate structured data from the given content according to a predefined schema.

The main purpose of the DataModel class is to provide a convenient way to extract relevant information from unstructured text using OpenAI's language model and transform it into a structured format defined by the data model.

Usage

To use the DataModel class, you need to create a subclass that defines the desired fields and their types. Here's an example:

from languru.prompts.repositories.data_model import DataModel

class Person(DataModel):
    name: str
    age: int
    email: str

Once you have defined your data model, you can use the provided class methods to create instances of the model from content generated by OpenAI's language model.

models_from_openai method

The models_from_openai class method allows you to create multiple instances of the data model from the given content. It takes the following parameters:

  • content: The unstructured text content from which to extract the data.
  • client: An instance of the OpenAI client for making API calls.
  • model: The name of the OpenAI language model to use (default: "gpt-3.5-turbo").
  • **kwargs: Additional keyword arguments to pass to the OpenAI API.

The method returns a list of instances of the data model, extracted from the generated content.

Example usage:

from openai import OpenAI
from languru.prompts.repositories.data_model import DataModel

class Person(DataModel):
    name: str
    age: int
    email: str

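# Unstructured text to extract the data from.
content = "John Doe is 30 years old. His email is john@example.com."
# Replace the placeholder with your own API key.
client = OpenAI(api_key="your_api_key")

# Returns a list of Person instances extracted from the content.
people = Person.models_from_openai(content, client)
for person in people:
    print(person.name, person.age, person.email)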

model_from_openai method

The model_from_openai class method is similar to models_from_openai, but it returns a single instance of the data model. If multiple instances are extracted from the content, only the first one is returned.

Example usage:

# Reuses Person, content, and client from the previous example.
person = Person.model_from_openai(content, client)
print(person.name, person.age, person.email)

Validation

The DataModel class automatically validates the generated data against the defined schema using the pydantic library. If the validation fails, a ValidationError is raised, indicating that the generated data does not match the expected format.
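To make the validation step concrete, here is a minimal sketch that uses pydantic directly, without calling OpenAI. It assumes pydantic v2 and uses a plain BaseModel as a stand-in for DataModel; the try_validate helper is hypothetical and is not part of languru.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # stand-in for a DataModel subclass (assumption)
    name: str
    age: int
    email: str


def try_validate(data: dict) -> list[str]:
    """Return the names of the fields that failed validation (empty if valid)."""
    try:
        Person.model_validate(data)
    except ValidationError as exc:
        # Each error entry carries the location of the offending field.
        return [str(err["loc"][0]) for err in exc.errors()]
    return []


# A complete, well-typed record passes validation.
print(try_validate({"name": "John Doe", "age": 30, "email": "john@example.com"}))
# A non-numeric age and a missing email are both reported.
print(try_validate({"name": "John Doe", "age": "thirty"}))
```

The same ValidationError is what you would catch around a call to models_from_openai or model_from_openai when the generated data does not fit the schema.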

Error Handling

The DataModel class provides error handling for the following scenarios:

  • If no JSON code block is found in the generated content, a ValueError is raised with an appropriate error message.
  • If the generated data fails to validate against the defined schema, a ValidationError is raised, indicating the validation errors.
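The first scenario can be illustrated with a short sketch of how a JSON code block might be pulled out of a model reply. This is an assumption about the general technique, not languru's actual implementation; the extract_json_block function, the regular expression, and the error message are all hypothetical.

```python
import json
import re

# Three backticks, built programmatically so this example stays readable.
FENCE = "`" * 3
# Matches a fenced block tagged "json" and captures its body.
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_block(text: str):
    """Parse the first fenced JSON block in text, or raise ValueError."""
    match = JSON_BLOCK.search(text)
    if match is None:
        raise ValueError("No JSON code block found in the generated content.")
    return json.loads(match.group(1))


reply = "Here is the data:\n" + FENCE + 'json\n[{"name": "John Doe", "age": 30}]\n' + FENCE
print(extract_json_block(reply))

try:
    extract_json_block("The model replied with prose only.")
except ValueError as exc:
    print("Extraction failed:", exc)
```

In application code, wrapping the call to models_from_openai in a try/except for ValueError and ValidationError covers both scenarios listed above.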

Customization

You can customize the behavior of the DataModel class by overriding its methods in your subclass. For example, you can modify the model_json_schema method to provide a custom schema for your data model.
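As a rough sketch of such an override, the example below adds a description to one field of the generated schema. It assumes pydantic v2 and uses a plain BaseModel as a stand-in for DataModel; the description text is illustrative, and how languru consumes the schema is not shown here.

```python
from typing import Any

from pydantic import BaseModel


class Person(BaseModel):  # stand-in for a DataModel subclass (assumption)
    name: str
    age: int
    email: str

    @classmethod
    def model_json_schema(cls, *args: Any, **kwargs: Any) -> dict[str, Any]:
        # Start from the schema pydantic generates, then adjust it.
        schema = super().model_json_schema(*args, **kwargs)
        schema["properties"]["age"]["description"] = "Age in whole years."
        return schema


print(Person.model_json_schema()["properties"]["age"])
```

Extending the default schema rather than replacing it keeps the field types and required-field list in sync with the class definition.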

Additionally, you can extend the functionality of the DataModel class by adding your own methods and properties to your subclass.

Conclusion

The DataModel class offers a way to extract structured data from unstructured text using OpenAI's language model. By defining your data models as subclasses of DataModel, you can create model instances from generated content and have the data validated against the defined schema.