PragmaticMachineLearning/docai: Structured information extraction from documents

Ensure that the OPENAI_API_KEY and HF_TOKEN environment variables are set.
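A quick way to confirm both variables are present before running anything is a small check like the one below. This is only a sketch: the helper name `missing_env_vars` is hypothetical and not part of this repository.

```python
import os
from typing import Dict, List, Optional

# Variable names taken from the README text above.
REQUIRED_VARS = ["OPENAI_API_KEY", "HF_TOKEN"]


def missing_env_vars(names: List[str], env: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    source = os.environ if env is None else env
    return [name for name in names if not source.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```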

Extract structured information from the index (open extract.py to see the queries and Pydantic models):

What losses have occurred in the past 5 years?
LossHistory(
    losses=[
        Loss(loss_date='2/20/21', loss_amount=7003.0, loss_description='Claimant was in his sleeper when his truck got hit by insured driver on the left', date_of_claim='4/19/21'),
        Loss(loss_date='2/4/21', loss_amount=92584.0, loss_description='The IV was attempting to merge on the highway when the IV lost control and struck', date_of_claim='4/30/21'),
        Loss(loss_date='9/14/21', loss_amount=5583.0, loss_description='IV was in the fast lane, when IV tire flew off and struck OV1, OV2, OV3, OV4', date_of_claim='9/15/21'),
        Loss(loss_date='9/14/21', loss_amount=6299.0, loss_description='IV was in the fast lane, when IV tire flew off and struck OV1, OV2, OV3, OV4', date_of_claim='9/15/21')
    ]
)
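
The output above suggests a pair of Pydantic models roughly like the following. This is a sketch inferred from the printed fields only; the field types are assumptions, and the actual definitions live in extract.py and may differ.

```python
from typing import List

from pydantic import BaseModel


class Loss(BaseModel):
    # Field names copied from the sample output; types are assumed.
    loss_date: str
    loss_amount: float
    loss_description: str
    date_of_claim: str


class LossHistory(BaseModel):
    losses: List[Loss]


# Rebuild the first two losses from the sample output and total their amounts.
history = LossHistory(
    losses=[
        Loss(
            loss_date="2/20/21",
            loss_amount=7003.0,
            loss_description="Claimant was in his sleeper when his truck got hit by insured driver on the left",
            date_of_claim="4/19/21",
        ),
        Loss(
            loss_date="2/4/21",
            loss_amount=92584.0,
            loss_description="The IV was attempting to merge on the highway when the IV lost control and struck",
            date_of_claim="4/30/21",
        ),
    ]
)
total_amount = sum(loss.loss_amount for loss in history.losses)
print(total_amount)
```

Keeping dates as strings mirrors the output shown here, where values such as '2/20/21' appear unparsed.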

What is the basic application information?
Application(
  insured_name='Greentown Burgers LLC', 
  insured_address='Not provided', 
  insured_phone='Not provided',
  insured_email='Not provided', 
  effective_date='07/22/2024'
)
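
A model matching this second result might look like the sketch below. The field names come from the output above; the `'Not provided'` defaults are an assumption made for illustration, since the real model in extract.py may instead rely on the language model to emit that string for missing fields.

```python
from pydantic import BaseModel

# Placeholder string seen in the sample output for fields absent from the document.
NOT_PROVIDED = "Not provided"


class Application(BaseModel):
    # Field names copied from the sample output; types and defaults are assumed.
    insured_name: str = NOT_PROVIDED
    insured_address: str = NOT_PROVIDED
    insured_phone: str = NOT_PROVIDED
    insured_email: str = NOT_PROVIDED
    effective_date: str = NOT_PROVIDED


# Only the fields found in the document are supplied; the rest fall back to the placeholder.
app = Application(insured_name="Greentown Burgers LLC", effective_date="07/22/2024")
print(app)
```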