GitHub - PragmaticMachineLearning/docai: Structured information extraction from documents
Make sure the OPENAI_API_KEY and HF_TOKEN environment variables are set.
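Before running anything, a quick check can confirm that both variables are visible to Python. This is a minimal sketch and not part of the repo. `missing_env_vars` is a hypothetical helper name, and it treats an empty value the same as an unset one.

```python
import os

# The two environment variables this README says must be set.
REQUIRED_VARS = ("OPENAI_API_KEY", "HF_TOKEN")


def missing_env_vars(names=REQUIRED_VARS):
    """Hypothetical helper: return the names that are unset or empty in os.environ."""
    return [name for name in names if not os.environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```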
Extract structured information from the index (open extract.py to see the queries and Pydantic models):
What losses have occurred in the past 5 years?
LossHistory(
losses=[
Loss(loss_date='2/20/21', loss_amount=7003.0, loss_description='Claimant was in his sleeper when his truck got hit by insured driver on the left', date_of_claim='4/19/21'),
Loss(loss_date='2/4/21', loss_amount=92584.0, loss_description='The IV was attempting to merge on the highway when the IV lost control and struck', date_of_claim='4/30/21'),
Loss(loss_date='9/14/21', loss_amount=5583.0, loss_description='IV was in the fast lane, when IV tire flew off and struck OV1, OV2, OV3, OV4', date_of_claim='9/15/21'),
Loss(loss_date='9/14/21', loss_amount=6299.0, loss_description='IV was in the fast lane, when IV tire flew off and struck OV1, OV2, OV3, OV4', date_of_claim='9/15/21')
]
)
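The LossHistory result above reveals the shape of the schema the query fills in. The sketch below mirrors that shape, with field names and types inferred only from the printed output. The repo defines these as Pydantic models in extract.py; stdlib dataclasses stand in here so the sketch runs without dependencies, and `total_amount` is a hypothetical helper that is not part of the repo.

```python
from dataclasses import dataclass, field
from typing import List


# Stand-in for the repo's Pydantic model; fields inferred from the printed output.
@dataclass
class Loss:
    loss_date: str          # dates are printed as strings such as '2/20/21'
    loss_amount: float
    loss_description: str
    date_of_claim: str


@dataclass
class LossHistory:
    losses: List[Loss] = field(default_factory=list)

    def total_amount(self) -> float:
        """Hypothetical helper: sum loss_amount over all extracted losses."""
        return sum(loss.loss_amount for loss in self.losses)


# Rebuild the result printed above (positional order: date, amount, description, claim date).
history = LossHistory(losses=[
    Loss('2/20/21', 7003.0, 'Claimant was in his sleeper when his truck got hit by insured driver on the left', '4/19/21'),
    Loss('2/4/21', 92584.0, 'The IV was attempting to merge on the highway when the IV lost control and struck', '4/30/21'),
    Loss('9/14/21', 5583.0, 'IV was in the fast lane, when IV tire flew off and struck OV1, OV2, OV3, OV4', '9/15/21'),
    Loss('9/14/21', 6299.0, 'IV was in the fast lane, when IV tire flew off and struck OV1, OV2, OV3, OV4', '9/15/21'),
])
print(history.total_amount())  # 111469.0
```

Because the dataclass repr uses the same `Name(field=value, ...)` form, printing `history` produces output in the same layout as the result shown above.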
What is the basic application information?
Application(
insured_name='Greentown Burgers LLC',
insured_address='Not provided',
insured_phone='Not provided',
insured_email='Not provided',
effective_date='07/22/2024'
)
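In the Application result, fields the document does not contain come back as the literal string 'Not provided'. The README does not say whether that string comes from the prompt or the schema. The sketch below assumes it is a fixed placeholder and lists which fields were left unfilled. As before, the dataclass is a dependency-free stand-in for the Pydantic model in extract.py, and `missing_fields` is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from typing import List

# Placeholder string seen in the Application output for absent fields (assumed fixed).
NOT_PROVIDED = "Not provided"


# Stand-in for the repo's Pydantic model; fields inferred from the printed output.
@dataclass
class Application:
    insured_name: str
    insured_address: str
    insured_phone: str
    insured_email: str
    effective_date: str     # printed as a string, e.g. '07/22/2024'


def missing_fields(app: Application) -> List[str]:
    """Hypothetical helper: names of fields whose value is the 'Not provided' placeholder."""
    return [f.name for f in fields(app) if getattr(app, f.name) == NOT_PROVIDED]


app = Application(
    insured_name="Greentown Burgers LLC",
    insured_address=NOT_PROVIDED,
    insured_phone=NOT_PROVIDED,
    insured_email=NOT_PROVIDED,
    effective_date="07/22/2024",
)
print(missing_fields(app))  # ['insured_address', 'insured_phone', 'insured_email']
```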