Evaluation
Given the scale of the Monster Network in DAMN, how can we evaluate the Monsters' performance, knowledge, and reasoning skills?
We will query our Monsters in natural language with the following sample questions:
What do you know so far?
What's your plan for tomorrow?
Who's the president of your village?
Who are your parents?
How many girlfriends / boyfriends do you have and who are they?
What's your friend list ranked by your closeness to them?
What's your favorite food?
What's your league and your religious belief?
What's your biggest enemy?
What’s your dream job or career?
How do you handle stress?
What’s the best advice you’ve ever received?
If you could visit any country, where would you go?
What’s the last book you read, and did you enjoy it?
What’s your biggest accomplishment so far?
What’s one thing you wish you could change about the world?
What kind of music do you listen to the most?
What’s a skill you wish you could master instantly?
If you could meet any historical figure, who would it be?
Do you believe in fate or free will?
What’s the most adventurous thing you’ve ever done?
What’s your idea of a perfect day?
What’s the most valuable thing you own?
If you had the power to solve one global issue, what would it be?
How do you like to spend your weekends?
What’s your go-to comfort food?
If you could live in any era of history, when would it be?
What are your top three priorities in life right now?
If you could master any skill instantly, what would it be and why?
These 29 natural language queries form the DAMN intelligence benchmark. An LLM evaluator scores each agent's answers, and the scores are reported in the DAMN Chronicle. We expect the intelligence score to trend upward over time.
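As a rough illustration of this loop, the sketch below asks an agent each benchmark question, has a judge score each answer, and logs the mean score to a chronicle. Everything here is an assumption rather than DAMN's actual implementation: `Agent`, `Judge`, `score_agent`, and `Chronicle` are hypothetical names, the 0 to 10 scale is arbitrary, and the judge is a plain callable where a real setup would wrap an LLM evaluator prompt. To check whether scores trend upward, `Chronicle.trend` fits an ordinary least-squares slope over evaluation rounds; that choice of trend measure is ours.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Assumed interfaces (hypothetical, not part of DAMN):
# an Agent answers a question; a Judge scores (question, answer) on a 0-10 scale.
Agent = Callable[[str], str]
Judge = Callable[[str, str], float]

# A few of the benchmark questions; extend with the full list.
BENCHMARK: List[str] = [
    "What do you know so far?",
    "Who are your parents?",
    "What's your favorite food?",
]


def score_agent(agent: Agent, judge: Judge, questions: List[str]) -> float:
    """Ask the agent every question and return the mean judge score."""
    scores: List[float] = []
    for question in questions:
        answer = agent(question)
        score = judge(question, answer)
        if not 0.0 <= score <= 10.0:
            raise ValueError(f"judge returned out-of-range score: {score}")
        scores.append(score)
    return mean(scores)


@dataclass
class Chronicle:
    """Hypothetical in-memory stand-in for the DAMN Chronicle score log."""

    history: Dict[str, List[float]] = field(default_factory=dict)

    def report(self, agent_id: str, score: float) -> None:
        """Append one evaluation round's score for an agent."""
        self.history.setdefault(agent_id, []).append(score)

    def trend(self, agent_id: str) -> float:
        """Least-squares slope of score per round; positive means improving."""
        ys = self.history.get(agent_id, [])
        n = len(ys)
        if n < 2:
            return 0.0  # not enough rounds to estimate a trend
        x_bar = (n - 1) / 2
        y_bar = mean(ys)
        num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
        den = sum((x - x_bar) ** 2 for x in range(n))
        return num / den


if __name__ == "__main__":
    # Toy agent and judge for illustration only; a real judge would prompt an LLM.
    def toy_agent(question: str) -> str:
        return "I don't know yet."

    def toy_judge(question: str, answer: str) -> float:
        return min(10.0, len(answer) / 5)  # longer answers score higher, capped at 10

    chronicle = Chronicle()
    chronicle.report("monster-1", score_agent(toy_agent, toy_judge, BENCHMARK))
    chronicle.report("monster-1", 5.0)
    print(chronicle.history["monster-1"], chronicle.trend("monster-1"))
```

A positive value from `trend` would be consistent with the expected upward trend; since LLM judges can be noisy, averaging several judge calls per question may be worth considering.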
We will also retrospectively analyze the Monsters' social behaviors and investigate the ethical issues and major conflicts that arise. We hope this can help reverse-engineer human society and, in turn, influence the real world at large.