DESCRIPTION: Commonsense reasoning systems aim, among other goals, to answer commonsense questions. To compare such systems, a number of benchmark question sets have been developed, and leaderboards have emerged as hubs that host these benchmarks along with supporting infrastructure for accepting submissions of commonsense reasoning systems, which are then scored against the benchmarks. These benchmarks vary in structure: some provide questions with answer choices, while others provide factual observations and require reasoners to choose the most plausible hypothesis that explains them.
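To make the structural difference concrete, the short Python sketch below contrasts a multiple-choice item with an observation-and-hypothesis item; the field names and example content are hypothetical, not drawn from any particular benchmark.

# Hedged illustration: field names and items are invented for exposition.

# A multiple-choice style item: a question plus candidate answers.
multiple_choice_item = {
    "question": "Where would you most likely keep fresh milk?",
    "choices": ["refrigerator", "bookshelf", "garage"],
    "answer_index": 0,
}

# An abductive style item: observations plus candidate explanatory hypotheses.
abductive_item = {
    "observations": ["The street is wet.", "People are carrying umbrellas."],
    "hypotheses": ["It rained recently.", "A parade just passed."],
    "most_plausible_index": 0,
}

def score(prediction_index: int, item: dict, label_key: str) -> int:
    """Return 1 if the predicted choice matches the labeled answer, else 0."""
    return int(prediction_index == item[label_key])

# Toy evaluation of a system that always picks the first option.
accuracy = (
    score(0, multiple_choice_item, "answer_index")
    + score(0, abductive_item, "most_plausible_index")
) / 2
print(f"Toy accuracy over two items: {accuracy:.2f}")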
Recently, there has been an increasing effort to incorporate structured knowledge into these systems, which are largely based on machine-learning techniques, as a way to improve their overall scores on benchmarks. In this talk, we will present and discuss our current efforts in support of this goal, in the context of the Machine Commonsense Project. These efforts include a Benchmark Ontology, which provides a common vocabulary that allows diverse benchmarks to be compared and integrated, and that supports the analysis of systems and machine-learning language models. The talk will discuss the ontology's design decisions and showcase how it currently supports the development of a Benchmark tool.
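As a hedged illustration of the common-vocabulary idea only (the namespace, classes, and properties below are placeholders, not the actual terms of the Benchmark Ontology), the following Python/rdflib sketch describes two differently structured benchmarks with shared terms so they can be queried and compared uniformly.

# Hedged sketch: BMO namespace and its terms are hypothetical stand-ins.
from rdflib import Graph, Literal, Namespace, RDF

BMO = Namespace("http://example.org/benchmark-ontology#")  # hypothetical namespace

g = Graph()
g.bind("bmo", BMO)

# Describe two benchmarks with different task formats using one vocabulary.
g.add((BMO.BenchmarkA, RDF.type, BMO.Benchmark))
g.add((BMO.BenchmarkA, BMO.taskFormat, Literal("multiple-choice question answering")))
g.add((BMO.BenchmarkA, BMO.evaluationMetric, Literal("accuracy")))

g.add((BMO.BenchmarkB, RDF.type, BMO.Benchmark))
g.add((BMO.BenchmarkB, BMO.taskFormat, Literal("abductive hypothesis selection")))
g.add((BMO.BenchmarkB, BMO.evaluationMetric, Literal("accuracy")))

# Because both descriptions share the same vocabulary, a single traversal
# can compare benchmarks by task format.
for benchmark, fmt in g.subject_objects(BMO.taskFormat):
    print(benchmark, "->", fmt)

print(g.serialize(format="turtle"))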