Before deep learning, AI was symbolic. Programs manipulated explicit rules, logical formulas, and structured knowledge representations. That era produced systems that could prove theorems and diagnose diseases — but could not recognize a cat in a photograph. The neural revolution solved perception but introduced new problems: hallucination, opacity, and the absence of formal guarantees. Neurosymbolic AI attempts to combine the strengths of both paradigms.
Symbolic AI — also called Good Old-Fashioned AI (GOFAI) — represents knowledge as symbols and manipulates them according to explicit rules. The core assumption: intelligence is symbol manipulation, and a sufficiently complete set of rules can capture any domain.
Expert systems were the most commercially successful symbolic AI. A knowledge base of if-then rules encoded domain expertise — MYCIN (1976) used roughly 600 rules to diagnose bacterial infections and recommend antibiotics, matching the performance of infectious disease specialists. XCON (1980) configured VAX computer orders at Digital Equipment Corporation, saving the company an estimated $40 million per year.
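The if-then mechanism can be illustrated with a minimal rule engine. The sketch below is a toy under stated assumptions: the rule contents are invented placeholders, not MYCIN's actual rules, and it chains forward from known facts for simplicity, whereas MYCIN itself reasoned backward from hypotheses and attached certainty factors to its conclusions.

```python
# Hypothetical toy rules for illustration only; NOT MYCIN's real knowledge base.
# Each rule is (set of conditions that must all hold, conclusion to add).
RULES = [
    ({"gram_negative", "rod_shaped"}, "organism_class_a"),
    ({"organism_class_a", "urinary_infection"}, "suspect_organism_x"),
    ({"suspect_organism_x"}, "recommend_drug_review"),
]


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires only when every condition is already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain(
    {"gram_negative", "rod_shaped", "urinary_infection"}, RULES
)
print(sorted(derived))
```

Given only `{"gram_negative"}`, no rule fires and the engine returns the input unchanged: the system concludes nothing beyond what its rules cover.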
Logic programming (Prolog, Datalog) expressed knowledge as logical facts and rules. A query engine derived answers through logical inference: Prolog chains backward from the goal to the facts, while Datalog engines typically evaluate rules bottom-up from the facts. This approach worked well for closed domains where the relevant knowledge could be enumerated.
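Backward chaining can be sketched in a few dozen lines of Python. This is an illustrative miniature, not how a real Prolog engine is implemented: the `parent`/`ancestor` facts are made up, variables are (by a convention assumed here) strings that start with an uppercase letter, and there is no occurs check or cut.

```python
import itertools

_fresh = itertools.count()  # used to give each rule application new variables


def is_var(term):
    # Assumed convention for this sketch: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution making a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def rename(rule):
    """Rename a rule's variables so separate applications do not clash."""
    head, body = rule
    n = next(_fresh)
    fix = lambda atom: tuple(f"{t}_{n}" if is_var(t) else t for t in atom)
    return fix(head), [fix(atom) for atom in body]


def solve(goals, subst, rules):
    """Prove the first goal by finding a rule whose head matches it,
    then recursively prove that rule's body (goal -> subgoals -> facts)."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for rule in rules:
        head, body = rename(rule)
        extended = unify(first, head, subst)
        if extended is not None:
            yield from solve(body + rest, extended, rules)


# Facts are rules with an empty body.
RULES = [
    (("parent", "alice", "bob"), []),
    (("parent", "bob", "carol"), []),
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Y"), [("parent", "X", "Z"), ("ancestor", "Z", "Y")]),
]


def query(goal):
    return [tuple(walk(t, s) for t in goal) for s in solve([goal], {}, RULES)]


print(query(("ancestor", "alice", "Who")))
```

The query starts from the goal `ancestor(alice, Who)` and works back to the `parent` facts, binding `Who` along the way. A question about an entity that is not in the fact list simply yields no answers, which is the closed-domain behavior described above.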
Symbolic systems required human experts to manually encode knowledge as rules. This process was slow, expensive, and error-prone. Experts often could not articulate their decision-making process — a radiologist “just sees” the tumor, but cannot express that recognition as a chain of if-then rules. As domains grew more complex, the number of rules exploded, interactions between rules became unpredictable, and maintaining consistency became intractable. The CYC project, begun in 1984 with the goal of encoding all common-sense knowledge, has been running for over 40 years and remains incomplete.
Symbolic systems worked in narrow, well-defined domains but collapsed when facing the messiness of the real world. Three fundamental problems:
Brittleness. A symbolic system knows only what it has been told. A medical expert system built from 600 rules cannot handle a case that falls outside those rules; it does not degrade gracefully, it simply fails. There is no notion of similarity or approximation: a rule either matches or it does not.
The frame problem. When an action occurs, which facts change and which remain the same? Pouring coffee into a cup changes the cup’s contents but not its color, weight (approximately), or location. In a symbolic system, what does not change must be specified explicitly — an impossible task for open-ended domains.
Perception. Symbolic AI could not connect to raw sensory data. Converting an image into symbols — “there is a red circle to the left of a blue square” — required solving computer vision first. The symbolic reasoning operated on the output of perception, but perception was the hard part.
Humans operate with a vast background of commonsense knowledge that is almost never stated explicitly. “If a ball is placed on a table and the table is then tilted, the ball will roll off.” No one teaches this rule — it emerges from embodied experience with physics. Symbolic systems have no physics, no embodiment, and no experience. Every piece of commonsense knowledge must be manually encoded, and the number of such facts is effectively infinite. This is distinct from the knowledge acquisition bottleneck: the problem is not extracting knowledge from experts, but encoding the knowledge that every five-year-old possesses and no expert can enumerate.
Deep learning solved perception. Convolutional networks matched or exceeded reported human accuracy on image-classification benchmarks such as ImageNet. Transformers reached strong, in places human-level, scores on many language-understanding benchmarks. These systems learn from data rather than rules, generalize to unseen inputs, and handle noisy, ambiguous real-world signals.
But neural networks introduced their own failure modes:
Hallucination. Large language models (LLMs) generate fluent, confident text that is factually wrong. There is no built-in mechanism to distinguish known facts from plausible-sounding fabrication. The model has no concept of truth; it has learned statistical patterns over text.
No formal guarantees. A neural network cannot prove that its answer is correct. It cannot explain its reasoning in terms of logical steps that can be independently verified. For applications requiring correctness guarantees — medical diagnosis, legal reasoning, financial compliance — this is disqualifying.
Opaque reasoning. The path from input to output passes through billions of parameters with no human-readable intermediate representation. When the model is wrong, there is no way to inspect the reasoning chain and identify the faulty step.
For a medical AI recommending treatment, “95% accurate” is not good enough if there is no way to tell which 5% are wrong. Symbolic systems at least fail transparently — the rule did not match, the fact was missing. Neural systems fail silently — the model produces a confident, well-formatted answer that happens to be incorrect. The user has no signal to distinguish correct from incorrect outputs without independent verification. This reliability gap motivates the search for neurosymbolic approaches that combine neural perception with symbolic verification.
Neurosymbolic AI combines neural networks (perception, pattern recognition, learning from data) with symbolic systems (logical reasoning, formal verification, structured knowledge). The integration can happen at multiple levels:
Neural perception, symbolic reasoning. The neural network processes raw input (images, text, audio) and extracts structured representations. The symbolic system reasons over those representations using formal rules. A visual question-answering system might use a vision model to identify objects and their spatial relationships, then use a logic engine to answer “Is the red ball to the left of every blue cube?”
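The question in this example can be answered by a short symbolic check once perception has produced structured objects. In the sketch below, `perceive` is a hypothetical stand-in that returns hand-written detections; a real system would call a vision model there, and the `Obj` fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    color: str
    shape: str
    x: float  # horizontal centre; smaller means further left


def perceive(image):
    """Placeholder for a neural vision model (hypothetical output).

    A real model would map pixels to detections; here they are hard-coded.
    """
    return [
        Obj("red", "ball", 0.2),
        Obj("blue", "cube", 0.5),
        Obj("blue", "cube", 0.8),
        Obj("green", "cube", 0.1),
    ]


def red_ball_left_of_every_blue_cube(scene):
    """Symbolic step: a universally quantified check over the detections."""
    balls = [o for o in scene if o.color == "red" and o.shape == "ball"]
    if len(balls) != 1:
        raise ValueError("the question presupposes exactly one red ball")
    ball = balls[0]
    blue_cubes = [o for o in scene if o.color == "blue" and o.shape == "cube"]
    # all() over an empty list is True, so the claim holds vacuously
    # when the scene contains no blue cubes.
    return all(ball.x < cube.x for cube in blue_cubes)


scene = perceive(image=None)
print(red_ball_left_of_every_blue_cube(scene))
```

The reasoning step is exact and inspectable, but it is only as good as the detections it receives: a misdetected cube changes the answer without any error being raised.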
Symbolic constraints on neural outputs. The neural network generates candidate outputs, and symbolic rules filter or correct them. A code-generation model produces candidate programs, and a parser, type checker, or verifier rejects those that are syntactically or semantically invalid.
Neural-guided symbolic search. The symbolic system defines a search space (possible proofs, possible programs, possible plans), and the neural network learns a heuristic to guide the search efficiently. This is the approach used in AlphaProof.
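A toy version of neural-guided search looks like this. The symbolic side defines the legal moves, so any path returned is valid by construction; the "neural" side is reduced to a hypothetical `learned_score` function, which here is a hand-written distance rather than a trained network. This is a generic best-first search for illustration, not AlphaProof's actual algorithm.

```python
import heapq

# Symbolic side: the only legal steps in this toy search space.
MOVES = {
    "double": lambda n: n * 2,
    "increment": lambda n: n + 1,
}


def learned_score(state, target):
    """Stand-in for a trained network (assumption): lower is more promising."""
    return abs(target - state)


def guided_search(start, target, max_expansions=1000):
    """Best-first search: always expand the state the scorer likes most."""
    frontier = [(learned_score(start, target), start, [])]
    seen = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, state, path = heapq.heappop(frontier)
        if state == target:
            return path
        expansions += 1
        for name, move in MOVES.items():
            nxt = move(state)
            # Prune states that overshoot far beyond the target.
            if nxt not in seen and nxt <= 2 * target:
                seen.add(nxt)
                heapq.heappush(
                    frontier, (learned_score(nxt, target), nxt, path + [name])
                )
    return None


print(guided_search(1, 10))
```

A better scorer changes how many states are expanded, not which paths are legal. That division of labor is what lets the heuristic be learned and imperfect while the result stays checkable.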
At one extreme, the neural and symbolic components are completely separate modules connected by a clean interface. At the other extreme, symbolic operations are differentiable and embedded within the neural network’s computation graph, allowing end-to-end training. Differentiable logic, neural theorem provers, and graph neural networks operating over knowledge graphs represent points along this spectrum. The tighter the integration, the more the system can learn to improve its symbolic reasoning from data — but the harder it is to maintain the formal guarantees that motivated the symbolic component.
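One way to see what "differentiable symbolic operations" means is product fuzzy logic, where truth values lie in [0, 1] and the connectives are smooth arithmetic. This is one common choice assumed here for illustration, not a description of any particular system; the confidences are made up, and the gradient is estimated numerically to keep the sketch dependency-free.

```python
def soft_and(a, b):
    return a * b


def soft_or(a, b):
    return a + b - a * b


def soft_not(a):
    return 1.0 - a


def soft_implies(a, b):
    # "a implies b" rewritten as "(not a) or b".
    return soft_or(soft_not(a), b)


def numeric_gradient(f, x, eps=1e-6):
    """Central finite difference; a real system would use autodiff."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Hypothetical network confidences for bird(x) and flies(x).
bird, flies = 0.9, 0.3

# Degree to which the rule "bird(x) implies flies(x)" is satisfied.
rule_truth = soft_implies(bird, flies)

# How much the rule's satisfaction would rise if the network raised flies(x).
grad_wrt_flies = numeric_gradient(lambda v: soft_implies(bird, v), flies)

print(rule_truth, grad_wrt_flies)
```

On inputs of exactly 0 or 1 these connectives agree with classical logic, but in between the rule is only satisfied to a degree. That is the trade-off described above: the rule can now shape training through its gradient, but it no longer acts as a hard guarantee.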
A knowledge graph represents facts as (subject, predicate, object) triples: (Paris, capital-of, France), (Einstein, born-in, Ulm). Large knowledge graphs like Wikidata contain billions of such facts in a structured, queryable format.
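A triple store and its basic query pattern fit in a few lines. The sketch below uses the two triples from the text plus one extra fact added for the example; the `match` and `birth_country` helpers are hypothetical names, and production systems would use a graph database and a query language such as SPARQL instead.

```python
TRIPLES = {
    ("Paris", "capital-of", "France"),
    ("Einstein", "born-in", "Ulm"),
    ("Ulm", "located-in", "Germany"),  # extra fact added for this example
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples consistent with the given fields; None is a wildcard."""
    return sorted(
        t
        for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def birth_country(person):
    """Two-hop query: person -born-in-> city -located-in-> country."""
    for _, _, city in match(subject=person, predicate="born-in"):
        for _, _, country in match(subject=city, predicate="located-in"):
            return country
    return None  # the graph has no answer; it does not guess


print(match(predicate="capital-of"))
print(birth_country("Einstein"))
```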
Combining knowledge graphs with LLMs addresses hallucination directly. Instead of relying on the model’s parametric memory (which may be wrong), the system queries a knowledge graph for verified facts and incorporates them into the response. The LLM provides natural language understanding and generation; the knowledge graph provides factual grounding.
Practical approaches include: retrieving relevant subgraphs during generation, using the knowledge graph to verify claims in the model’s output, and fine-tuning models to produce structured queries against the graph rather than generating facts from memory.
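The second approach, verifying claims against the graph, might look like the sketch below. It assumes the claims have already been extracted from the model's output as triples (that extraction step is not shown), and it assumes the listed predicates are single-valued so that a conflicting object counts as a contradiction. All names here are hypothetical.

```python
KG = {
    ("Paris", "capital-of", "France"),
    ("Einstein", "born-in", "Ulm"),
}

# Assumption: each subject has at most one object for these predicates.
SINGLE_VALUED = {"capital-of", "born-in"}


def check_claim(claim, kg=KG):
    """Classify a (subject, predicate, object) claim against the graph."""
    subject, predicate, _ = claim
    if claim in kg:
        return "supported"
    conflicting = any(s == subject and p == predicate for s, p, _ in kg)
    if predicate in SINGLE_VALUED and conflicting:
        return "contradicted"
    # The graph is silent: the claim may be true but has not been added.
    return "unknown"


claims = [
    ("Paris", "capital-of", "France"),
    ("Einstein", "born-in", "Munich"),
    ("Curie", "born-in", "Warsaw"),
]
for claim in claims:
    print(claim, "->", check_claim(claim))
```

The three-way result matters: "unknown" is not "false". A system that treated every absent fact as wrong would reject true statements whenever the graph is incomplete.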
Knowledge graphs are incomplete — they contain only facts that have been explicitly added. They struggle with nuance, context-dependence, and temporal change. “Paris is the capital of France” is static, but “the best restaurant in Paris” depends on criteria, timing, and taste. Knowledge graphs also require ongoing curation — facts become outdated, new entities appear, and relationships change. The combination of LLMs and knowledge graphs works best for domains with stable, well-defined facts (geography, taxonomy, corporate structure) and less well for subjective, rapidly changing, or context-dependent information.
Program synthesis — automatically generating programs from specifications — is a natural application of neurosymbolic methods. The neural component learns patterns from large code corpora (what programs typically look like, common idioms, likely solutions). The symbolic component verifies that the generated program meets its specification (type correctness, passing test cases, formal proofs of properties).
LLM-based code generation (GitHub Copilot, Claude’s code generation) is a form of neural program synthesis. Adding symbolic verification strengthens it: generating multiple candidate programs and keeping only those that pass a verifier eliminates a large class of errors that the neural model alone would produce.
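The generate-then-verify loop can be sketched as follows. The candidate programs are hard-coded strings standing in for samples from a code model, and the "verifier" is a handful of test cases, which is a partial specification rather than a proof. The helper name `passes_spec` is hypothetical, and `exec` on untrusted model output would need sandboxing in practice.

```python
def passes_spec(source, tests, func_name):
    """Run one candidate against input/output test cases."""
    namespace = {}
    try:
        exec(source, namespace)  # unsafe for untrusted code; sandbox in practice
        func = namespace[func_name]
        return all(func(arg) == expected for arg, expected in tests)
    except Exception:
        # Syntax errors, missing definitions, and runtime errors all count as failure.
        return False


# Stand-ins for candidates sampled from a code-generation model.
CANDIDATES = [
    "def absolute(x):\n    return x\n",                     # wrong for negatives
    "def absolute(x)\n    return -x if x < 0 else x\n",     # syntax error
    "def absolute(x):\n    return x if x < 0 else -x\n",    # sign flipped
    "def absolute(x):\n    return -x if x < 0 else x\n",    # correct
]

TESTS = [(3, 3), (-4, 4), (0, 0)]

verified = [c for c in CANDIDATES if passes_spec(c, TESTS, "absolute")]
print(f"{len(verified)} of {len(CANDIDATES)} candidates passed")
```

Passing tests only shows agreement on the tested inputs. Stronger guarantees require a stronger symbolic component, such as a type checker or a formal proof of the property.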
DeepMind’s AlphaProof (2024) combined a language model with the Lean formal theorem prover to solve International Mathematical Olympiad problems. The language model proposed proof strategies and intermediate steps; the Lean verifier checked each step for logical correctness. This is neurosymbolic integration at its most concrete: the neural network handles the creative, intuitive aspect of mathematical reasoning (what approach might work?), while the formal system handles the rigorous aspect (is this step logically valid?). Together with AlphaGeometry 2, AlphaProof solved four of the six problems at IMO 2024, a score at the silver-medal level and one point short of gold, demonstrating that neurosymbolic systems can achieve results that neither component could reach alone.
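To make "the verifier checks each step" concrete, here is a tiny Lean 4 example. It is unrelated to any actual AlphaProof output and only illustrates the kind of statement-plus-proof that Lean accepts or rejects; the theorem name is made up.

```lean
-- Accepted: the proof term has exactly the type the statement demands.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Accepted: a closed arithmetic fact, checked by computation.
example : 2 + 2 = 4 := rfl
```

If `Nat.add_comm a b` in the first theorem were replaced with `rfl`, Lean would reject the proof, because `a + b` and `b + a` are not equal by computation alone for arbitrary `a` and `b`. A language model can propose either version; only the one that type-checks survives.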
This lesson establishes: