524 n8n nodes in one free dataset to stop agent hallucinations


Building AI agents that write n8n workflows has one specific failure mode: the agent picks a node that looks plausible, invents an operation that does not exist, and produces a workflow that imports cleanly and then breaks at runtime. Hard to catch. Expensive to debug.

Developer Artyom Rabzonov published a structured catalog of every n8n node to HuggingFace to solve exactly that problem. It covers 524 nodes, every operation, every credential type, and a top-level properties schema. Free. CC-BY-4.0.

What Each Row Contains

Every row in the dataset maps to one node and includes:

  • node_name: internal ID (e.g. slack, lmChatOpenAi)
  • display_name: the label you see in the UI
  • categories and subcategories: taxonomy values
  • operations_supported: the actual operation values, not inferred ones
  • credentials_required: exact credential type names
  • properties_schema: JSON describing top-level property descriptors
  • source_package: either nodes-base or @n8n/nodes-langchain
  • github_permalink: pinned link to the .node.ts source file
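To make the row shape concrete, here is a minimal sketch of one row as a Python dataclass. The field names come from the list above. The Python types, the sample values, and the placeholder URL are assumptions for illustration and are not taken from the dataset. In particular, properties_schema is typed as Any because the article does not say whether it is stored as a JSON string or a nested object.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CatalogRow:
    """One catalog row. Field names follow the article; types are assumed."""
    node_name: str                   # internal ID, e.g. "slack"
    display_name: str                # label shown in the n8n UI
    categories: list[str]
    subcategories: list[str]
    operations_supported: list[str]  # may be empty for some nodes
    credentials_required: list[str]
    properties_schema: Any           # assumption: JSON string or parsed object
    source_package: str              # "nodes-base" or "@n8n/nodes-langchain"
    github_permalink: str


# Illustrative values only; this is not a real row from the dataset.
raw = {
    "node_name": "slack",
    "display_name": "Slack",
    "categories": ["Communication"],
    "subcategories": [],
    "operations_supported": ["post", "update"],
    "credentials_required": ["slackApi"],
    "properties_schema": "[]",
    "source_package": "nodes-base",
    "github_permalink": "https://example.invalid/Slack.node.ts",  # placeholder
}

row = CatalogRow(**raw)
print(row.node_name, row.operations_supported)
```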

The dataset ships as JSON and Snappy-compressed Parquet, and it updates monthly.

The Agent Pipeline This Enables

The intended usage pattern is RAG over a tool catalog. Embed every row using description, operations, and credentials. At plan time, retrieve the top N nodes relevant to the user request. Hand the agent only those rows. Validate the emitted workflow JSON against the properties schema before deploy.
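The retrieve-then-validate loop described above can be sketched in a few lines. This is a toy version under stated assumptions: the three rows are hypothetical, a bag-of-words cosine score stands in for a real embedding model, and the plan format (a list of node_name/operation pairs) is a simplification and not n8n's actual workflow JSON. The validation step checks only node names and operations. A real pipeline would also check parameters against properties_schema.

```python
import math
import re
from collections import Counter

# Hypothetical rows shaped like the catalog; values are illustrative only.
CATALOG = [
    {"node_name": "slack", "display_name": "Slack",
     "operations_supported": ["post", "update", "delete"],
     "credentials_required": ["slackApi"]},
    {"node_name": "gmail", "display_name": "Gmail",
     "operations_supported": ["send", "reply", "get"],
     "credentials_required": ["gmailOAuth2"]},
    {"node_name": "lmChatOpenAi", "display_name": "OpenAI Chat Model",
     "operations_supported": [],
     "credentials_required": ["openAiApi"]},
]


def row_text(row):
    """Flatten the fields we want to search over into one string."""
    parts = [row["display_name"]]
    parts += row["operations_supported"] or []
    parts += row["credentials_required"] or []
    return " ".join(parts)


def vectorize(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, rows, top_n=2):
    """Return the top_n rows most similar to the user request."""
    q = vectorize(query)
    ranked = sorted(rows, key=lambda r: cosine(q, vectorize(row_text(r))),
                    reverse=True)
    return ranked[:top_n]


def validate(plan, rows):
    """List every planned step that names an unknown node or operation."""
    by_name = {r["node_name"]: r for r in rows}
    problems = []
    for step in plan:
        row = by_name.get(step["node_name"])
        if row is None:
            problems.append(f"unknown node: {step['node_name']}")
            continue
        ops = row["operations_supported"] or []
        # Nodes with no operations are skipped: there is nothing to check.
        if ops and step.get("operation") not in ops:
            problems.append(
                f"{step['node_name']}: unknown operation {step.get('operation')!r}"
            )
    return problems


top = retrieve("post an update to slack", CATALOG, top_n=1)
plan = [
    {"node_name": "slack", "operation": "post"},
    {"node_name": "slack", "operation": "sendMessage"},  # invented operation
    {"node_name": "lmChatOpenAi"},
]
print(top[0]["node_name"], validate(plan, CATALOG))
```

The point of the validation pass is that an invented operation such as the hypothetical "sendMessage" is rejected before deploy instead of failing at runtime.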

The developer includes a code example that filters nodes by operation using the HuggingFace datasets library:

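from datasets import load_dataset

# Load the catalog from the Hugging Face Hub and take the "train" split.
ds = load_dataset("automatelab/n8n-nodes-catalog")["train"]

# Keep only nodes that list a "message" operation.
# `or []` guards against rows where operations_supported is None.
messaging = ds.filter(
    lambda r: "message" in (r["operations_supported"] or [])
)
for row in messaging:
    print(row["node_name"], row["credentials_required"])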

Numbers Worth Knowing

The catalog splits into 431 nodes from nodes-base and 93 from @n8n/nodes-langchain, so the langchain package is not a small footnote: it accounts for roughly 18% of the catalog. A non-trivial number of nodes have an empty operations_supported list: those are root nodes like LLMs, vector stores, and output parsers where the operation abstraction does not apply. If your planner filters by operation, it will silently drop these nodes unless you handle the empty list explicitly.
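One way to keep an operation filter from dropping those nodes is to collect them separately. The sketch below uses three hypothetical rows, not real catalog data, and the function name candidates_for is made up for illustration.

```python
from collections import Counter

# Hypothetical rows; values are illustrative, not taken from the dataset.
rows = [
    {"node_name": "slack", "source_package": "nodes-base",
     "operations_supported": ["post", "update"]},
    {"node_name": "gmail", "source_package": "nodes-base",
     "operations_supported": ["send", "reply"]},
    {"node_name": "lmChatOpenAi", "source_package": "@n8n/nodes-langchain",
     "operations_supported": []},
]


def candidates_for(operation, rows):
    """Split rows into operation matches and nodes that expose no operations."""
    matches, no_ops = [], []
    for r in rows:
        ops = r["operations_supported"] or []
        if not ops:
            no_ops.append(r["node_name"])   # keep these visible to the planner
        elif operation in ops:
            matches.append(r["node_name"])
    return matches, no_ops


matches, no_ops = candidates_for("post", rows)
per_package = Counter(r["source_package"] for r in rows)
print(matches, no_ops, dict(per_package))
```

Run against the full dataset, the same Counter call is a quick way to check the per-package split reported above.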

⚠️ Caveats

  • The properties schema is a top-level summary. For deep parameter shapes, use the github_permalink.
  • Multi-version nodes report only the default version. The source link covers full version history.
  • The CC-BY-4.0 license covers the catalog additions. The n8n source itself is governed by n8n’s own license.

A browsable index is also available at automatelab.tech if you want to explore nodes without loading the full dataset.

BizStack  —  Entrepreneur’s Business Stack