524 n8n nodes in one free dataset to stop agent hallucinations

Published May 18, 2026


Building AI agents that write n8n workflows has one specific failure mode: the agent picks a node that looks plausible, invents an operation that does not exist, and produces a workflow that imports cleanly and then breaks at runtime. Hard to catch. Expensive to debug.

Developer Artyom Rabzonov has published a structured catalog of every n8n node on Hugging Face to address exactly that problem. It covers 524 nodes with every operation, every credential type, and a top-level properties schema for each. It is free and licensed under CC-BY-4.0.


What Each Row Contains

Every row in the dataset maps to one node and includes:

node_name: internal ID (e.g. slack, lmChatOpenAi)
display_name: the label you see in the UI
categories and subcategories: taxonomy values
operations_supported: the actual operation values, not inferred ones
credentials_required: exact credential type names
properties_schema: JSON describing top-level property descriptors
source_package: either nodes-base or @n8n/nodes-langchain
github_permalink: pinned link to the .node.ts source file

The dataset ships as JSON and as Snappy-compressed Parquet, and it is updated monthly.
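If you load rows into your own code, it can help to normalize them once. The sketch below mirrors the columns listed above in a Python dataclass. It assumes, without confirmation from the dataset card, that properties_schema may arrive as a JSON-encoded string and that null list columns should be treated as empty; NodeRow, parse_row, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class NodeRow:
    """One catalog row; field names follow the article's column list."""
    node_name: str                  # internal ID, e.g. "slack"
    display_name: str               # label shown in the n8n UI
    source_package: str             # nodes-base or @n8n/nodes-langchain
    github_permalink: str           # pinned link to the .node.ts source
    categories: Any = None          # taxonomy values, kept as-is
    subcategories: Any = None
    operations_supported: list[str] = field(default_factory=list)
    credentials_required: list[str] = field(default_factory=list)
    properties_schema: Any = None   # decoded top-level property descriptors


def parse_row(raw: dict) -> NodeRow:
    """Normalize a raw row dict (e.g. one item yielded by a datasets.Dataset)."""
    schema = raw.get("properties_schema")
    # Assumption: the schema column may hold a JSON-encoded string.
    if isinstance(schema, str):
        schema = json.loads(schema)
    return NodeRow(
        node_name=raw["node_name"],
        display_name=raw.get("display_name") or "",
        source_package=raw.get("source_package") or "",
        github_permalink=raw.get("github_permalink") or "",
        categories=raw.get("categories"),
        subcategories=raw.get("subcategories"),
        # Null and empty lists both become [], so membership tests never hit None.
        operations_supported=raw.get("operations_supported") or [],
        credentials_required=raw.get("credentials_required") or [],
        properties_schema=schema,
    )


# Hypothetical raw row; values are illustrative, not copied from the dataset.
example = parse_row({
    "node_name": "slack",
    "display_name": "Slack",
    "source_package": "nodes-base",
    "operations_supported": None,
    "credentials_required": ["slackApi"],
    "properties_schema": '[{"name": "resource"}, {"name": "operation"}]',
})
print(example.node_name, example.operations_supported)
```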

The Agent Pipeline This Enables

The intended usage pattern is retrieval-augmented generation (RAG) over a tool catalog. Embed every row using its description, operations, and credentials. At plan time, retrieve the top N nodes relevant to the user's request and hand the agent only those rows. Before deploying, validate the emitted workflow JSON against the properties schema.
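A minimal sketch of the retrieval step, in plain Python so it runs without any model. A toy bag-of-words vector with cosine similarity stands in for a real embedding model, and the sample rows (including a "description" field whose exact column name is an assumption) are made up; row_text, embed, and top_n are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Hypothetical rows shaped like the catalog. The "description" column and all
# values here are illustrative assumptions, not copied from the dataset.
ROWS = [
    {"node_name": "slack",
     "description": "Send and manage messages in Slack channels",
     "operations_supported": ["post", "update"],
     "credentials_required": ["slackApi"]},
    {"node_name": "gmail",
     "description": "Send and read email with Gmail",
     "operations_supported": ["send", "get"],
     "credentials_required": ["gmailOAuth2"]},
    {"node_name": "lmChatOpenAi",
     "description": "OpenAI chat model for AI agents",
     "operations_supported": [],
     "credentials_required": ["openAiApi"]},
]


def row_text(row):
    """Text to embed: description, operations, and credentials."""
    parts = [row.get("description") or ""]
    parts += row.get("operations_supported") or []
    parts += row.get("credentials_required") or []
    return " ".join(parts)


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(query, rows, n=5):
    """Return the n rows whose embedded text is most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(row_text(row))), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:n]]


picked = top_n("post a message to a Slack channel", ROWS, n=1)
print([row["node_name"] for row in picked])  # → ['slack']
```

In a real pipeline you would swap embed for an embedding model, compute the row vectors once, and keep them in a vector index instead of re-embedding on every query. Only the rows in picked would then go into the agent's prompt.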

The developer includes a code example that filters nodes by operation using the Hugging Face datasets library:

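from datasets import load_dataset

# Load the catalog from the Hugging Face Hub and take its "train" split
ds = load_dataset("automatelab/n8n-nodes-catalog")["train"]

# Keep nodes that list "message" among their operations; "or []" guards
# against rows whose operations_supported is null
messaging = ds.filter(
    lambda r: "message" in (r["operations_supported"] or [])
)
# Print each matching node's internal ID and required credential types
for row in messaging:
    print(row["node_name"], row["credentials_required"])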

Numbers Worth Knowing

The catalog splits into 431 nodes from nodes-base and 93 from @n8n/nodes-langchain, so the LangChain package is far from a footnote. A non-trivial number of nodes also have an empty operations_supported list: these are AI cluster nodes such as LLMs, vector stores, and output parsers, where the operation abstraction does not apply. Keep that in mind if your planner filters by operation.
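The empty-list case matters for filters like the one in the developer's example: a node with no operations never matches, so it silently drops out of the candidate set. A minimal sketch, with hypothetical rows and helper names, that counts nodes per package and keeps operation-less nodes in a separate bucket instead of discarding them:

```python
from collections import Counter

# Hypothetical rows; package strings follow the article's wording.
ROWS = [
    {"node_name": "slack", "source_package": "nodes-base",
     "operations_supported": ["post", "update"]},
    {"node_name": "lmChatOpenAi", "source_package": "@n8n/nodes-langchain",
     "operations_supported": []},
    {"node_name": "outputParserStructured", "source_package": "@n8n/nodes-langchain",
     "operations_supported": None},
]


def catalog_stats(rows):
    """Count nodes per source package and collect nodes with no operations."""
    by_package = Counter()
    opless = []
    for row in rows:
        by_package[row["source_package"]] += 1
        if not row["operations_supported"]:  # covers both None and []
            opless.append(row["node_name"])
    return by_package, opless


def split_by_operation(rows, op):
    """Separate nodes that list `op` from nodes with no operations at all,
    so a planner can route the latter by category instead of dropping them."""
    matched, opless = [], []
    for row in rows:
        ops = row["operations_supported"] or []
        if op in ops:
            matched.append(row["node_name"])
        elif not ops:
            opless.append(row["node_name"])
    return matched, opless


counts, opless = catalog_stats(ROWS)
matched, fallback = split_by_operation(ROWS, "post")
print(dict(counts), opless)
print(matched, fallback)
```

Iterating a datasets.Dataset yields one dict per row, so these functions should also accept the ds split from the earlier example. Going by the article's figures, catalog_stats(ds) would be expected to report 431 and 93 nodes for the two packages, though the exact source_package strings in the data may differ from the ones used here.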

⚠️ Caveats

The properties schema is a top-level summary. For deep parameter shapes, use the github_permalink.
Multi-version nodes report only the default version. The source link covers full version history.
The CC-BY-4.0 license covers the catalog additions. The n8n source itself is governed by n8n’s own license.
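Because the schema is only a top-level summary, a pre-deploy check built on it can catch unknown node types, unsupported operation values, and misspelled top-level parameter names, but nothing deeper. A minimal sketch of that check, assuming properties_schema is a list of n8n-style descriptors with a "name" key and that a workflow node's type ends in the node_name (as in "n8n-nodes-base.slack"); validate_workflow, top_level_names, and the sample data are hypothetical.

```python
import json

# Hypothetical catalog row. Assumption: properties_schema is a JSON list of
# n8n-style property descriptors, each with at least a "name" key.
CATALOG = [
    {"node_name": "slack",
     "operations_supported": ["post", "update"],
     "properties_schema": json.dumps([
         {"name": "resource"}, {"name": "operation"},
         {"name": "channel"}, {"name": "text"},
     ])},
]


def top_level_names(row):
    """Set of top-level property names declared for a catalog row."""
    schema = row["properties_schema"]
    if isinstance(schema, str):  # assumption: may be stored as a JSON string
        schema = json.loads(schema)
    return {prop["name"] for prop in schema or []}


def validate_workflow(workflow, catalog):
    """Return a list of problems found in an n8n workflow's nodes."""
    by_name = {row["node_name"]: row for row in catalog}
    errors = []
    for node in workflow.get("nodes", []):
        # Simplification: "n8n-nodes-base.slack" -> "slack"; ignores the package prefix.
        node_name = node["type"].rsplit(".", 1)[-1]
        row = by_name.get(node_name)
        if row is None:
            errors.append(f"{node['name']}: unknown node type {node['type']}")
            continue
        params = node.get("parameters", {})
        ops = row["operations_supported"] or []
        op = params.get("operation")
        if op is not None and ops and op not in ops:
            errors.append(f"{node['name']}: unsupported operation {op!r}")
        allowed = top_level_names(row)
        # Only top-level keys are checked; nested values are not validated.
        for key in params:
            if key not in allowed:
                errors.append(f"{node['name']}: unknown parameter {key!r}")
    return errors


# Hypothetical agent output with three deliberate mistakes.
workflow = {"nodes": [
    {"name": "Notify", "type": "n8n-nodes-base.slack",
     "parameters": {"operation": "send", "channel": "#ops", "txt": "done"}},
    {"name": "Fake", "type": "n8n-nodes-base.teamsPlus", "parameters": {}},
]}
for problem in validate_workflow(workflow, CATALOG):
    print(problem)
```

Since multi-version nodes report only the default version, a node pinned to another typeVersion could be flagged for parameters that are valid in that version, so findings are better treated as warnings to review than as hard failures.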

A browsable index is also available at automatelab.tech if you want to explore nodes without loading the full dataset.

Cagri Sarigoz (Founder)

I’m the founder of BizStack at Cagri Sarigoz LLC and a passionate advocate for entrepreneurs.

With over 14 years in tech, marketing, and AI, including my role as Head of SEO at CitizenShipper and co-founder of TaleBot at Intale AI, I’m dedicated to sharing genuine, useful product insights and tips.

At BizStack, I aim to cut through the digital noise to provide clear, actionable advice.

Above all else, I’m a husband and a father to an (always) little girl.

Contact me at [email protected] for assistance.
