Building AI agents that write n8n workflows has one specific failure mode: the agent picks a node that looks plausible, invents an operation that does not exist, and produces a workflow that imports cleanly and then breaks at runtime. Hard to catch. Expensive to debug.
Developer Artyom Rabzonov published a structured catalog of every n8n node to HuggingFace to solve exactly that problem. It covers 524 nodes, every operation, every credential type, and a top-level properties schema. Free. CC-BY-4.0.
What Each Row Contains
Every row in the dataset maps to one node and includes:
- node_name: internal ID (e.g. slack, lmChatOpenAi)
- display_name: the label you see in the UI
- categories and subcategories: taxonomy values
- operations_supported: the actual operation values, not inferred ones
- credentials_required: exact credential type names
- properties_schema: JSON describing top-level property descriptors
- source_package: either nodes-base or @n8n/nodes-langchain
- github_permalink: pinned link to the .node.ts source file
Format is JSON and Parquet (Snappy). The dataset updates monthly.
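To make the row fields concrete, here is a minimal sketch of how one row could guard against the invented-operation failure. The sample row values and the `check_step` helper are hypothetical, made up for illustration; only the field names come from the list above.

```python
# Hypothetical row: field names follow the catalog, values are illustrative only.
SAMPLE_ROW = {
    "node_name": "slack",
    "display_name": "Slack",
    "operations_supported": ["post", "update", "delete"],
    "credentials_required": ["slackApi", "slackOAuth2Api"],
}


def check_step(row, operation, credential):
    """Return a list of problems with a proposed (operation, credential) pair."""
    problems = []
    operations = row.get("operations_supported") or []
    credentials = row.get("credentials_required") or []
    # Only flag the operation when the row actually lists operations.
    if operations and operation not in operations:
        problems.append(f"unknown operation {operation!r} for {row['node_name']}")
    # Same idea for credentials: compare against the exact type names.
    if credentials and credential not in credentials:
        problems.append(f"unknown credential {credential!r} for {row['node_name']}")
    return problems


print(check_step(SAMPLE_ROW, "post", "slackApi"))         # nothing to report
print(check_step(SAMPLE_ROW, "sendMessage", "slackApi"))  # one problem reported
```

A check like this is cheap to run on every step an agent emits, before the workflow is ever imported.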
The Agent Pipeline This Enables
The intended usage pattern is RAG over a tool catalog. Embed every row using description, operations, and credentials. At plan time, retrieve the top N nodes relevant to the user request. Hand the agent only those rows. Validate the emitted workflow JSON against the properties schema before deploy.
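The retrieval step can be sketched roughly as follows. This is a toy version under stated assumptions: the rows are hypothetical, the `description` field is assumed to exist as the paragraph above implies, and a bag-of-words count stands in for a real embedding model. The helper names (`row_text`, `embed`, `cosine`, `retrieve`) are illustrative, not part of the dataset or any library.

```python
import math
import re
from collections import Counter

# Hypothetical catalog rows; real rows would come from the dataset.
ROWS = [
    {"node_name": "slack", "description": "Send messages to Slack channels",
     "operations_supported": ["post", "update"], "credentials_required": ["slackApi"]},
    {"node_name": "gmail", "description": "Read and send email with Gmail",
     "operations_supported": ["send", "reply"], "credentials_required": ["gmailOAuth2"]},
    {"node_name": "postgres", "description": "Run SQL queries against a Postgres database",
     "operations_supported": ["executeQuery", "insert"], "credentials_required": ["postgres"]},
]


def row_text(row):
    """Concatenate the fields suggested for embedding into one string."""
    parts = [row.get("description") or ""]
    parts += row.get("operations_supported") or []
    parts += row.get("credentials_required") or []
    return " ".join(parts)


def embed(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(request, rows, top_n=2):
    """Return the top_n rows most similar to the user request."""
    query = embed(request)
    ranked = sorted(rows, key=lambda r: cosine(query, embed(row_text(r))), reverse=True)
    return ranked[:top_n]


candidates = retrieve("send an email when a form is submitted", ROWS)
print([r["node_name"] for r in candidates])
```

In a real pipeline the `embed` function would be swapped for an actual embedding model and the rows stored in a vector index; the shape of the loop stays the same.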
The developer includes a code example that filters nodes by operation using the HuggingFace datasets library:
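```python
from datasets import load_dataset

# Load the catalog's train split from the HuggingFace Hub.
ds = load_dataset("automatelab/n8n-nodes-catalog")["train"]

# Keep nodes that list a "message" operation; the field may be None or empty.
messaging = ds.filter(
    lambda r: "message" in (r["operations_supported"] or [])
)

for row in messaging:
    print(row["node_name"], row["credentials_required"])
```

Numbers Worth Knowing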
The catalog split landed at 431 nodes from nodes-base and 93 from @n8n/nodes-langchain. At 93 of 524 nodes, roughly 18% of the catalog, the langchain package is not a small footnote. A non-trivial number of nodes have an empty operations_supported list: those are root nodes like LLMs, vector stores, and output parsers where the operation abstraction does not apply. This matters if your planner filters by operation, because a naive operation filter silently drops every one of those nodes.
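One way a planner could account for this, sketched under the assumption that an empty or missing operations_supported value means "no operation abstraction": split the rows first, then apply the operation filter only to the rows that list operations. The `split_by_operations` helper and the sample rows are hypothetical.

```python
def split_by_operations(rows):
    """Separate rows that list operations from rows that list none."""
    with_ops, without_ops = [], []
    for row in rows:
        # None and [] both count as "no operations listed".
        target = with_ops if row.get("operations_supported") else without_ops
        target.append(row)
    return with_ops, without_ops


# Hypothetical rows; values are illustrative, not copied from the dataset.
rows = [
    {"node_name": "slack", "operations_supported": ["post", "update"]},
    {"node_name": "lmChatOpenAi", "operations_supported": []},
    {"node_name": "gmail", "operations_supported": ["send"]},
]

with_ops, without_ops = split_by_operations(rows)
matches = [r["node_name"] for r in with_ops if "post" in r["operations_supported"]]
print(matches)                                # ['slack']
print([r["node_name"] for r in without_ops])  # ['lmChatOpenAi']
```

Keeping the operation-less nodes in their own bucket lets the planner select them by category or description instead of losing them to the filter.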
⚠️ Caveats
- The properties schema is a top-level summary. For deep parameter shapes, use the github_permalink.
- Multi-version nodes report only the default version. The source link covers full version history.
- The CC-BY-4.0 license covers the catalog additions. The n8n source itself is governed by n8n’s own license.
A browsable index is also available at automatelab.tech if you want to explore nodes without loading the full dataset.
