Add data extension prompts, templates, and barrier/barrierGuard support by felickz · Pull Request #42 · advanced-security/codeql-development-template

felickz · 2026-04-21T17:12:07Z

Summary

Add comprehensive CodeQL data extension (Models as Data) development guidance as Copilot prompts, issue template, and PR template.

Sample MaD models created

HTTP4k for Kotlin: [Data Extension Create]: Kotlin HTTP4k testing-felickz/codeql-development-template#25

see usage

vulns

Java Apache Camel: [Data Extension Create]: Java Apache Camel testing-felickz/codeql-development-template#22

see usage

vulns

Databricks: [Extension Create]: Databricks testing-felickz/codeql-development-template#5
- vulns found
Undertow: [Extension Create]: Undertow testing-felickz/codeql-development-template#3
- vulns found

What's included

10 new files:

| File | Description |
| --- | --- |
| `.github/prompts/data_extensions_development.prompt.md` | Common guidance: core principles, threat models, model formats (API Graph vs MaD), CLI references |
| `.github/prompts/cpp_data_extension_development.prompt.md` | C/C++ MaD format (9-10 col tuples), pointer indirection (`Argument[*n]`), namespace-based identification |
| `.github/prompts/csharp_data_extension_development.prompt.md` | C# MaD format, fully qualified signatures, property getter/setter naming (`get_`/`set_`) |
| `.github/prompts/go_data_extension_development.prompt.md` | Go MaD format, package versioning, package grouping, `Argument[receiver]` |
| `.github/prompts/java_data_extension_development.prompt.md` | Java/Kotlin MaD format, subtypes flag, VS Code model editor reference |
| `.github/prompts/javascript_data_extension_development.prompt.md` | JS/TS API Graph format (3-5 col tuples), `Fuzzy`, `GuardedRouteHandler`, `typeModel` |
| `.github/prompts/python_data_extension_development.prompt.md` | Python API Graph format, API graph verification queries, `builtins` type |
| `.github/prompts/ruby_data_extension_development.prompt.md` | Ruby API Graph format, `Method[]` access paths, `!` suffix for class references |
| `.github/ISSUE_TEMPLATE/data-extension-create.yml` | GitHub issue template for requesting new data extensions |
| `.github/PULL_REQUEST_TEMPLATE/data-extension-create.md` | PR template for data extension contributions |

Barrier and Barrier Guard support (CodeQL 2.25.2+)

All prompts include the new barrierModel (sanitizers) and barrierGuardModel (validators) extensible predicates announced in the April 21, 2026 changelog:

- `barrierModel`: Stops taint flow at the modeled element for a specified query kind (e.g., HTML-escaping prevents XSS)
- `barrierGuardModel`: Stops taint flow when a conditional check returns an expected boolean value (e.g., URL validation prevents open redirects)

Each language prompt includes barrier/barrier guard examples from the official CodeQL docs:

- C++: `mysql_real_escape_string` (SQL injection barrier), `is_safe` (barrier guard)
- C#: `HttpRequest.RawUrl` (URL redirection barrier), `Uri.IsAbsoluteUri` (barrier guard)
- Go: beego `Htmlquote` (HTML injection barrier), `IsSafe` (barrier guard)
- Java: `File.getName()` (path injection barrier), `URI.isAbsolute()` (request forgery barrier guard)
- JavaScript: `encodeURIComponent` (HTML injection barrier), `isValid` (barrier guard)
- Python: `html.escape` (HTML injection barrier), Django `url_has_allowed_host_and_scheme` (barrier guard)
- Ruby: `Mysql2::Client#escape` (SQL injection barrier), `Validator.is_safe` (barrier guard)

References

CodeQL now supports sanitizers and validators in models-as-data
Language docs: C++ | C# | Go | Java | JS | Python | Ruby

Add comprehensive CodeQL data extension development guidance:

- Common prompt with core principles, threat models, and CLI references
- Language-specific prompts for C++, C#, Go, Java/Kotlin, JS/TS, Python, Ruby
- Issue template and PR template for data extension workflow
- barrierModel (sanitizers) and barrierGuardModel (validators) support across all languages (CodeQL 2.25.2+)

data-douser

Great work — dedicated, language-specific models-as-data guidance with barrier/barrierGuard coverage aligned to CodeQL 2.25.2 is exactly what this repo needs. The YAML examples, API Graph vs MaD format documentation, and real-world samples (HTTP4k, Apache Camel, Databricks, Undertow) are excellent.

Key concern: Several places in the prompts and templates use language that implies the goal is to write a new CodeQL query (.ql file), when the primary value of models-as-data is that you only need simple YAML. This framing risks misleading LLMs — especially Copilot Cloud Agent — into scaffolding QL code when they should be creating/updating .model.yml files and/or publishing model packs.

The three primary use cases that need better coverage:

Creating a new .model.yml for an unmodeled library (partially covered; needs an end-to-end procedural workflow including both the repo-level .github/codeql/extensions/ path and the model pack path)
Updating an existing .model.yml — adding new sinks/sources/barriers to an already-modeled library (not covered at all)
Publishing a model pack to GHCR for org-wide Default Setup (referenced but not walked through as a workflow; see org-level model packs and extending coverage for all repos in an org)
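
As a rough illustration of the third use case, a model pack is described by a `qlpack.yml` that targets a standard library pack and lists its data extension files. The sketch below assumes a Java pack; the pack name `my-org/java-models`, the version, and the `models/` directory are hypothetical placeholders and are not taken from this PR.

```yaml
# qlpack.yml for a model pack (sketch; name, version, and paths are placeholders)
name: my-org/java-models
version: 0.0.1
library: true
# The standard pack whose extensible predicates these models add to
extensionTargets:
  codeql/java-all: "*"
# Glob for the data extension files shipped in this pack
dataExtensions:
  - models/**/*.yml
```

Running `codeql pack publish` on a pack like this pushes it to the GitHub Container registry, after which it can be referenced from an organization's Default Setup configuration.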

Opened #44 to track adding .github/skills/{create,publish}-model-pack/ agent skills as a follow-up to provide the procedural workflows for these use cases.

See inline comments for specifics and typo fixes.

data-douser · 2026-04-21T21:01:50Z

@@ -0,0 +1,143 @@
+name: Request new CodeQL Data Exension
+description: Request a new CodeQL query for detecting specific code patterns
+title: "[Data Extension Create]: "


This description says "Request a new CodeQL query" — but the whole point of data extensions is that you don't write a new CodeQL query. An LLM (especially Copilot Cloud Agent) reading this will anchor on "new CodeQL query" and may attempt to scaffold a .ql file instead of a .model.yml file.

Suggest:

description: Request a new CodeQL data extension (models-as-data) for an unmodeled library or framework

data-douser · 2026-04-21T21:01:51Z

@@ -0,0 +1,143 @@
+name: Request new CodeQL Data Exension


Typo: "Exension" → "Extension"

data-douser · 2026-04-21T21:01:53Z

+      description: Which programming language should this query target?
+      options:
+        - actions
+        - cpp


The actions language is listed as an option, but there's no corresponding actions_data_extension_development.prompt.md in this PR. If Actions doesn't support models-as-data, remove it from this dropdown to avoid confusing agents. If it does, it needs a prompt file.

data-douser · 2026-04-21T21:02:17Z

+This prompt provides common guidance for developing CodeQL data extensions across all supported languages, while language-specific prompts reference this common guidance and add language-specific details.
+
+## Product Documentation
+


The prompts are heavily oriented toward creating a brand new model from scratch, but the most common real-world workflows aren't well represented. Consider adding a "## Common Workflows" section covering:

Creating a new .model.yml file — end-to-end: identify library → create YAML → test with --additional-packs → validate results

Updating an existing .model.yml file — adding rows to an already-modeled library (where to find existing models, how to add without breaking, re-testing)

Publishing updates to an existing model pack — versioning, codeql pack publish, and configuring the pack for Default Setup across an org

These three use cases are the primary value proposition of models-as-data, and an agent needs explicit procedural guidance for each.
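
For the first workflow, a minimal data extension file might look like the sketch below. It assumes a Java target and reuses the `java.sql.Statement.execute` sink from the CodeQL documentation on customizing library models for Java; the file name and location (for example `.github/codeql/extensions/example.model.yml`) are assumptions for illustration only.

```yaml
# example.model.yml (hypothetical file name)
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      # package, type, subtypes, name, signature, ext, input, kind, provenance
      - ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql-injection", "manual"]
```

Under the same assumptions, the second workflow (updating an existing model) amounts to appending rows to the `data` list of the matching `extensible` block and re-running the tests.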

data-douser · 2026-04-21T21:02:17Z

+#### Default behavior
+
+By default, only the **`remote`** threat model is enabled. This means only sources marked with `kind: "remote"` are active. To include local sources, you must explicitly enable additional threat models via `--threat-model` on the CLI or in the code scanning configuration.
+


Typos: "organizaiton" → "organization", and in the Development section further down, "easilly" → "easily".
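
Regarding the default-behavior paragraph quoted in the hunk above: in code scanning, one way to enable local sources is the `threat-models` key of a CodeQL configuration file. This is a minimal sketch, assuming an advanced setup that already loads this configuration file.

```yaml
# CodeQL configuration file fragment (sketch)
# Enables the local threat model in addition to the default one.
threat-models: local
```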

data-douser · 2026-04-21T21:02:17Z

+
+### Threat Models
+
+Threat models control which `sourceModel` entries are active during analysis. The `kind` column of a `sourceModel` determines its threat model category.


This header says "Query Quality Criteria" but the body talks about models/extensions. Should be "Model Quality Criteria" or "Extension Quality Criteria" to avoid reinforcing query-writing framing for agents.

data-douser · 2026-04-21T21:02:18Z

+## Product Documentation
+
+- [Extending coverage for a repository](https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-coverage-for-a-repository) - `.github/codeql/extensions directory` for local model pack refrences (does not need a qlpack.yml)
+- [Extending coverage for all repositories in an organization](https://docs.github.com/en/code-security/how-tos/scan-code-for-vulnerabilities/manage-your-configuration/editing-your-configuration-of-default-setup#extending-coverage-for-all-repositories-in-an-organization) - publishing model packs and referencing them globally (must be done click button in UI)


Typo: "refrences" → "references"

data-douser · 2026-04-21T21:02:18Z

+Model packs can be used to expand code scanning analysis at scale. Model packs use data extensions, which are implemented as YAML and describe how to add data for new dependencies. When a model pack is specified, the data extensions in that pack will be added to the code scanning analysis automatically.
+
+Generally each language will allow customization of the following extensible prdicates:
+


Typo: "prdicates" → "predicates"

data-douser · 2026-04-21T21:02:30Z

+
+For general CodeQL data extension model development guidance, see [Common Data Extension Development](./data_extensions_development.prompt.md).
+For general CodeQL query development guidance, see [Common Query Development](./query_development.prompt.md).
+


The cross-reference to query_development.prompt.md is prominently placed as the second line of every language prompt. For a data extension task, the agent should not need query development guidance — and this framing may cause an LLM to treat QL query writing as part of the expected workflow.

Consider moving this to the bottom under "Additional References" (where it already appears), or qualifying it: "If you need to write a custom CodeQL query instead of a data extension, see..." — making it clear data extensions are the primary path and QL queries are a fallback.

(Same feedback applies to all seven language-specific prompts.)

data-douser · 2026-04-21T21:02:31Z

+### Python Documentation
+
+- [Customizing Library Models for Python](https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-python/)
+  - Can also be found at [Customizing Library Models for Python Docs](https://github.com/github/codeql/blob/main/docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst)


Typo: "acess" → "access"

felickz requested review from a team, data-douser and enyil as code owners April 21, 2026 17:12

felickz requested a review from coadaflorin April 21, 2026 17:29

chore: format data extension files with prettier

336310f

felickz added this pull request to the merge queue Apr 21, 2026

felickz removed this pull request from the merge queue due to a manual request Apr 21, 2026

felickz mentioned this pull request Apr 21, 2026

Update Packs #43

Open

data-douser mentioned this pull request Apr 21, 2026

Add agent skills for creating and publishing CodeQL model packs #44

Open

data-douser requested changes Apr 21, 2026


Copilot AI mentioned this pull request Apr 21, 2026

Unpin CodeQL pack dependencies by removing committed lock files #45

Draft

8 tasks

felickz wants to merge 2 commits into advanced-security:main from forks-felickz:feat/data-extension-prompts


2 participants
