Data-driven Clusters v4.1 page (11 clusters from Google Sheet) #720

Open
LukasWallrich wants to merge 8 commits into master from data-driven-clusters-v4

LukasWallrich commented Mar 22, 2026

Summary

Replaces the 7 hardcoded cluster pages (v3) with a fully data-driven approach powered by the FORRT Clusters v4.1 Google Doc and a structured Google Sheet.

What changed

  • 11 clusters (was 7), 93 sub-clusters, ~1300 publications with DOI-resolved APA references
  • New parsing script (scripts/parse_clusters_to_sheet.py) that:
    • Fetches the Google Doc as plain text and parses the hierarchical structure
    • Resolves ~1050 DOIs via doi.org content negotiation for clean APA references + BibTeX
    • Writes structured data to a Google Sheet (3 tabs: Clusters, Sub-Clusters, Publications with data validation)
    • Exports data/clusters_v4.json for Hugo to consume at build time
  • New Hugo shortcode (layouts/shortcodes/clusters_display.html) that renders all clusters from the JSON data with:
    • Sidebar navigation with collapsible cluster tree and colored arrows
    • Tabbed sub-clusters (matching the previous UI pattern) with wrapping support
    • Sub-cluster headings, italic descriptions, and bulleted reference lists
    • Full-text search across clusters, sub-clusters, and all references (with match highlighting and click-to-scroll)
    • DOI links rendered as clickable URLs; HTML formatting (e.g. <i> for italics) preserved from doi.org
    • Responsive layout (sidebar collapses on mobile with toggle button)
  • Updated intro text to reflect 11 clusters (was 9)
  • Deactivated old cluster1.md through cluster7.md (set active = false)

Data pipeline

Google Doc (v4.1)
    ↓  parse_clusters_to_sheet.py
Google Sheet (3 tabs with data validation)
    ↓  --export-json flag
data/clusters_v4.json (committed to repo)
    ↓  Hugo build
clusters_display.html shortcode renders the page

The script supports --dry-run, --skip-doi, --json-only, and --export-json flags. DOI lookups are cached in scripts/doi_cache.json (gitignored) for fast reruns.
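
The DOI lookup and caching step can be sketched roughly as follows. This is a minimal illustration and not the code in scripts/parse_clusters_to_sheet.py: the function names, the cache key format, and the injectable fetch parameter are all assumptions. The Accept header values are the ones doi.org content negotiation uses for formatted citations and BibTeX.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Media types understood by doi.org content negotiation.
APA_ACCEPT = "text/x-bibliography; style=apa"
BIBTEX_ACCEPT = "application/x-bibtex"


def build_doi_request(doi, accept=APA_ACCEPT):
    """Build the content-negotiation request for one DOI (no network access)."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": accept})


def _http_fetch(request):
    """Default fetcher: perform the HTTP request and return the body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def resolve_doi(doi, cache, accept=APA_ACCEPT, fetch=_http_fetch):
    """Return the formatted reference, hitting the network only on a cache miss."""
    key = f"{accept}|{doi.lower()}"  # hypothetical cache key format
    if key not in cache:
        cache[key] = fetch(build_doi_request(doi, accept)).strip()
    return cache[key]


def load_cache(path):
    """Load the cache file if it exists, otherwise start with an empty cache."""
    path = Path(path)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_cache(path, cache):
    """Persist the cache so that later runs can skip resolved DOIs."""
    Path(path).write_text(
        json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Passing the fetcher as a parameter keeps the cache logic testable without network access; a real run would call load_cache once, resolve each DOI, and call save_cache at the end.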

Screenshots

The page preserves the established tab-based UI for sub-clusters while adding sidebar navigation and full-text search. Each cluster section has an alternating pastel background color.

Test plan

  • Run python3 scripts/parse_clusters_to_sheet.py --dry-run to verify parsing (expect 11 clusters, ~93 sub-clusters, ~1297 publications)
  • Run hugo server and verify /clusters/ renders correctly
  • Test tab switching within clusters
  • Test sidebar navigation (expand clusters, click sub-clusters)
  • Test full-text search (e.g. search for an author name, click result to scroll)
  • Test on mobile viewport (sidebar toggle, content layout)
  • Verify print view shows all tab content
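
The expected counts from the first test-plan item could also be checked against the exported JSON. The sketch below assumes a hypothetical schema (a top-level "clusters" list whose entries carry "sub_clusters", each with a "references" list); the actual structure of data/clusters_v4.json is not shown in this PR description and may differ.

```python
import json
from pathlib import Path


def count_entries(data):
    """Count clusters, sub-clusters, and publications in the parsed JSON.

    The key names "clusters", "sub_clusters", and "references" are
    assumptions made for illustration, not the confirmed schema.
    """
    clusters = data.get("clusters", [])
    sub_clusters = [s for c in clusters for s in c.get("sub_clusters", [])]
    publications = sum(len(s.get("references", [])) for s in sub_clusters)
    return len(clusters), len(sub_clusters), publications


def load_counts(path):
    """Read a JSON export from disk and return its three counts."""
    return count_entries(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling load_counts("data/clusters_v4.json") from the repository root should then return something close to (11, 93, 1297) if the schema assumption holds.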

🤖 Generated with Claude Code

Replace the 7 hardcoded cluster markdown files with a data-driven approach
that reads from a generated JSON file (clusters_v4.json). The data originates
from the FORRT Clusters v4.1 Google Doc and is parsed into a Google Sheet,
then exported as JSON for Hugo to consume at build time.

Key changes:
- New script (parse_clusters_to_sheet.py) that parses the GDoc, resolves
  DOIs via doi.org for clean APA references + BibTeX, writes to Google Sheet,
  and exports JSON for Hugo
- New Hugo shortcode (clusters_display.html) renders all clusters with
  sidebar navigation, tabbed sub-clusters, and full-text search
- Updated intro text to reflect 11 clusters (was 9)
- Deactivated old cluster1-7.md files (replaced by data-driven rendering)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukasWallrich requested a review from a team as a code owner on March 22, 2026 22:24

@github-actions

👍 All image files/references (if any) are in webp format, in line with our policy.

LukasWallrich commented Mar 22, 2026

Staging Deployment Status

This PR has been successfully deployed to staging as part of an aggregated deployment.

Deployed at: 2026-03-23 23:50:03 UTC
Staging URL: https://staging.forrt.org

The staging site shows the combined state of all compatible open PRs.

forrtproject deleted a comment from github-actions bot on Mar 22, 2026

The clusters page now has its own full-text search that covers
clusters, sub-clusters, and all references. The site-wide Academic
search is redundant and has been disabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Mar 22, 2026

📝 Spell Check Results

Found 6 potential spelling issue(s) when checking 30 changed file(s):

📄 static/js/clusters-page.js

Line  Issue
80    tabEl ==> table
81    tabEl ==> table
83    tabEl ==> table
85    tabEl ==> table
474   tabEl ==> table
475   tabEl ==> table

ℹ️ How to address these issues:

  1. Fix the typo: If it's a genuine typo, please correct it.
  2. Add to whitelist: If it's a valid word (e.g., a name, technical term), add it to .codespell-ignore.txt
  3. False positive: If this is a false positive, please report it in the PR comments.

🤖 This check was performed by codespell

@richarddushime

We now have two search boxes.
I'm proposing that we remove the custom search on the left and keep the search at the top of the clusters.

Meanwhile I will continue enhancing it; it would be good if you could check it as soon as possible.
@LukasWallrich @flavioazevedo

LukasWallrich commented Mar 23, 2026

Thanks @richarddushime! I agree that we need to get rid of one of the searches.

There is also now too much going on in this area - too many boxes. Maybe the syllabus does not need to be in a box?
image

Can we also remove the outdated figure and really condense the text? I think the following is all we need above the clusters - unless @flavioazevedo disagrees (but Richard, please make the change so that he can look at a complete new draft)

Teaching Open and Reproducible Science shouldn't require educators to spend months sifting through a decade of literature. FORRT simplifies this process by providing a curated, expert-backed framework. Developed by over 50 scholars, our taxonomy organizes open scholarship into 11 distinct clusters, offering a clear pathway for integrating these tenets into your teaching and mentoring, regardless of your field or level of expertise.

@richarddushime

I have just made some other adjustments.
I removed the left search and enhanced the search functionality. I limited the search so that it no longer goes through references, because it was returning a lot of results from references and making users lose the relevant cluster text.

I would also like clarification about the paragraph below:

Teaching Open and Reproducible Science shouldn't require educators to spend months sifting through a decade of literature. FORRT simplifies this process by providing a curated, expert-backed framework. Developed by over 50 scholars, our taxonomy organizes open scholarship into 11 distinct clusters, offering a clear pathway for integrating these tenets into your teaching and mentoring, regardless of your field or level of expertise.

Do you mean that all the content before the FORRT syllabus, as well as the figure, should be removed and replaced by this paragraph?

About the figure, I think it's good to keep it while we wait for the updated one (maybe Flavio can push for its design quickly?).

@richarddushime

Additionally, here is something I am proposing.

In the latest commit I introduced dedicated, indexable URLs for each FORRT cluster (/clusters/cluster-N/) alongside the existing taxonomy hub (/clusters/), so each cluster is a first-class page for search and sharing.

The reason I added this is that the clusters are currently covered by only one URL in the sitemap (the main clusters page). With this change each cluster becomes indexable through:

  • Canonical URLs per topic: one clear URL per cluster (and its sub-clusters in-page), instead of relying on a single long hub page or hash-only navigation for discovery.
  • Unique metadata per URL: each cluster page can carry its own <title>, meta description, and Open Graph / Twitter fields from front matter, improving relevance for queries and snippet quality.
  • Structured data: per-page JSON-LD (cluster_seo_jsonld) ties each URL to explicit taxonomy/entity signals for that cluster.
  • Topic-cluster information architecture: the hub remains the overview and entry point; cluster pages act as satellites with internal links between hub and subpages, supporting crawl paths and topical grouping.
  • Stable deep links: shareable URLs (including hash targets for sub-clusters where used) support accurate social previews, backlinks, and citations to the right slice of the taxonomy.
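
To illustrate the structured-data point, here is a rough Python sketch of what a per-cluster JSON-LD payload might contain. It is not the cluster_seo_jsonld partial from this PR: the schema.org types chosen, the field set, the function name, and the base URL default are all assumptions made for illustration.

```python
import json


def cluster_jsonld(number, title, description, base_url="https://forrt.org"):
    """Serialize a schema.org description of one cluster page as JSON-LD.

    The CollectionPage/WebPage typing and the base_url default are
    assumptions; the real partial may use different types and fields.
    """
    hub_url = f"{base_url}/clusters/"
    page_url = f"{hub_url}cluster-{number}/"
    doc = {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "@id": page_url,
        "url": page_url,
        "name": title,
        "description": description,
        # Link the satellite page back to the taxonomy hub.
        "isPartOf": {"@type": "WebPage", "@id": hub_url},
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)
```

The returned string would go inside a script tag of type application/ld+json on the cluster page, with the title and description taken from that page's front matter.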

You can check the preview at https://staging.forrt.org/clusters/cluster-N/ (replace N with the cluster number), e.g. https://staging.forrt.org/clusters/cluster-2/
