Add storage finder facets and update CI workflow permissions #281
s-sajid-ali wants to merge 43 commits into main
While the initial implementation of the storage finder was done with data from the CSV sheet, the generated JSON file was later edited (for instance in commit d23fe45). Per Claude, here's a summary of the changes that need to be made in the CSV file: https://gist.github.com/s-sajid-ali/33ce8a6488db28582d7ccba462e46bff
3af04ac to 7d91a7b
Add synchronous access and alumni access facets to the storage finder configuration. Update workflow to include explicit permissions for improved security. Regenerate data files and update dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…et tree Improved clarity of access permission descriptions in config and reorganized the facet tree with numeric IDs, added contextual descriptions for risk classification and affiliation questions, and reordered questions for better user flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restore descriptions from main branch for "What is the risk classification of your data?" and "What is your University affiliation?" facets to improve user guidance in the storage finder UI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7d91a7b to 14978d6
Deleted the
Per analysis by @Amanda-dong: 7fdcefd removed
Adds a new "From where will the data be accessed?" question with four choices (VPN, Public Cloud, Off Campus, Browser GUI), driven by the new "Access locations" CSV column. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The access-location facet was missing a corresponding field definition, so access location data was not included in service records' field_data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace loose word-boundary patterns with patterns that require each keyword to appear as a standalone comma-separated item, preventing "VPN" from matching within embedded text and ensuring any combination of access locations is handled correctly without hardcoding. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
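For reviewers, a minimal sketch of the "standalone comma-separated item" matching this commit describes. It is written in Python purely for illustration and is not the generator's actual code; `ACCESS_LOCATIONS`, `item_pattern`, and `access_locations` are hypothetical names, and the keyword list is taken from the four choices named in the earlier commit.

```python
import re

# Hypothetical keyword list: the four choices named in the commit message.
ACCESS_LOCATIONS = ["VPN", "Public Cloud", "Off Campus", "Browser GUI"]


def item_pattern(keyword: str) -> re.Pattern:
    """Match `keyword` only when it is a whole comma-separated item."""
    # (?:^|,) requires the start of the cell or a comma before the keyword;
    # (?=,|$) requires a comma or the end of the cell after it.
    return re.compile(
        rf"(?:^|,)\s*{re.escape(keyword)}\s*(?=,|$)", re.IGNORECASE
    )


def access_locations(cell: str) -> list[str]:
    """Return the known access locations listed in a cell value."""
    return [kw for kw in ACCESS_LOCATIONS if item_pattern(kw).search(cell)]
```

With this shape, a cell such as `VPN, Off Campus` yields both items, while free text such as `Requires VPN client` yields none, because "VPN" is not a complete item there. Any combination of the four keywords is handled by the same loop, with no per-combination hardcoding.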
The Google Sheet export includes a line break inside the column header "Access locations (VPN, Public Cloud, \nOff Campus, Browser GUI)", causing row lookups to return undefined for every service and triggering the fallback: "all" for all access-location facets regardless of actual data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
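One way to guard against the embedded line break is to normalize whitespace in the header row before doing any column lookups. The sketch below is in Python for illustration only and is not the generator's actual code; `normalize_header`, `read_rows`, the `Title` column, and the sample row are hypothetical.

```python
import csv
import io


def normalize_header(header: str) -> str:
    """Collapse any run of whitespace (including line breaks) into one space."""
    return " ".join(header.split())


def read_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into dicts keyed by normalized column headers."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    headers = [normalize_header(h) for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]


# Hypothetical export: the quoted header contains a line break, as in the sheet.
sample = (
    'Title,"Access locations (VPN, Public Cloud, \nOff Campus, Browser GUI)"\n'
    'Example service,"VPN, Off Campus"\n'
)
KEY = "Access locations (VPN, Public Cloud, Off Campus, Browser GUI)"
rows = read_rows(sample)
```

After normalization the lookup key is a single-line string, so `rows[0][KEY]` returns the cell value instead of missing the column and falling through to the "all" fallback.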
Removed the option
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace outdated Drupal-based documentation with accurate description of the Google Sheets CSV generator workflow, automated GitHub Actions weekly sync, current questions, service fields, and matching logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pinging @remram44 @VickyRampin to review the data services your team offers. In particular, the Ceph service has no information about access policies. This is the source sheet. Please let me know if you do not have access to edit it and I'll grant it.
The storage finder can't be found from the search bar, only from the bottom of the front page: https://services.rt.nyu.edu/storage-finder/
What does "synchronous access" mean? It's "yes" for HPC scratch and Qualtrics but "no" for Research Workspace. I suggest removing that line entirely if we don't know what it is, as users won't know either.
"Alumni access" is also "yes" for scratch, so maybe it should be "yes" for Ceph? Obviously you need a sponsor either way; alumni don't automatically get access to HPC scratch.
At this point Ceph is accessible in 3 different ways:
Maybe those should be separate rows, depending on what "synchronous access" means and the amount of detail we want in the "permission settings" column?
Correct, it is not indexed by the client-side search engine we currently use, and I don't think there's a way to add non-local URLs for that search index to crawl. We could do that if we switch to Algolia (in #298) by adding the source URL for the Google Sheet. Or we convert that sheet to markdown and add a new page at
I'm okay with that. @genericdata: What did we mean for this column to indicate originally?
Allowed that facet for Ceph for consistency, and yes, we'll have to point to the access policy somewhere.
…alues Update anchor-based regex matchers to handle leading/trailing whitespace and newlines in spreadsheet cell values. Adds multiline flag (m) so ^/$ match line boundaries, and adds \s* around anchors to absorb surrounding whitespace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
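To illustrate the effect of the multiline flag plus `\s*` around the anchors, here is a small sketch in Python (where `re.MULTILINE` plays the role of the `m` flag). It is not the project's matcher code; the `YES` and `STRICT_YES` patterns and the `is_yes` helper are hypothetical examples for a yes/no cell.

```python
import re

# Hypothetical matcher for a yes/no cell. re.MULTILINE makes ^ and $ match at
# line boundaries; \s* absorbs whitespace and newlines around the value.
YES = re.compile(r"^\s*yes\s*$", re.IGNORECASE | re.MULTILINE)

# The stricter form, for comparison: no whitespace tolerance, no multiline flag.
STRICT_YES = re.compile(r"^yes$", re.IGNORECASE)


def is_yes(cell: str) -> bool:
    """True when a line of the cell consists of 'yes' plus optional padding."""
    return YES.search(cell) is not None
```

A padded value such as `" Yes \n"` matches the tolerant pattern but not the strict one, while `"yes, with sponsor"` matches neither. One consequence worth keeping in mind: with the multiline flag, a multi-line cell matches if any single line satisfies the pattern.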
…e suffixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…capacity matchers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I mean that you can't find this page with the search, not the Google sheet. "data finder" or "storage" should probably point to the storage finder. What does "alumni access" mean? That it is possible for an alumnus to have access if they get a researcher sponsor? In this case the answer is "yes" for a lot more options (S3, HPC RPS, Data Lake, probably Google Shared Drive and Research Workspace too).
I'll take a look at indexing that page.
I agree and have removed it now. I was so focused on moving to ingesting the data from the Google sheet that I didn't really think about which data made sense to move.
I think something happened with the risk rating, the table now only shows "Storable Files: High" which is not as clear as the previous "Storable Files: High, Moderate, & Low Risk". The word "risk" should be present.
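A possible way to rebuild the fuller label from a single highest tier is to list that tier and every tier below it, then append the word "Risk". The sketch below is in Python for illustration and is not the project's code; `RISK_TIERS` and `storable_files_label` are hypothetical, the tier ordering is an assumption, and only the three-tier wording is taken from the label quoted in this comment.

```python
# Assumed ordering of risk tiers, highest first. The three names come from the
# previous label "High, Moderate, & Low Risk".
RISK_TIERS = ["High", "Moderate", "Low"]


def storable_files_label(highest: str) -> str:
    """Build a label listing the given tier and every lower tier."""
    tiers = RISK_TIERS[RISK_TIERS.index(highest):]
    if len(tiers) == 1:
        joined = tiers[0]
    elif len(tiers) == 2:
        # Two-tier wording is an assumption; the thread only shows three tiers.
        joined = f"{tiers[0]} & {tiers[1]}"
    else:
        joined = ", ".join(tiers[:-1]) + f", & {tiers[-1]}"
    return f"{joined} Risk"
```

For a service rated "High", this reproduces the earlier wording "High, Moderate, & Low Risk", so the word "Risk" always appears in the table.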
A lot of other facets lost details, such as "backup", which changed for Box from "Retains up to 100 previous versions of a single file" to "yes" (lost details) and for S3 from "available for additional cost" to "yes" (incorrect) for example.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cation to include all lower tiers
This PR updates the implementation that populates the datafinder data to account for all facets and updates the CI workflow permissions. On a related note, the source Google Sheet URL has also been updated.