Search NCBI for papers associated with private datasets #606
vagisha merged 28 commits into release25.11-SNAPSHOT
…he PubMed Id, PubMed search strategy, and user dismissal status - Updated panoramapublic.xml to be consistent with changes to DatasetStatus - Changed schema version from 25.003 to 25.004
- Added checkbox to enable publication check in the private data reminder settings form - Added "Enable Publication Check" checkbox on sendPrivateDataRemindersForm.jsp that can override the value in the saved settings. - Added DismissPubMedSuggestionAction - Added CheckPubMedForDatasetAction to perform publication check on one dataset - New method added to PanoramaPublicNotification for sending message about published paper match - Updated PrivateDataReminderJob to check for publications
…matches will have PubMed Ids. - Post notification to support thread when publication suggestion is dismissed.
- Removed redundant constants - Removed redundant class PublicationCheckResult
…create date in the sorting logic
- Parameterized log calls.
…icationsForDatasetApiAction
…g code in the action class - Added toJson method to PublicationMatch.
- Display the citation in the notification message as well as the result page when querying publications for a single dataset.
- Mock NCBI service for TeamCity tests - Fixed notification messages
- Added API action class to register mock publication data with the MockNcbiPublicationSearchService - Updated PublicationSearchTest - test one more dataset
- Fixed citation retrieval through mock service - Fixed confirm messages
- Added "tool" as parameter to EUtils requests - Require at least two keywords in title for keyword match - Updated unit tests
…rred re-search after dismissal - Re-search NCBI after deferral expires; clear dismissal if different publication found - Add publication search frequency (default 3 months) to admin UI and settings form - Update tests to match new log messages and Date-based dismissal
…exception fetching the citation from NCBI, display the publication Id label instead.
… prevent stale reads - Fixed comment for PUBMED_ID regex - In searchPublicationsForDataset.jsp display the publication Id label when citation is not available.
… have to look up the citation for a saved publicationId. - Added test: running the pipeline job in "test" mode should not update rows in the DatasetStatus table - Updated PublicationMatch constructor and fromMatchInfo method - Fixed comments, renamed variables, class.
…publication" message. - Improve author match and title match logic
    private List<String> executeSearch(String query, String database, Logger log)
    {
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String url = ESEARCH_URL +
These parameters should be URL-encoded. encodedQuery is already handled, but unless database and the others are known to be safe, they should be encoded.
They are string constants and are safe, but URL-encoding them is correct practice. Extracted a buildCommonParams() helper that encodes all three common parameters (db, tool, email).
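For illustration only, here is a minimal sketch of what a helper like buildCommonParams() could look like. The PR does not show the real method, so the signature, the TOOL and EMAIL values, and the class name below are all assumptions; only the parameter names (db, tool, email) come from the reply above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EutilsParamsSketch
{
    // Hypothetical values; the real constants are not shown in the PR.
    private static final String TOOL = "ExampleTool";
    private static final String EMAIL = "admin@example.org";

    static String encode(String value)
    {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Encodes the parameters shared by every request: db, tool and email.
    static String buildCommonParams(String database)
    {
        return "db=" + encode(database)
                + "&tool=" + encode(TOOL)
                + "&email=" + encode(EMAIL);
    }

    public static void main(String[] args)
    {
        // The search term is encoded the same way as the common parameters.
        System.out.println(buildCommonParams("pubmed") + "&term=" + encode("Smith J[Author]"));
    }
}
```

Encoding constants costs little and keeps the URL correct if one of them is later changed to a value containing reserved characters such as "@" or "&".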
    if (ids.isEmpty()) return Collections.emptyMap();

    String idString = String.join(",", ids);
    String url = ESUMMARY_URL +

    }

    // Search PubMed: "LastName FirstName[Author] AND Title NOT preprint[Publication Type]"
    String query = String.format("%s %s[Author] AND %s NOT preprint[Publication Type]",
I don't know the PubMed search syntax very well, but any need to do escaping here?
There is no escape mechanism for special characters as far as I can tell. Per https://pubmed.ncbi.nlm.nih.gov/help/ the recommendation is to remove special characters. Added stripQuerySpecialChars() to strip []()\".
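A rough sketch of the stripping step follows. It is not the code from the PR: the method body, the choice to replace each character with a space and collapse whitespace, and the buildAuthorTitleQuery() helper are assumptions. The character set is read as square brackets, parentheses and double quotes.

```java
final class PubMedQuerySketch
{
    // Removes characters that carry meaning in PubMed query syntax: [ ] ( ) "
    // Each one becomes a space so adjacent words do not run together.
    static String stripQuerySpecialChars(String s)
    {
        if (s == null) return "";
        return s.replaceAll("[\\[\\]()\"]", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Hypothetical helper: builds the author/title query from the format
    // string shown in the diff, after cleaning each user-supplied part.
    static String buildAuthorTitleQuery(String lastName, String firstName, String title)
    {
        return String.format("%s %s[Author] AND %s NOT preprint[Publication Type]",
                stripQuerySpecialChars(lastName),
                stripQuerySpecialChars(firstName),
                stripQuerySpecialChars(title));
    }

    public static void main(String[] args)
    {
        System.out.println(buildAuthorTitleQuery("Smith", "J", "Proteome (human) [draft]"));
    }
}
```

Only the user-supplied parts are cleaned, so the field tags in the format string ([Author], [Publication Type]) stay intact.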
    if (title == null) return "";

    return stripDiacritics(title.toLowerCase())
            .replaceAll("<[^>]+>", " ") // Strip HTML/XML tags
If we need to strip tags, do we also need to HTML-decode?
Good point. So far I have only seen formatting tags (<i>, <sub> etc.). But I have added StringEscapeUtils.unescapeHtml4() just in case.
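To make the discussion concrete, here is a self-contained sketch of a title normalizer along these lines. Several things in it are assumptions: unescapeBasicEntities() is a small stand-in for StringEscapeUtils.unescapeHtml4() (an Apache Commons method) that handles only four entities, the stripDiacritics() body is a guess based on java.text.Normalizer, and decoding before tag stripping is one possible order, not necessarily the one used in the PR.

```java
import java.text.Normalizer;
import java.util.Locale;

final class TitleNormalizerSketch
{
    // Decomposes accented characters and drops the combining marks.
    static String stripDiacritics(String s)
    {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Stand-in for StringEscapeUtils.unescapeHtml4(): only four entities.
    // "&amp;" is decoded last so "&amp;lt;" is not decoded twice.
    static String unescapeBasicEntities(String s)
    {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    static String normalizeTitle(String title)
    {
        if (title == null) return "";
        String decoded = unescapeBasicEntities(title).toLowerCase(Locale.ROOT);
        return stripDiacritics(decoded)
                .replaceAll("<[^>]+>", " ")   // Strip HTML/XML tags (regex from the diff excerpt)
                .replaceAll("\\s+", " ")      // Collapse the spaces left behind
                .trim();
    }

    public static void main(String[] args)
    {
        System.out.println(normalizeTitle("Analysis of <i>E. coli</i> &amp; H<sub>2</sub>O"));
    }
}
```

One caveat with this order: a title containing "p &lt; 0.05 ... &gt;" would decode to literal angle brackets and then be eaten by the tag regex. The final commit message mentions tightening the tag-stripping regex, but the tightened form is not shown, so the sketch keeps the regex from the excerpt.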
    }
    catch (InterruptedException e)
    {
        Thread.currentThread().interrupt();
I think this state will already be set by virtue of the exception, but shouldn't be harmful either.
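A note on the point above: per the Thread.sleep Javadoc, the interrupted status of the current thread is cleared when InterruptedException is thrown, so the explicit interrupt() call is what restores it for callers further up the stack. The small standalone demo below (class and method names are made up for illustration) records the flag inside the catch block before and after the call.

```java
import java.util.concurrent.CountDownLatch;

final class InterruptFlagSketch
{
    // Returns {flag inside catch before interrupt(), flag after interrupt()}.
    static boolean[] observeInterruptFlag()
    {
        boolean[] observed = new boolean[2];
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();
            try
            {
                Thread.sleep(60_000);
            }
            catch (InterruptedException e)
            {
                // The flag was cleared when the exception was thrown.
                observed[0] = Thread.currentThread().isInterrupted();
                // Restore it so code higher up can still see the interrupt.
                Thread.currentThread().interrupt();
                observed[1] = Thread.currentThread().isInterrupted();
            }
        });
        try
        {
            worker.start();
            started.await();     // Make sure the worker is running
            worker.interrupt();
            worker.join();       // Also makes the worker's writes visible here
        }
        catch (InterruptedException e)
        {
            throw new IllegalStateException(e);
        }
        return observed;
    }

    public static void main(String[] args)
    {
        boolean[] flags = observeInterruptFlag();
        System.out.println("in catch: " + flags[0] + ", after interrupt(): " + flags[1]);
    }
}
```

The first value is false and the second is true, which is why restoring the flag in the catch block is the usual idiom when the exception is not rethrown.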
     */
    private static String quote(String str)
    {
        return "\"" + str + "\"";
Should this escape quotes within the string?
This is only called for PX ID, Panorama short URL and DOIs which will not contain quotes. But I've updated the method to strip quotes defensively.
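A defensive version of quote() might look like the sketch below. The updated method is not shown in the PR, so the null handling and the exact stripping approach are assumptions.

```java
final class QuoteSketch
{
    // Wraps the value in double quotes after removing any quotes it
    // already contains, so the result cannot end the phrase early.
    static String quote(String str)
    {
        String cleaned = (str == null) ? "" : str.replace("\"", "");
        return "\"" + cleaned + "\"";
    }

    public static void main(String[] args)
    {
        System.out.println(quote("10.1000/example\"doi"));
    }
}
```

Stripping instead of escaping matches the earlier observation in this review that PubMed queries have no escape mechanism.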
    @Override
    public boolean handlePost(NotifySubmitterForm form, BindException errors) throws Exception
    {
        ExperimentAnnotations exptAnnotations = ExperimentAnnotationsManager.get(form.getId());
Should this also check that the container matches?
Ah, yes. Added a call to ensureCorrectContainer().
        return new SimpleErrorView(errors);
    }

    form.setPubmedId(_copiedExperiment.getPubmedId());
Was this intentional? If so, great. If not, we're not propagating the ID anymore.
No! However, the form can arrive pre-populated via the URL when the user clicks the "Make Public" link in a notification. Fixed the code to fall back to the copied experiment's PubMed ID when the form doesn't already have one.
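The fallback described in this reply can be sketched as a small pure function. The real fix lives in the action class and is not shown, so resolvePubmedId() and its treatment of blank values are hypothetical.

```java
final class PubmedIdFallbackSketch
{
    // Keeps a PubMed ID that arrived with the form (for example from the
    // "Make Public" link); otherwise uses the copied experiment's ID.
    static String resolvePubmedId(String formPubmedId, String copiedExperimentPubmedId)
    {
        boolean formHasValue = formPubmedId != null && !formPubmedId.isBlank();
        return formHasValue ? formPubmedId : copiedExperimentPubmedId;
    }

    public static void main(String[] args)
    {
        System.out.println(resolvePubmedId(null, "12345678"));
    }
}
```

In the action this would correspond to something like form.setPubmedId(resolvePubmedId(form.getPubmedId(), _copiedExperiment.getPubmedId())), assuming the form exposes a getPubmedId() getter.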
    PanoramaPublicNotification.postPrivateDataReminderMessage(
            journal, submission, exptAnnotations, submitter, getUser(), notifyUsers,
            _announcement, _announcementsContainer, getUser(), selectedMatch);
Should this and subsequent updates in this method be transacted?
Yes, they should be. Fixed.
- URL-encode db and tool parameters in NCBI API calls - Strip PubMed query syntax characters ([]()\") from author/title query. No escape mechanism exists - Decode HTML entities and tighten tag-stripping regex in normalizeTitle() - Strip inner quotes from quote() defensively - Add container check (ensureCorrectContainer) in NotifySubmitterOfPublicationAction - Propagate PubMed ID in UpdatePublicationDetailsAction only when form doesn't already have one (e.g. URL-supplied value from notification link) - Wrap notification post and DatasetStatus update in a transaction in NotifySubmitterOfPublicationAction - Update tests
Rationale
Enhance the Private Data Reminder system to automatically detect publications associated with private datasets by searching PubMed Central and PubMed. When a publication is found, the reminder message includes the citation and encourages the submitter to make their data public.
Related Pull Requests
Changes
- NcbiPublicationSearchService that searches for publications in PrivateDataReminderJob when enabled in the admin console
- PrivateDataReminderJob, instead of the usual reminder message, a "publication found" message is posted to the submitter
- DatasetStatus caches publication result to avoid repeated calls to NCBI's EUtils endpoints
- NcbiPublicationSearchServiceImpl
- NcbiPublicationSearchServiceImpl.
- PublicationSearchTest.
- private-data-reminders-overview.md
SPEC-SUMMARY.md