davidrothman.net

Exploring Medical Librarianship and Web Geekery

Archive for 3rd Party PubMed/MEDLINE Tools

UNYOC (CE slides) and NYLA Tomorrow

My apologies to the awfully nice folks who attended the CE course I taught at UNYOC a couple of weeks ago! I’ve taken far too long to get these slides posted:

Also: I’ll be on a panel at NYLA tomorrow (Friday, 11/6/2008) afternoon at 4:00 PM- please say hello if you’re going to be there! As usual at these sorts of things, I’ll know almost nobody. But hey- I might get to meet Polly Farrington!

Proof that this blog has the Best Readers Ever

Last week I posted Rachel Walden’s really good idea for a useful 3rd-party PubMed/MEDLINE tool and received several exciting responses.

Martin Gerken

Martin Gerken was the first to make an attempt that you can try at:
http://www.pharmakologie-bremen.de/test/meshr.html

…but Rachel got some error messages from it.

GoPubMed

Martin and Rebecca both suggested using GoPubMed.

David (not David Rothman) confirmed that GoPubMed worked nicely but had some problems (which GoPubMed’s Dr. Liliana Barrio-Alvers later answered).

David’s (not Rothman) Tool

David (again, not David Rothman) also made an attempt at creating the tool that Rachel asked for that you can try here:
http://www.docmobi.com/mesh/.

I threw in a list of PMIDs and got useful results presented in a pleasant manner:

Nice, huh?

Rajarshi’s Tool


Rajarshi Guri was next to build a tool to do this. His, though, doesn’t have an interface- you just add your PMIDs to the URL. Here’s an example using the same PMIDs I used to test David’s tool.

Rajarshi also built a Ubiquity command (more on Ubiquity here) that functions reasonably well as an interface- though still not as well as a simple Web form- and without a simple Web form, the tool isn’t really available to a lot of potential users.

Pierre’s Tool

Pierre Lindenbaum spent 30 minutes building a tool to match Rachel’s specs. I was unable to get it to work, but you can download it here and give it a try.

You people rule.

A (really good) Idea for a 3rd Party PubMed/MEDLINE Tool

Rachel Walden writes:

What I’d like to do is to be able to enter the PMIDs of several citations and have the tool search MEDLINE via PubMed for the assigned MeSH terms, and return a single list of the terms used by any of the entered citations with a measurement of frequency. For example, if I input PMIDs 16234728, 15674923, and 17443536, the tool would return results telling me that 100% or 3 of 3 use the term “Catheters, Indwelling”, 2 of 3 use “Time Factors,” 1 of the 3 uses “Urination Disorders,” and so on. Although this example uses 3 PMIDs, I’d like to be able to input at least 10, just based on personal experience.

This would be useful in situations where a single “gold standard” search strategy is needed for the purposes of a systematic review or other process - for example, we may find a number of great articles on a topic by using multiple approaches to the search, but have difficulty developing a single strategy that captures them all due to differences in indexing. In effect, it would inform reverse-engineering a search strategy from a pool of relevant citations. It might also be helpful as a teaching tool for medical librarianship students and those new to the profession.

No, it wouldn’t change my medical librarian life, but it would make it easier from time to time!

This is a really great idea and I don’t think it’d be too difficult to implement for a Web applications developer who knows how to work with NCBI’s API tools. Any takers? - David
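To make Rachel’s specification concrete, here is a minimal Python sketch of the counting step. It is not the code behind any of the tools readers built. It assumes NCBI’s E-utilities EFetch service returns PubMed records as XML with `PubmedArticle` and `MeshHeading/DescriptorName` elements; the function names (`build_efetch_url`, `mesh_term_frequencies`, `report`, `fetch_report`) are made up for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

# NCBI E-utilities EFetch endpoint (assumed to be reachable from your network).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an EFetch URL that asks PubMed for the given PMIDs as XML."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(str(p) for p in pmids), "retmode": "xml"}
    )
    return f"{EFETCH_URL}?{query}"


def mesh_term_frequencies(pubmed_xml):
    """Count how many citations in a PubMed XML document use each MeSH descriptor.

    Returns (number_of_citations, Counter mapping descriptor -> citation count).
    """
    root = ET.fromstring(pubmed_xml)
    articles = root.findall(".//PubmedArticle")
    counts = Counter()
    for article in articles:
        # A set ensures each descriptor is counted once per citation.
        terms = {
            d.text
            for d in article.findall(".//MeshHeading/DescriptorName")
            if d.text
        }
        counts.update(terms)
    return len(articles), counts


def report(pubmed_xml):
    """Format the counts as 'n of total: term', most frequent first."""
    total, counts = mesh_term_frequencies(pubmed_xml)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{n} of {total}: {term}" for term, n in ranked]


def fetch_report(pmids):
    """Network call: fetch the records from NCBI and summarize them."""
    with urllib.request.urlopen(build_efetch_url(pmids)) as response:
        return report(response.read())
```

Calling `fetch_report([16234728, 15674923, 17443536])` would request Rachel’s three example citations and return lines in her “n of 3” format. A Web form around this is all the interface a librarian would need.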

MEDLINE Cognition (SemanticMEDLINE.com)

[EDIT: Sandy Swanson notes that "It appears that foreign-language articles are not included in Semantic Medline." I guess that isn't surprising. After all, its NLP has to be language-specific and there are more articles in English than any other language.]

I knew that an interface for MEDLINE using Natural Language Processing was being developed at Lister Hill, and PubFocus has been called “semantic MEDLINE” too, but I heard a couple of days ago about a tool from Cognition Technologies called (appropriately enough) SemanticMEDLINE.

I’m going to need to play with it a bit more before having any idea if it’ll be useful to me, but it is interesting.

I searched for “Are probiotics an effective therapy for Crohn’s disease or ulcerative colitis?” and got the following results:

The most interesting part of this is how the panel on the right has drop-down menus for the terms it recognizes, allowing the user to make sure the search is using the correct terms/definitions.

What I don’t understand yet is how these definitions are utilized in performing the PubMed/MEDLINE search.

Be sure to check out the HELP page for notes on the way it uses AND, OR, WITH, and WITHIN operators, the way it uses quotation marks, and how to work with capitalization.

[via]

[Other posts about 3rd Party PubMed Tools]

PubMedPDF

I’ve previously posted about commercial applications for managing PDF files that access PubMed for article metadata (including iPapers, Papers, Sente, BibDesk, and Librarian) but I just stumbled across a new (to me) open source option called PubMedPDF.

Built on the open-source content management system XOOPS (XOOPS Cube fork), PubMedPDF “…is a Document Management System which provides various useful functions. This uses ID which is used in the PubMed Database to automatically generate paper information. If the paper you want to register has that ID, you don’t have to input any information.”

Also of interest to Mac users is the BioMed Lab Portal Server Package.

e-LiSe

Been meaning to post about e-LiSe since I saw the article about it in March.

“e-LiSe (e-Literature Searcher) is an easy-to-use web-based application which finds biomedical information truly related to English words provided by the user. The program uses PubMed database of scientific abstracts as the source of data and a novel bio-linguistic statistical method (based on Z-score), to discover true correlations, even when they are low-frequency associations.

e-LiSe is also capable of finding names of researchers correlated to the information searched by the user. It can function as a name reference engine, answering questions like “who is working on specified subject?” or “what are the coworkers/collaborators of a certain person?”. For the latter the software uses the list of co-authors of each publication a researcher has written to display connections between scientists.”

PubMed Faceoff

PostGenomic’s PubMed Faceoff is the first 3rd Party PubMed/MEDLINE Tool I’ve looked at that really made me chuckle.

This site applies a simple, photorealistic variant of the Chernoff Faces visualization technique to impact factor data for papers in the PubMed database of biomedical literature.

Basically it allows you to search PubMed and have the results represented as a set of human faces.

Each paper is represented as a face. The ethnicity and gender of the face is selected at random for visual interest - you can turn this feature off if you so choose.

The age of a face correlates with the publication date of the paper. Younger faces are more recent papers.

A smile means that the paper has been cited more times than expected (based on its age). Larger smiles mean more citations.

A frown means that the paper has been cited far less than you might expect.

The raised eyebrows correlate with the impact factor (sort of - actually the Eigenfactor) of the journal in which the paper was published.

Some example search results:

I absolutely appreciate the concept (potentially being able to estimate several properties of an article at a glance)- it’s just that some of the facial expressions crack me up.

CoPub

http://services.nbic.nl/cgi-bin/copub/CoPub.pl

CoPub is a text mining tool that detects co-occurring biomedical concepts in abstracts from the Medline literature database. The biomedical concepts included in CoPub are all human, mouse and rat genes, furthermore biological processes, molecular functions and cellular components from Gene Ontology, and also liver pathologies, diseases, drugs and pathways. Altogether more than 250,000 search strings are linked with CoPub.

Special attention was given to genes and proteins. For all human, mouse and rat genes not only long forms of names were used, but also their symbols and aliases, which increases recall. Symbols not referring to genes or proteins are a well known problem, but sophisticated scripts detect these homonyms and neglect the abstracts in which they occur thereby increasing precision.

Features include:

* Fast and easy access to relevant abstracts
* Single gene search in all categories
* Multiple gene search in all categories
* Single keyword search in gene category
* Categories of biomedical concepts: genes (human, mouse, rat), liver pathologies, biological processes, molecular functions, cellular components, diseases, drugs, pathways
* Use of long forms, symbols and aliases of genes
* Homonym detection
* Statistical filter to display only significant biomedical concepts
* Based on Medline abstracts till February 2008

MiSearch: Adaptive PubMed Search Tool

http://misearch.ncibi.org/

From MiSearch Help:

“MiSearch works with NCBI Entrez and your history of browsing to build a profile of your areas of interest, and uses this information to rank citations likely to be of most most information to you at the top of the list.”

“MiSearch uses a classification algorithm based on MeSH term, substance names and author names associated with citations. Two sets are defined. One is the set of articles you have previously clicked on to view. The other is all of PubMed. For each citation in the retrieval set, the algorithm calculates the likelihood that the citation is a member of these two sets. Article having the highest likelihood of belonging to the set of articles you have viewed are ranked at the top of the list.

The “User” field is used as an identifier to track usage. If you do not provide a name, the IP address of your request will be used as a default. If you know you will be doing searches for different tasks with different subject areas, feel free to define a “User” for each task.”
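The help text describes the ranking only in outline, so the following Python sketch is a guess at the general shape of such a classifier, not MiSearch’s actual implementation. It assumes each citation is reduced to a set of feature strings (MeSH terms, substance names, author names), that a small sample stands in for “all of PubMed,” and that a smoothed log-likelihood ratio is an acceptable score. The function names are invented for illustration.

```python
import math
from collections import Counter


def feature_probabilities(citations, vocabulary):
    """Smoothed probability that a citation in the set carries each feature."""
    counts = Counter(f for features in citations for f in set(features))
    n = len(citations)
    # Laplace smoothing keeps unseen features from producing log(0).
    return {f: (counts[f] + 1) / (n + 2) for f in vocabulary}


def rank(retrieved, viewed, background):
    """Order retrieved citations so those most like the viewed set come first.

    Each citation is a set of feature strings. `viewed` is the user's click
    history; `background` is a stand-in sample for the whole database.
    """
    vocabulary = {
        f for group in (retrieved, viewed, background) for c in group for f in c
    }
    p_viewed = feature_probabilities(viewed, vocabulary)
    p_background = feature_probabilities(background, vocabulary)

    def score(features):
        # Positive when the features are more typical of the viewed set
        # than of the background set.
        return sum(math.log(p_viewed[f] / p_background[f]) for f in set(features))

    return sorted(retrieved, key=score, reverse=True)
```

With a click history full of asthma papers, a retrieved citation indexed with “Asthma” would float above one indexed with “Hypertension,” which is the behavior the help text describes.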

Slides from an MLA 2008 presentation by NLM Associate Fellow Marisa Conte

MLA 2008: Plenary Session IV Slides

David Rothman

Amanda Etches-Johnson

Melissa Rethlefsen

Bart Ragon

PubGet (3rd Party PubMed/MEDLINE Tool)

The idea behind Pubget is that it speeds up the process of grabbing the full-text PDFs from PubMed search results. The videos below illustrate the idea:


Above: Embedded video. If you are reading this in an aggregator, you may need to visit the site to view the video.

If you’re at one of the following institutions, you can try a full-featured Pubget that links to full-text PDFs available to these institutions:

From Pubget’s public site, you can get a feel for how it works, but it’ll only pull up open access PDFs.

To keep up on new developments, you can subscribe to the feed of the Pubget blog.

Interested in getting this service for your library’s users? Get in touch and let them know you’re interested.

JANE, eTBLAST, and Whatizit

When I posted in February about JANE, I should also have mentioned eTBLAST (previously mentioned here):

Our service is very different from PubMed. While PubMed searches for “keywords”, our search engine lets you input an entire paragraph and returns MEDLINE abstracts that are similar to it. This is something like PubMed’s “Related Articles” feature, only better because it runs on your unique set of interests. For example, input the abstract of an unpublished paper or a grant proposal into our engine, and with the touch of a button you’ll be able to find every abstract in MEDLINE dealing with your topic. No more guessing whether your set of keywords has found all the right papers. No more sorting through hundreds of papers you don’t care about to find the handful you were looking for–our search engine does it for you.

I also recently stumbled across Whatizit:

Whatizit is a text processing system that allows you to do textmining tasks on text. The tasks come defined by the pipelines in the drop down list of the above window and the text can be pasted in the text area. The description of each individual task/pipeline can be found following the link next to the submit button. Whatizit is also a Medline abstracts retrieval/search engine. Instead of providing the text by Copy&Paste, you can launch a Medline search. The abstracts that match your search criteria are retrieved and processed by a pipeline of your choice.

When the user actually *is* broken (Anna Kushnir and PubMed)

I have distinct childhood memories of asking my mother what one word or another meant. She would point out that there was a dictionary close at hand designed exactly for that purpose and invite me to make use of it.

I remember asking my father to teach me to program in BASIC. He cheerfully agreed and handed me the big brown manual.

So maybe I’m weird and so are my folks, but these memories inform my take on the chatter in the blogosphere and on MEDLIB-L about this post by Harvard PhD student Anna Kushnir in which she expresses her frustration with PubMed. Kushnir writes (in part):

“I hate PubMed. I hate it with a burning passion. For a site that is as vital to scientific progress as PubMed is, their search engine is shamefully bad. It’s embarrassingly, frustratingly, painfully bad.”

[...]

“Why is PubMed so behind the times? Why? How does it even work? Does it search only the abstract? Does it also search the body of the papers that are available online? Why does it get so massively confused by an author’s initials and last name together, in one search? Why can’t it alert me when papers relevant to my work are published?”

I’m the first to admit that PubMed has problems and much room for enhancement, but if Kushnir had bothered to look at PubMed’s help manual or try some of its excellent tutorials she’d have learned exactly how it works, what PubMed indexes, how she can search by author, and that it can alert the user when papers relevant to her work are published via email or RSS.

So while PubMed has real, legitimate problems, Kushnir’s complaints don’t really touch on any of them. She could’ve resolved the problems she notes by flipping through the well-written, clearly laid-out, easy-to-navigate manual.

A number of helpful people who are much nicer than I am left useful comments for Kushnir.

Medical librarian Kathleen Crea offered a clear explanation of how articles are indexed and what MeSH is.

Medical librarian Rachel Walden even offered to help remotely with specific searches if Kushnir didn’t have a Harvard medical librarian handy.

But Kushnir decided that none of this really helped and later commented:

I don’t think I should have to be, or enlist the services of, a medical librarian in order to do a simple search on a literature search engine. PubMed should be an intuitive search engine such as Google, or others. I don’t know of many researchers, either MDs or PhDs, who have had extensive training in computer science or search algorithms. I am going to go out on a limb and say that I am representative of many other biomedical researchers in my struggles with PubMed. I am trained in Cell Biology and Virology. PubMed should be tuned to my needs and my skill set. I should not have to tune to it. Harsh as it may sound, PubMed is most useful for biomedical professionals, not for medical librarians or for computer scientists. Yes, if I devoted an afternoon or more to learning the system I dare say I would become a proficient, but my question stands – why should I have to?

Huh.

The index of biomedical literature searched from PubMed is a vast and complex set of data. Any tool that will search it effectively for very specific needs will necessarily be complex. If Ms. Kushnir doubts this, perhaps she should try any other interface for the same data. Some other interfaces work better for some purposes and some users, but all are complex.

Using PubMed does not require “extensive training in computer science or search algorithms,” it requires reading the manual. Kushnir actually admits that if she “devoted an afternoon or more to learning the system” she would “become a proficient,” and yet she fails to recognize her complaints as the whining they are.

Kushnir writes at JOVE:

My rant somehow wound up on a medical librarian listserv and they came out in force defending NCBI and PubMed, listing pages and pages of helpful and warm instructions and hints on how to make it do what I need it to do, pages of suggestions, with offers of hands-on assistance and training, which have all been wonderful. Occasionally though, they were biting and harsh, saying that if I only knew what I was doing (and only if I weren’t so ignorant… yup, ignorant), PubMed would seem to me the greatest thing ever.

I’m not criticizing Kushnir’s ignorance and would take issue with those who did. Ignorance, once identified, should alert the librarian to a teaching opportunity- not an occasion for shaming. Criticizing the extraordinary laziness in her refusal to receive help from a librarian or to take a quick look at the manual, though? That’s fair game.

Kushnir continues:

I am a research scientist by long, hard training. I am a fairly web-savvy research scientist, and still, I have trouble with PubMed.

As a medical librarian friend recently pointed out to me, it requires instruction to learn to drive a car. Kushnir is unwilling to read the manual and wants to blame PubMed/NLM for her difficulties. Kushnir talks about having spent hours trying to get PubMed to do what she wants, but declines help from multiple medical librarians who’re happy to teach her and can’t be bothered to invest 30 minutes in reading from the manual because it should, in her thinking, be possible to do without any effort on her part.

Kushnir continues:

The search engine is not made for medical librarians. It’s not made for computer programmers. It’s made for scientists, to be used by scientists, needed most by scientists.

Actually, Medline’s history is that it was made primarily for medical librarians and secondarily for physicians, but that’s not really important.

It should be easy for scientists, goofy, only moderately-computer literate scientists, to use. It should be intuitive (read: Google), it should not have a ginormous page of inscrutable instructions, it should not require the hour-long training sessions, kindly offered at most medical libraries. It should be plug and chug.

I might just as well argue that the tools of virology research should be intuitive to me. After all, I’m a very computer-literate, Web-savvy biomedical information professional. Why should I need her years of training to understand her work?1

“Inscrutable?”
Kushnir also describes PubMed’s help documentation as “inscrutable.” When I was teaching myself how to use PubMed, I found the documentation clear and helpful, so this surprised me. I decided to run the PubMed Quick Start document through Google Docs’ analysis:

Let’s review these scores:

Flesch Reading Ease: 62.97
(A score from 60-69 is considered “standard”)

Flesch-Kincaid Grade Level: 5.00
(Fifth grade)

Automated Readability Index: 5.00
(Again, fifth grade)

So it would appear that the help documentation is written at a fifth-grade level. I find it hard to believe that a PhD student at Harvard cannot read at a fifth-grade level, so I’m left with the impression that Ms. Kushnir didn’t actually attempt to read any of the documentation before declaring it “inscrutable.”
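For anyone curious where numbers like these come from: all three scores are simple formulas over counts of sentences, words, syllables, and characters. The Python sketch below uses the standard published coefficients, but the sentence splitting and especially the syllable counter are rough heuristics of my own, so it should not be expected to reproduce Google Docs’ figures exactly.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel runs, dropping a silent final 'e'."""
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def readability(text):
    """Return Flesch Reading Ease, Flesch-Kincaid Grade Level, and ARI."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    # Characters here means letters and digits only (apostrophes excluded).
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    characters_per_word = characters / len(words)

    return {
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "automated_readability_index": 4.71 * characters_per_word
        + 0.5 * words_per_sentence
        - 21.43,
    }
```

Short sentences built from short words push Reading Ease up and both grade levels down, which is all a fifth-grade score really means.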

Suggestions for Ms. Kushnir and other research scientists who don’t like reading the instructions:

So the tool is necessarily complex because the data it searches is complex and the user refuses to read the well-written help documentation or accept help from a friendly librarian (even when multiple librarians are reaching out across thousands of physical miles of distance and the gulf of the patron’s unwillingness to learn).

I can only conclude that sometimes the user *is* broken.2

Thank you to the two medical librarian friends who read the first draft of this post and offered comments.


1 Hint: Because the work is complex and involves a skill set that grows (with effort) over time.

2 See Karen Schneider’s excellent post, “The User is Not Broken”.

More PubMed for Facebook

Gerry McKiernan points out two Facebook applications for searching PubMed, PubFace and PubMed Search.

PubFace Results:

PubMed Search Results:

It’s sort of neat to be able to quickly share a PubMed citation with another Facebook user (see the link in the PubFace results above for “Send to a friend” or PubMed Search’s “Share this” button) and it is handy to be able to add citations to a collection (see PubFace’s “Add to MyLibrary” links or PubMed Search’s “add this to your favorites”)… but I’m having trouble seeing how it is preferable to using PubMed itself and making use of MyNCBI or “Send to email”…or using a powerful bookmarking tool like del.icio.us, Connotea or CiteULike.

I’m only a casual Facebook user, so it is entirely possible I’m missing something. If so, please clue me in? Thanks!

Making PubMed “Easy”?

Jon Brassey writes:

I may have missed something, but none of these alternate interfaces allow easy searching of PubMed. Some are wonders of programming, some allow some very neat tricks but none make searching of PubMed easy.

That’s a fair criticism, I suppose. I think that although PubMed has come a very long way in developing tools that make searching the NLM’s databases easier for medical librarians, clinicians and consumers, it still takes some knowledge and skill to perform a really useful search of the primary literature.

Jon continues:

I suppose my biggest issue with PubMed is that doing a search of statins returns 18,491 results. Unpicking that a bit:

* Most research shows search engine users finish looking after 3 pages of results.
* From our own experience with TRIP we also know that most users only use single search terms (e.g. asthma, hypertension).

So what I’m saying is that statins is a realistic search term and that suggests that 18,431 (18491-60) results are superfluous.

Therefore, the two challenges to me are:

* Return fewer results in the first place
* Allow users to easily qualify their searches.

Let’s bring Jon’s challenges to Healia’s PubMed/MEDLINE Search. It isn’t my favorite, but if Jon uses it to search for statins, he’ll see that, at the time of this writing, only 7,731 results are returned1 and that there are a number of tools for “qualifying” the search right there on the search results page.

You can adjust for date:

You can filter for review articles or for English language articles only:

You can filter by patient demographics:

Healia even recognizes that we’re searching about a class of drugs and gives us tabs so we can filter by Dosage, Usage, or Side Effects:

So let us assume that Jon’s hypothetical user wants to see English-language review articles about the dosage of statins from only the last 5 years. That’s only twelve results. I think this passes Jon’s test.

With that out of the way, I need to add:

Criticizing PubMed for returning too many search results for a query as inadequate as “statins” seems unreasonable to me. A major part of what makes PubMed an amazing tool is its complexity. I believe and hope that new tools will continue to be developed that make the data useful to various kinds of users in various new ways, but I don’t expect that getting exactly what one wants from the primary literature will ever truly be “easy.”


1 I’m guessing that the reason why PubMed returns 18,491 results and Healia’s PubMed/MEDLINE search returns only 7,731 is that Healia’s search is only looking for the string statins.

PubMed, on the other hand, translates statins into “hydroxymethylglutaryl-coa reductase inhibitors”[MeSH Terms] OR “hydroxymethylglutaryl-coa reductase inhibitors”[Pharmacological Action] OR statins[Text Word].

After all, if we search PubMed for “statins” as a string, we get only 7,990 results.
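This comparison can be repeated without the Web interface. The Python sketch below assumes NCBI’s E-utilities ESearch service, whose XML response includes a `Count` element and a `QueryTranslation` element showing how PubMed expanded the query. The function names are made up for illustration, and the counts returned today will differ from the ones quoted in this post.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities ESearch endpoint (assumed to be reachable from your network).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term):
    """Build an ESearch URL; retmax=0 asks for the count without any PMIDs."""
    query = urllib.parse.urlencode({"db": "pubmed", "term": term, "retmax": 0})
    return f"{ESEARCH_URL}?{query}"


def parse_esearch(xml_text):
    """Return (result count, PubMed's translation of the query)."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    translation = root.findtext("QueryTranslation") or ""
    return count, translation


def compare(terms):
    """Network call: look up the count and translation for each search term."""
    results = {}
    for term in terms:
        with urllib.request.urlopen(build_esearch_url(term)) as response:
            results[term] = parse_esearch(response.read())
    return results
```

Running `compare(["statins", "statins[Text Word]"])` would show the automatically mapped search next to the plain text-word search, along with the translation PubMed applied to each.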

MScanner: a classifier for retrieving Medline citations

MScanner: a classifier for retrieving Medline citations
Graham L Poulter, Daniel L Rubin, Russ B Altman and Cathal Seoighe
BMC Bioinformatics 2008, 9:108 doi:10.1186/1471-2105-9-108
Published: 19 February 2008

Free full text: [PDF]

The article is about a third-party PubMed/MEDLINE tool that I have not been able to make work, MScanner.

[Other posts about third-party PubMed/MEDLINE tools]

Article about Anne O’Tate (3rd Party PubMed/MEDLINE Tool)

Free full text article (PDF) from the Journal of Biomedical Discovery and Collaboration on Anne O’Tate.

In this paper, we present Anne O’Tate, a web-based tool that processes articles retrieved from PubMed and displays multiple aspects of the articles to the user, according to pre-defined categories such as the “most important” words found in titles or abstracts; topics; journals; authors; publication years; and affiliations. Clicking on a given item opens a new window that displays all papers that contain that item. One can navigate by drilling down through the categories progressively, e.g., one can first restrict the articles according to author name and then restrict that subset by affiliation. Alternatively, one can expand small sets of articles to display the most closely related articles. We also implemented a novel cluster-by-topic method that generates a concise set of topics covering most of the retrieved articles.

(Hat tip: ResourceShelf)

You can find other posts about 3rd Party PubMed/MEDLINE tools here.

More details: NLM Browser Toolbar

I posted earlier this month about a browser toolbar built to make use of NLM resources.

At the time I was impressed with Guus van den Brekel for having discovered it. It turns out that I have better reason to be impressed. Guus built it.

I finally got around to trying it…and I like it. I’m not sure that the “Other PubMeds” feature is really worth the real estate it takes up (I’d prefer that it linked directly to particularly useful 3rd-party PubMed Tools rather than to sites about them), but that’s a fairly minor complaint.

Conduit appears to have a good reputation and I absolutely trust Guus- so I’d say it is worth trying whether or not it has been endorsed by the NLM. If you have suggestions about how it could be improved, please share them with Guus.

JANE (Journal/Author Name Estimator)

Have you recently written a paper, but you’re not sure to which journal you should submit it? Or are you an editor, and do you need to find reviewers for a particular paper? Jane can help!

Just enter the title and/or abstract of the paper in the box, and click on ‘Find journals’ or ‘Find authors’. Jane will then compare your document to millions of documents in Medline to find the best matching journals or authors.

http://biosemantics.org/jane/

More info:
Martijn J. Schuemie and Jan A. Kors, Jane: Suggesting Journals, Finding Experts, Bioinformatics, January 28, 2008.

[via]

Other posts about third-party PubMed/MEDLINE tools

NLM Browser Toolbar

http://nlm.ourtoolbar.com/

This appears to be a Conduit toolbar…but I see no proof that the NLM is actually behind it.

If you install it and use it, leave a comment to let me know what you think of it?

[Via Guus van den Brekel]