Computational biology and more...

Category: Science

Posts relating to scientific topics, particularly computational biology.

Publishing Research… Part 2

Sales and Marketing in Research: About being a “Booth Babe”, Performing Arts and Science

I wrote Part 1 over a year ago!  It highlighted one example, where in my (completely unbiased 🙂 ) view, four editors got it wrong.  Of course, this is not unique in science, or indeed to science.   J.K. Rowling, author of the Harry Potter books and one of the top 10 selling authors of all time, was rejected multiple times by publishers who could not see the appeal of her stories.  She has also famously shared rejection letters for her first book written under the pseudonym Robert Galbraith.  History is littered with decisions that, in hindsight, seem like obvious oversights.

Back in March 2016 I spent three days with Dr Suzanne Duce, promoting our free research software Jalview (www.jalview.org) and some of our other resources such as JPred (www.compbio.dundee.ac.uk/jpred) at the Microbiology Society Annual Conference in Liverpool.  You can see a cute (?) timelapse of the stand here.

This was a bit of an experiment to see if we could reach new audiences for our tools. The Wellcome Trust (wellcome.ac.uk) and BBSRC (www.bbsrc.ac.uk) both fund our projects and gave us some money to do this sort of thing in our last grant renewal.  Did it work?  Well, yes!  We met a lot of people who were not using our tools but probably should be at least trying them, we reconnected with some users of the tools and introduced them to some new features, and we set up some new collaborations.   Was it worth me spending three days on a stand? Maybe yes.  It was a good experience and I just transplanted my office there for the duration so could get on with other stuff too!  Maybe no, but more of that later.

One of my former group members, who has been in the USA for many years, described what I did in Liverpool as being a “Booth Babe”.  This conjures up all kinds of horrible stereotypical and sexist images, but it made me think more broadly about what we do in science and how much of it centres around Sales and Marketing.

Publishing Research and the Importance of Marketing

In the world of musical performance, competition is intense, so success is a combination of:

  1. Talent
  2. Marketing (getting noticed and continuing to get noticed)
  3. Timeliness (having the right sound or stage presence)
  4. Perseverance (coping with failure and not giving up)
  5. Being attractive, unusual and/or flamboyant

All can  help in promoting your “brand”.  There are very many people with the talent to be a top performer, but few gain international success and acclaim.   The most successful usually combine 1-5 in varying degrees.

I don’t think the situation is any different in science.  We scientists would like to think it is all about the purity of the research, but that is only part of what makes up a successful career.  If we take 1-5 above and map them onto a science career:

Talent

Well, you need to be able to “do” science.  What that means is field dependent, but you identify problems interesting to study, design experiments, methods or observation strategies, carry them out, interpret the results, put them in context and so on.

Marketing

It doesn’t matter how smart your research is, if you don’t tell anyone about it, it is invisible to the world.  A bit like a musician playing awesome guitar riffs in their bedroom but never performing them to anyone else.  In science, you need writing and other communication skills.   The biggest way you “sell” science is through the peer-reviewed literature, so publishing your excellent research in journals is key to success.  However, where should you publish?

There is a massive preoccupation with publishing “well”, that is, publishing the results of your research in “high profile” journals.  Why is this?  On the face of it, it should not matter where you publish if the research is sound, but it certainly does make a difference.  In my view, it is all about marketing.

If you publish in a little-known journal then it may take months or years before anyone notices, even if that research is paradigm shifting.  At best, the people who should read your research won’t notice it, at worst they will just dismiss it because (a) they have never heard of you and/or (b) think it is not even worth reading due to where it was published.  If the journal is behind a paywall, then the chances of your work being noticed are even more remote.  If it is open access then at least it is visible to all, but (a) and (b) will likely play a part in it getting ignored.

It is a sad fact that if you publish in “high profile” journals like Nature or Science, then your work will be noticed, even if it is not the best in the field.  Most scientists look at these journals, journalists look at these journals, your work will reach a wide audience.  You will be more likely to get invitations to speak at conferences, your CV is more likely to get noticed when you apply for jobs and so your career is more likely to take off.  Sigh…

Timeliness

It doesn’t matter how exciting you think your research is: if you are just making incremental additions to a well-understood problem, then even if it is really important, it won’t get a lot of attention.  If your work is a long way ahead of its time, then the risk is that no one else will understand why it matters!  If you are really on top of your work then you will be ahead of the world; that is the point of research.  Unfortunately, you will always be the world expert on what you do, so explaining the magnitude and importance of your discovery or technical innovation to others can be hard.   This is where your marketing and political skills come in.   Sadly, however much you try to explain something to your senior or more established colleagues, they may still not “get it”!

Perseverance

You need a lot of this to be successful in science.   Ideas don’t always pan out, experiments fail relentlessly, bugs creep into code only to reveal themselves late in the publication process (OK, that wasn’t a big bug, but…), good papers don’t get sent to review ( 😉 ), grants get turned down… You have to be able to cope with this all and have confidence in what you are doing to keep trying!

Being Attractive and/or Flamboyant

I’d really hope that how you look is not important to scientific success. However, if you can communicate well and get along with other people then you will be more likely to build up your network of scientific colleagues.  The thing is to be confident about what you have done and what you know about and not be afraid to defend your point of view.  If you are a naturally gregarious person and have a flamboyant style, it probably helps get attention of peers and beyond.  Of course, having a great delivery style has to be backed up by solid science…

The changing face of publication

When I started my scientific career in the 1980s, the only methods to advertise research were through journal publications and presentations or discussions at other institutions and conferences.  These remain major and important ways to advertise and disseminate research.   Preprint servers such as arXiv and bioRxiv make getting your original research seen early much easier.

In the early 1990s the web (HTTP) came along and so we could advertise and promote research through a web site.  Here is a copy of mine from 1996, though it looked much the same in 1993/4 when we first created it.

Today, there are many ways to disseminate research and draw attention to your work.  This blog is one, some entries focus on specific research we have done, others like this are more general.

One has to be careful reading blogs, including this one!  Some scientists use their blog as a platform to attack other scientists, since this is difficult to do in a conventional peer-reviewed publication.   I don’t think this is the most productive way to settle differences of opinion.   Of course, being controversial in a blog draws attention to yourself, but personally, I would rather not write something that deeply upset another person, however strongly I felt about the topic.  If anyone reads my blog or anything else I have written and is upset, then please let me know, so I can learn why.

Twitter (in my case: @gjbarton and @bartongrp) is great for advertising new work, new job opportunities etc., but relies on building up a follower network.   I’m not sure how effective it has been in finding people for me (it is hard to judge), but I know some who have found jobs through Twitter ads and exchanges.   Direct messaging on Twitter is also invaluable as a way of communicating with fellow scientists.

Facebook can be useful too; we use it to promote Jalview, though since I manage the page, I have to remember to update it!  Facebook is an easy platform on which to advertise basic news etc.   Of course, there are many other platforms that you can publish on:  LinkedIn, ResearchGate, etc., etc., …  it is just a question of spending the time keeping them up to date.

Social media, blogs etc. may or may not help improve your scientific profile, though; it is still built primarily on high-quality research written up clearly in the scientific literature!   With the scientific literature, there are now many experiments with different publishing models – open peer review, peer review after publication etc. – but change is difficult and slow while the majority of science is still published by conventional journals.

As with musicians, scientists must balance advertising and promoting what they have already done with doing the next new thing.   I am demonstrating, by writing this blog rather than my next grant application, that it can be easy to get distracted from the “doing new science” thing…

Finally, was the booth at Liverpool worthwhile?

As I said above, overall yes: some new contacts were made that have led on to new collaborations and also opportunities to teach about Jalview at other institutes.  The caveat is that we reached around 200 people at that meeting.  In contrast, Dr Suzanne Duce, who manned the stand with me, has prepared >20 instructional videos about Jalview as well as other videos about what we do (even some of me giving some lectures…   https://www.youtube.com/playlist?list=PLpU3VZmUmrT0r4M-ixmuzEpboy5Wfxkws).  These videos have reached tens of thousands of people world-wide.   A conclusion would be that if time is limited, then making videos and using social media/web promotion has to be more efficient and effective than talking to people individually.  However, one cannot beat face-to-face meetings for explaining complex concepts.  Face-to-face gives the opportunity for questions and discussion, particularly in a small group, and so while reaching a smaller audience, it provides a richer way to communicate.  Overall, we have to do a bit of everything and adapt as new technologies and opportunities present themselves.

Thoughts on age in science…

I saw a Twitter discussion last year from a young Professor in the USA who was saying something along the lines of “why do people not realise I am a Professor?”  I think she was about 30 years old.   She commented that she was tired of hearing people, mostly men, say to her, “You don’t look like a Professor” or “You look too young to be a Professor”.   This was a thread about discrimination in the University workplace against anyone who is not a white, middle-aged male.   Reading about her concerns reminded me of my own experiences as a young group leader.

When I was 17 I really did not know what to do.  I was privileged to have the opportunity to go to University, but did not know what subject, or even what University was all about since no one in my family had ever had that experience.   In the end, I went to study mechanical engineering, after a year switched to biochemistry and then finally began to enjoy myself when I started a Ph.D. in computational biology.  Thanks to excellent mentoring and a fair bit of luck, 5 years later at the very young age of 28 I was awarded a Fellowship that allowed me to start my own independent research.  A year later, I was sole supervisor to a bright new Ph.D. student…  Oh, I also taught undergraduate students a bit too.

So, I was very young to be a group leader.   This was unusual then and still is today, but I was certainly not unique in gaining independence at a young age.    One consequence, though, was that until I was about 36 or so, I was frequently mistaken for a Ph.D. student or post-doc.  I lost count of the number of times people asked me “Who do you work with?” or “Whose group are you in?” and I had to explain patiently that I had my own group.   This was particularly true when I travelled to give talks outside the UK, where it was even more unusual for someone of 30 to be running a group.  I remember visiting the NIH at Bethesda when I was about 31 and having a 1:1 discussion with a postdoc.  I mentioned that my Ph.D. student was working on something – he looked at me amazed and said: “You have a Ph.D. student???!”.  On that same trip, I had various senior staff saying: “So, you are here looking for a postdoc position then?”.   On another similar occasion, I was visiting a University in Australia.  I’d given a talk and afterwards was doing the rounds of scientists whom people thought it would be interesting for me to meet, or vice versa.   When I met one person he said something along the lines of: “Sigh… you are another one looking for a job here then?”.  I got the impression he thought I was the latest in a long line of job hunters he had been asked to talk to.

At the University I worked in at the time, I was made a member of Congregation, which is the governing body of the University.  Not such a big deal, this group had all the academic staff in it but I had had to be nominated since my salary was externally funded.  I recall getting a letter about new ID cards being issued and being instructed to go to a particular office to get it done.  I turned up at this office in an ancient building and told the lady at the desk why I was there.  She rather brusquely told me I was in the wrong place and should go to the Student help centre!  I was a bit put out by this, but showed her the letter I had that explained I should go to this office!  She glanced at it and said: “Oh!  You are a member of Congregation, so sorry Sir…”   I think that is the one and only time I have had someone in the UK call me “Sir”.

So, it was a pretty common experience for me to be mistaken for a student or postdoc even after several years as an independent scientist, what would be called a “Professor” in some countries.  I don’t recall being offended by it, though the guy in Australia did irk me a little with his attitude; he softened once I explained I had a job and was just interested in his research.    I don’t think I was discriminated against on the grounds of age.  I did see other kinds of discrimination based on where I went to school, but by and large I worked in those early years in an institution that judged people by what they could do rather than what they looked like or how they behaved.

I know as a white male, I have not had to deal with the biases (unconscious or otherwise) that other genders or races have to deal with in many institutions.    Despite this, I have noticed a difference in attitude to me and what I say as I have got older, especially  now I am a grey, bald, Professor (in the British sense) and so fit the appearance stereotype.

Publishing Research… Part 1

In 2015, for the first time in my 30-year scientific career, I had a paper rejected by four journals without even being sent to review.  Of course, I am used to journals reviewing then deciding it is not for them – that does happen, but to not even send it to experts?!  I was very surprised by this, particularly since I had been speaking about the work described in the paper for almost two years at UK and international meetings and the talks had always drawn a large and interested audience.  The paper thrown out by editors without consulting experts in the field was the second in our series about a 48-replicate RNA-seq experiment, which I have described earlier in this blog, and was the result of a collaboration with fantastic experimentalists (Mark Blaxter at Edinburgh, and Tom Owen-Hughes and Gordon Simpson at Dundee).  Thankfully, the fifth journal, RNA, sent it to review, and it was published on 16th May 2016 (Schurch, et al, 2016, RNA).

It was great to get the paper published in a good journal and especially nice to see that it stayed at No. 1 or No. 2 in the top accessed papers on the RNA website for six months (Figure 1 shows a snapshot – I did check this every week!).

[Figure 1: screenshot of the top accessed papers list on the RNA journal website, June 2016]

This week (1st March 2017), I was doing some citation analysis and noticed to my surprise and pleasure that the RNA paper was rated by ISI Web of Science as “Hot” (Figure 2), while Google Scholar says it has 28 citations – I like Google Scholar!!

[Figure 2: screenshot of the ISI Web of Science “Hot” paper rating]

This means it is in the top 0.1% cited papers in the field of biochemistry or molecular biology for the last two years!  In fact, there are currently only 28 “Hot” papers in this field from the whole UK.  The work also generated the most interest on social media of any of the papers I have had a significant role in, with a current Altmetric score of 177 (Figure 3).

[Figure 3: screenshot of the paper’s Altmetric score of 177]

So, by any measure, this is a paper that was timely and continues to have impact!

The question then is: “Why did four journals consider it too boring to review?”.

I am not sure why eLife, Genome Biology, Genome Research and NAR would not review it, but will speculate a bit.  I won’t quote the editors’ form rejections, but they all had a similar flavour that we have seen on other occasions: “Not of general interest.”, “Too technical.”, “Appropriate for a more specialised journal…”  To be fair to eLife, the original title and abstract for the paper were perhaps less “biologist friendly” than they could have been, but we fixed that before Genome Biology, Genome Research and NAR.  To be fair to NAR, the editor was interested but did not think the paper fitted any of their categories of paper.

None of the editors said this explicitly, but I did wonder if “Not of general interest” was code for “not in human…”.   Perhaps they thought our findings might not be relevant to a Eukaryote with a complex transcriptome?  We also worried a bit about this, but our recent paper in Arabidopsis (Froussios et al, 2017; OK, not human, but certainly a Eukaryote with a complex transcriptome!) shows essentially the same result, albeit on fewer replicates.   Another factor may have been our publishing the manuscript on the arXiv preprint server at the point of submission in 2015.  Although all the journals say that they are happy with preprint submission, I wonder how happy some editors really are with this?  Of course, it may just be that they say this to everyone and then only respond to the authors who complain the loudest?  I hope that is not the case.

Although it was annoying and time consuming at the time to have the paper rejected four times without review, I’m obviously very happy that the big, cross-institutional team that did this work was right about its importance!  Lots of people from diverse fields have made positive comments to members of the team about how useful the paper is in their experimental design decisions.  It is certainly good to feel that some of your work is appreciated!

I do wonder though if always aiming to publish in “good” journals is the right thing to do?  Surely, it is more important that our work is visible and can be used by other scientists as early as possible?  I will explore this some more in Part 2 of this blog, later…

How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?

On 28th March, the main paper in our study of a 48 replicate RNA-seq experiment designed to evaluate RNA-seq Differential Gene Expression (DGE) methods was published online by the journal RNA (Schurch et al, 2016, RNA).    I’ve discussed this experiment in more detail in previous posts (48 Replicate RNA-seq experiment and 48 Rep RNA-Seq experiment: Part II), so please see them for background information.  Those who have been following the story on Twitter since then will know that we made a mistake in a couple of the figure panels in that paper and this affected the conclusions about the False Discovery Rate (FDR) of one of the 11 methods we examined (DESeq2).  Thanks to great help from Mike Love (the developer of DESeq2) and the fact we had made all the code and data available, we together identified the problem and have now corrected both the figures and the text in the manuscript to reflect the correct results.  We caught this error early enough that the online (and print!) versions of the manuscript will be corrected.  Unfortunately, the revised version won’t be online until 16th May, so I’m putting the two most important revised figures here to avoid confusion about the performance of DESeq2, which we now find is at least as good as DESeq and the other methods that performed well in our test.

Revised Figure 2: Schurch et al, 2016, RNA

Revised Figure 4: Schurch et al, 2016, RNA

In case you were wondering, the mistake was caused by a combination of “features” in two packages we used, SQLite and perl PDL.  DESeq2 by default pre-filters the data before calling differential expression.  As a result it outputs floating point numbers and “NA”s.  NA has a special meaning in the language R. We stored the output in an SQLite database which happily puts NAs and numbers in the same field. We then read the SQLite database with a perl program and the PDL library in order to make the plots.  This silently converted NAs to zeros which is not what you want to happen!  The effect was to call some genes as highly significant when in fact they were not called this way by DESeq2.  This bug also influenced the results for DEGseq but did not affect our overall conclusions about that method.  The bug has been fixed in the version of the code now on GitHub (https://github.com/bartongroup).
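The original pipeline used Perl and PDL, but the same class of silent NA-to-zero conversion is easy to reproduce in other languages. Here is a minimal, purely illustrative Python/sqlite3 sketch (the gene names and p-values are made up, and this is not the actual analysis code):

```python
import sqlite3

# Build an in-memory table of adjusted p-values; DESeq2's pre-filtering
# reports some genes as NA, which is stored in the database as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (gene TEXT, padj REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [("geneA", 0.72), ("geneB", None), ("geneC", 0.04)])

rows = conn.execute("SELECT gene, padj FROM results").fetchall()

# Buggy pattern: coercing missing values to a number turns NULL into 0.0,
# which downstream looks like an extremely significant p-value.
buggy = {gene: (padj if padj is not None else 0.0) for gene, padj in rows}
print(buggy["geneB"])   # 0.0 -- "geneB" now looks highly significant!

# Safer pattern: keep NULLs as missing and exclude them explicitly,
# so a filtered gene is never mistaken for a significant one.
safe = {gene: padj for gene, padj in rows if padj is not None}
print("geneB" in safe)  # False -- the filtered gene is simply excluded
```

The point is that each step (SQLite storing NULL, the reader coercing it to a number) is locally reasonable; it is only the combination that silently manufactures "significant" genes.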

Although the team and I are embarrassed that we made a mistake when we had tried very hard to do this big study as carefully as possible, we’ve found the last few weeks a very positive experience!  Mistakes often happen in complex research but they can go unnoticed for a long time.   Here, the turn-round time was a few days on social media and a little over a month in the conventional literature.  I think this illustrates how early and open access to manuscripts, code and data can help identify problems quickly and lead to better (or at least more reliable) science!

I’ll write in a separate post about the publication process and add my views on how I think it could be improved, but for now I just want to thank everyone involved in fixing this problem.   Firstly, Mike Love who has been very professional throughout.  He emailed as soon as he spotted the anomaly and then worked with us over a few days to identify whether it was real or an error.  Thanks Mike!  Secondly, the staff at the journal RNA, in particular the production editor Marie Cotter who responded very quickly and helpfully when we told her we needed to change something in the manuscript.  Finally, the entire team in Dundee who worked on this project to find the bug and revise the manuscript in good time.  Special thanks to Nick Schurch who somehow managed to juggle his small children and other family commitments to make it all happen quickly!

48 Rep RNA-Seq experiment: Part II

Summary

Earlier this week we posted the first paper in a series about a 48 Replicate RNA-seq experiment (Gierliński et al, 2015). Today, the second paper appeared on arXiv (Schurch et al, 2015).  Both papers are now in print: (Gierlinski et al, 2015; Schurch et al, 2016).

The main questions we were aiming to answer in this work when we started it over 2 years ago were, for RNA-seq experiments that study differential gene expression (DGE):

  1. How many replicates should we do?
  2. Which of the growing number of statistical analysis methods should we use?
  3. Are the assumptions made by any of the methods in (2) correct?
  4. How useful are spike-ins to normalise for concerted shifts in expression?

Paper I (Gierlinski et al, 2015) addressed Point 3 in the list. Our second paper looks in detail at points 1 and 2. The high number of replicates in our experiment allowed us to see how variable results would be if we had fewer replicates. For example, we took 100 sets of 3 replicates at a time to see the variance (uncertainty) in an experiment with only 3 replicates. We did the same thing for 4 replicates and so on up to 40 replicates. In effect, the sampling over all the different DGE methods we did was like performing over 40,000 RNA-seq experiments!
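The subsampling scheme described above can be sketched in a few lines. This is a simplified illustration, not the actual analysis code: the replicate labels, the draw count and the random seed are all assumptions for the example.

```python
import random

N_REPLICATES = 48     # biological replicates per condition in the experiment
N_DRAWS = 100         # random subsets drawn per subset size, as in the text

def draw_subsets(k, n_draws=N_DRAWS, n_total=N_REPLICATES, seed=0):
    """Draw n_draws random subsets of k replicates (without replacement)
    from n_total, mimicking the paper's subsampling scheme."""
    rng = random.Random(seed)
    replicates = list(range(n_total))
    return [sorted(rng.sample(replicates, k)) for _ in range(n_draws)]

# For each subset size from 3 up to 40 replicates, draw 100 subsets; in
# the real study each subset was then run through every DGE tool, which
# is how the sampling amounts to tens of thousands of "experiments".
all_subsets = {k: draw_subsets(k) for k in range(3, 41)}
print(len(all_subsets[3]))      # 100 subsets of size 3
print(len(all_subsets[3][0]))   # each containing 3 replicate indices
```

The spread of DGE results across the 100 subsets at each size k is what gives the variance (uncertainty) of a k-replicate experiment.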

The Abstract of the paper, Figures and Tables give a summary of the conclusions, so I won’t repeat them here, but since it is quite unusual to do 48 replicates (Well to our knowledge no one has done this before!) I thought I would briefly summarise why we did it and the kind of lessons we learned from the experiment and its analysis.

Background

My group’s core interests were originally in studying the relationship between protein sequence, structure and function.   We still develop and apply techniques and tools in this area such as Jalview, JPred and other more specialised predictive tools (see: www.compbio.dundee.ac.uk). In around 2007 though, we did our first analysis of NGS sequencing data (Cole et al, 2009) in collaboration with wet-lab colleagues here in Dundee. This led us into lots of collaborations on the design and analysis of NGS experiments, in particular experiments to determine changes in gene expression given various experimental and biological stimuli. Since we are in a big molecular/cell biology research centre, our experience spans a wide range of species, biological questions and experiment types.

To begin with we looked at differential gene expression (DGE) by Direct RNA Sequencing (Helicos biotechnology, now seqLL) which eventually led to some publications (e.g. Sherstnev et al, 2012; Duc et al, 2013; Cole et al, 2014; Schurch et al, 2014) using that technique, but later we turned to what has become the “standard” for DGE: Illumina RNA-seq. Irrespective of the technology, we kept facing the same questions:

  1. How many replicates should we do?
  2. Which of the growing number of statistical analysis methods should we use?
  3. Are the assumptions made by any of the methods in (2) correct?
  4. How do you deal with concerted shifts in expression (i.e. when a large proportion of genes are affected – most DGE methods normalise these away…)?

We wanted clear answers to these questions, because without good experimental design, the interpretation of the results becomes difficult or impossible. Our thinking was (and still is) that if we get good data from a sufficiently powered experiment, then the interpretation would be much easier than if we were scrabbling around trying to figure out if a change in gene expression is real or an artefact. Of course, we also wanted to know which of the plethora of DGE analysis methods should we use? When we tried running more than one, we often got different answers!

The Joy of Benchmarking ?

2-3 years ago when we were worrying about these questions, there was no clear guidance in the literature or from talking to others with experience of DGE, so when Nick Schurch and others in the group came to me with the idea of designing an experiment specifically to evaluate DGE methods, it seemed timely and a good idea! Indeed, most of the group said: “How hard can it be??”

My group has done a lot of benchmarking over the years (mainly in the area of sequence alignment and protein structure prediction) so I know it is always difficult to do benchmarking. Indeed, I hate benchmarking, important though it is, because no benchmark is perfect and you are often making some kind of judgement about the work of others. As a result you want to be as sure as you can possibly be that you have not messed up. As a developer of methods myself, I don’t want to be the one who says Method X is better than Method Y unless I am confident that we are doing the test as well as we can. As a consequence, I think the care you have to take in benchmarking is even greater than the normal care you take in any experiment, and so benchmarking always takes much longer to do than anyone can predict!  Having said all that, I think in this study we have done as good a job as is reasonably possible – hopefully you will agree!

Collaboration

We don’t have a wet-lab ourselves, but we have a lot of collaborators who do, so the work was a very close collaboration between ourselves and three other groups. The experimental design was the result of discussions between the four groups, but Tom Owen-Hughes’ group selected the mutant, grew the yeast and isolated the RNA while Mark Blaxter’s group at Edinburgh Genomics, did the sequencing and “my” group did the data analysis. With the possible exception of growing the yeast and extracting the RNA, no aspect of this study was straightforward!

We settled on 48 reps since after doing some simulations, we thought this would be enough to model the effect of replicates without being prohibitively expensive. Mmmm, it was still quite an expensive experiment…

Why not other species?

Originally, we planned to do this experiment in multiple species, but while we had collaborators in Arabidopsis, C.elegans and mouse, it was Tom’s yeast team that were first with RNA (within a week of agreeing to do it!) so since the other groups were still planning, we decided to do an initial analysis in yeast and see what that told us. That initial analysis started in March 2013 and we presented our preliminary findings at the UK Genome Sciences meeting in Nottingham in October that year. It has taken us over a year to get the papers written since everyone in the collaboration is working on other projects as their “main” activity!

What is next?

Early on, we decided to include RNA spike-ins in the experiment. These are known concentrations of RNAs that are added to the experiment to provide a calibration marker. This was a good idea, but it made the lab work and sequencing more complex to optimise. It also confused us a lot in the early stages of the analysis, so we had to do another, smaller-scale RNA-seq experiment to work out what was going on. This will be covered in detail in Paper III since we learned a lot that I hope will be of use/interest to others in the field.

If, after reading the paper you have comments or questions, then we’ll all be happy to hear from you!

48 Replicate RNA-seq experiment

I and other members of the team have talked about this work at meetings over the last 18 months, but today the first of three (hopefully four) papers about a 48 biological-replicate RNA-seq experiment from my group (www.compbio.dundee.ac.uk),  the Data Analysis Group (www.compbio.dundee.ac.uk/dag.html), and collaborators Tom Owen-Hughes, Gordon Simpson (http://bit.ly/1JobrGZ)  and Mark Blaxter (http://bit.ly/1GXtC8M) was submitted to a journal and posted on arXiv (http://arxiv.org/abs/1505.00588). The data generated for this experiment has also been submitted to ENA and should be released in the next few hours.

Clearly, referees will have things to say about our manuscript, but I thought it was worth writing a brief summary here of the justification for doing this work and to provide somewhere for open discussion.

Briefly:

Paper I: The paper submitted today deals with the statistical models used in Differential Gene Expression (DGE) software such as edgeR and DESeq, as well as the effect of “bad” replicates on these models.

Paper II: Will be on arXiv in the next day or so, and benchmarks the most popular DGE methods with respect to replicate number. This paper leads to a set of recommendations for experimental design.

Paper III: Is in preparation; it examines the benefits of ERCC RNA spike-ins for determining concerted shifts in expression in RNA-seq experiments, as well as for estimating the precision of RNA-seq experiments. There will be an R package accompanying this paper.
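
For readers unfamiliar with the models in Paper I: edgeR and DESeq both model read counts with a negative binomial distribution, whose variance is the Poisson mean plus a quadratic dispersion term. A quick sketch of that mean–variance relationship (the dispersion value below is illustrative, not estimated from our data):

```python
# The negative binomial variance used by DGE tools: var = mu + alpha * mu^2.
# With alpha = 0 this reduces to the Poisson (var = mu); biological
# replicates typically show extra-Poisson variation, i.e. alpha > 0.

def nb_variance(mu, alpha):
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2

alpha = 0.05  # illustrative dispersion value only
for mu in (10, 100, 1000):
    poisson_var = mu                 # Poisson: variance equals the mean
    nb_var = nb_variance(mu, alpha)  # NB: quadratic term dominates at high mu
    print(mu, poisson_var, nb_var)
```

The quadratic term is why highly expressed genes are far noisier between biological replicates than a Poisson model would predict, and estimating the dispersion well is central to how these tools behave.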

The main questions we were aiming to answer in this work when we started it 2 years ago were:

  1. How many replicates should we do?
  2. Which of the growing number of statistical analysis methods should we use?
  3. Are the assumptions made by any of the methods in (2) correct?
  4. How useful are spike-ins to normalise for concerted shifts in expression?
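
Question 1 is at heart a statistical power question: how does the chance of detecting a given fold-change grow with replicate number? A toy simulation sketches the idea — the effect size, noise level and simple t-test below are invented for illustration and bear no relation to the full benchmarking in Paper II:

```python
# Toy power simulation: how often does a simple two-sample t-test detect
# a fixed shift in log-expression as the number of biological replicates
# grows? Effect size and noise level are invented for illustration.
import math
import random
import statistics

random.seed(1)

def detected(n_reps, log2_fc=1.0, sd=1.0):
    """One simulated gene: is a log2_fc shift flagged with n_reps per condition?"""
    a = [random.gauss(0.0, sd) for _ in range(n_reps)]
    b = [random.gauss(log2_fc, sd) for _ in range(n_reps)]
    va, vb = statistics.variance(a), statistics.variance(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va / n_reps + vb / n_reps)
    return abs(t) > 1.96  # crude two-sided 5% threshold (normal approximation)

for n in (3, 6, 12, 24, 48):
    power = sum(detected(n) for _ in range(2000)) / 2000
    print(n, round(power, 2))
```

Even this crude sketch shows why the common 3-replicate design detects only a fraction of genuinely changed genes, while power climbs steeply as replicates are added.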

The aim with the experimental design was to control for as many variables as possible (batch and lane effects and so on) to ensure that we were really looking at differences between DGE methods and not being confused by variation introduced elsewhere in the experiment. This careful design was the result of close collaboration between us (a dry-lab computational biology group), Tom Owen-Hughes’ yeast lab at Dundee, and Mark Blaxter’s sequencing centre at Edinburgh.
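
The blocking idea can be pictured with a small sketch: give every sequencing lane the same number of samples from each condition, so a lane effect cannot masquerade as a condition effect. The lane and replicate numbers below are illustrative, not our actual layout:

```python
# Sketch of a balanced (blocked) design: each sequencing lane carries the
# same number of samples from each condition, so any per-lane technical
# effect hits both conditions equally. Numbers are illustrative only.

conditions = ["WT", "mutant"]
n_reps = 8   # replicates per condition (illustrative)
n_lanes = 4  # flowcell lanes (illustrative)

per_lane = n_reps // n_lanes  # samples of each condition per lane

lanes = {}
for lane in range(n_lanes):
    lanes[lane + 1] = [
        f"{c}_rep{lane * per_lane + i + 1}"
        for c in conditions
        for i in range(per_lane)
    ]

for lane, contents in lanes.items():
    wt = sum(s.startswith("WT") for s in contents)
    print(lane, contents, f"WT={wt} mutant={len(contents) - wt}")
```

A real design would also randomise sample positions within each lane and balance library-prep batches in the same way.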

This experiment is probably the highest-replicate RNA-seq experiment to date, and one of the deepest. I hope that the careful design means that, in addition to our own analysis, the data will be useful to others who are interested in RNA-seq DGE methods development as well as to the wider yeast community.

How are scientists assessed?

I put this blog together after having a twitter exchange with Mick Watson and reading his excellent piece about how not to sack your professors.

Scientists are assessed primarily by the quality of their research output.  A secondary assessment, but one which is tightly linked to publication quality, is the ability to generate research income.  I will come to that later…

Research output is primarily regarded as articles in peer-reviewed publications, usually in scientific journals, though there are other methods of publishing.  In order to be published in a journal, an article must pass “peer review” by multiple scientists who are expert in the same field as the submitted work.  The whole process of getting from a first manuscript to a final published article can be lengthy, involve multiple revisions and often require the author to go and do more research to satisfy requests from the peer-reviewers (often called “referees”).  I’ve explained this whole process elsewhere, but will repeat it in a later post here.

However, what constitutes “quality”? In general this means being published in high-impact journals. A high-impact journal is one that is read by a lot of people and so includes a lot of articles that are cited by other articles. One measure of journal quality is its impact factor, a number that reflects the number of citations the journal receives. The simple view is that scientists who publish in high-impact journals are doing research that is widely respected; if you only publish in obscure, little-read journals, then your work is less regarded and so you are not such a good scientist. Unfortunately, this is a very simplistic view, since some subject areas are not as trendy as others and so are less likely to appeal to high-impact journals like Nature and Science. A further simplistic way to assess scientists is to count their total citations – how often do people cite their papers? If you work in a popular field, your citations are likely to be higher than if you work in a less popular one. This doesn’t make your work any less important, or you any less good a scientist, but a purely numerical measure of quality based on citations may be unfair unless carefully normalised against citations within your field.  These factors make assessing the quality of a scientist’s output very difficult, and how best to do it is a subject of continual debate.
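
The normalisation point is easy to make concrete with a little arithmetic (all of the numbers below are invented):

```python
# Toy field normalisation: divide each scientist's citation count by the
# average citation count in their field. All numbers are invented.

field_average = {"genomics": 120.0, "taxonomy": 15.0}

scientists = [
    ("A", "genomics", 180),  # raw count looks impressive
    ("B", "taxonomy", 45),   # raw count looks modest
]

for name, field, citations in scientists:
    normalised = citations / field_average[field]
    print(name, citations, round(normalised, 2))
```

Scientist B, with a quarter of A’s raw citations, comes out ahead once the size of the field is accounted for — which is exactly why raw citation counts mislead across fields.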

In the UK, every 5 years there is a Research Assessment Exercise (RAE) for Higher Education Institutions (HEIs). The most recent exercise (now called the REF, for Research Excellence Framework) published results at the end of 2014.  The RAE/REF process aims to assess all academics (not just scientists) within the context of their field and so give a fairer estimate of quality.  In the latest REF, each academic had to submit four “outputs” for assessment; for scientists, these were typically research papers.  The “outputs” were assessed by a panel of peers and graded, then the results published as departmental summaries, so departments never know how well outputs from individual academics were rated.  The RAE/REF is important in the UK since the results directly affect the funding given by central government to individual departments over the next 5-year period.

Like any household, UK Universities have to work within a budget.  They must spend less than their income in order to break even and keep some surplus for future investment in people, buildings and so on.  Income is critical to keeping the lights on and keeping everyone paid.  The income is not fixed, but depends on a number of factors such as:

  1. Number of students and where they come from.  All students generate fees income, but international students pay more than home students.  As a consequence, most institutions actively market themselves across the world in order to raise the proportion of non-EU students.
  2. Research grant income.  Academics who win research grants bring income to the university, not only to pay the salaries of the staff on the grants and to buy equipment and consumables, but also in overhead money on the grant.  The overhead money is what keeps the lights on and pays for infrastructure costs and essential administration.
  3. Funds from industry/licensing.  This can be very lucrative since industry funding normally attracts a bigger overhead.  If an institution develops something with high commercial value then this can add significantly to the university income.
  4. Funding directly from government through the REF.  Departments are graded every 5 years, and the size of this funding component is linked to the ranking of the department.  The formula for this differs in each 5-year cycle.

So to increase income, institutions can choose to attract more high-paying students, but to do this they have to invest in good teaching staff and student facilities.  They can choose to attract researchers who are successful at publishing “high-quality” papers and hence usually also successful at winning grants and thus bringing in overhead income.  They can also attract staff who have strong industry connections.  Most institutions do a bit of all of this.  On the research side, there is a bit of a frenzy of recruitment coming up to a REF year, as departments aim to hire research academics who will provide high-quality outputs for the REF and so not only be able to fund their own research from external sources, but also boost the likely income from central government through the REF formula.

Few institutions have large reserves of cash they can afford to dish out to all their staff to help with their research; they have to think strategically in order to develop new areas with the limited uncommitted funds they have.  Should they fund Ph.D. studentships to help newly appointed young academics get their work going?  Should they use the funds to support some core equipment that needs replacing or updating?  Perhaps some building work needs doing to allow a new star scientist to install their expensive equipment?  Or perhaps a longstanding and successful member of staff has just failed to get their core programme grant renewed and needs some bridging funding while they try again in the next round or with a different funding agency?

It can be difficult for department heads to make decisions about who they support and how.  It is clearly complicated, and a simple “metrics”-based approach is not going to be fair or work effectively for every individual.  While there have been some reports in the media about seemingly arbitrary income-based decisions leading to sackings or even suicide, I find it hard to believe that these tough and tragic events are ever due to a single factor.  The ability to generate research grant income is clearly an essential part of being a scientist, since without funding you can’t do anything.  It is also true that few universities can afford to fund research programmes in full from their own resources.  What they can do, as the institution I work at does, is provide the environment that supports their staff to be as successful as they can in the current, highly competitive world of academic science research.

Scientists and the Public

I thought I would post about how scientists get their funding.  I’ve written about this before, so have edited a section of another document into this post at the end.  I’m writing now because yesterday we had a visit to our shiny new building by Scottish comedian and broadcaster Fred MacAulay, so I had a few minutes to explain what Computational Biology was and have a bit of a discussion with him and his group of friends.  It might seem odd that we had a visit from a prominent media-person, but Fred is an alumnus of our University and a former Rector of the University.

Fred MacAulay and friends learning about Computational Biology from Geoff Barton

Rectors are usually media-folk (the current one is Brian Cox, the Hollywood actor who generally plays bad guys in big movies like X-Men) and their high profile helps to promote the University in interesting ways.  Anyway, the point is that Fred was with a group of friends, all also alumni of the University, who were celebrating 40 years since they graduated.  Fred had missed the grand opening of our new building back in October, so asked if he could visit for a tour.  None of the group had a science background, so it was a perfect audience for my six-minute “what we do in Computational Biology and why” talk with an overview of some of the highlights from our Research Division.

I really enjoy talking to non-scientists about what we do.  Every time I do it, I am reminded of what a privileged and “small” world we work in.  Most people have no real concept of what scientists actually do, what motivates them, or even how their work is funded, and this always comes out in the kinds of questions I get asked.  A couple from yesterday were:

“How often do you get a Eureka! moment?”

Well, I said: “All the time, it is just that most of them are wrong!”, or as our Research Dean Julian Blow pointed out, “They might be right, but are only small steps that you get excited about!”.

A trickier question was:

“Who tells you what to do research on?”

I hesitated with this one, but then answered that most people go into academic research and become group leaders because they don’t like to be told what to do.  Certainly true in my case…  The follow-up question was then:

“Well, how do you decide what to do?”

This is of course complicated and like most things in life comes down to money…  I didn’t say this at the time, so wanted to expand a bit here.

  1. Ideas are free.  As an academic scientist you probably have lots of ideas.  Well, I hope so.  If not, you are perhaps in the wrong job!
  2. Proving ideas are right is not free.  Even if you are a theoretician who works alone only with pencil and paper,  turning your idea into a solid proof takes your time. Someone has to pay for this since you have to have food to eat and somewhere to live.  If you do experimental science, then it certainly costs to set about exploring your idea since you have to build a team and pay for equipment, consumables and travel.

The following is an edited excerpt from “The UK Academic System: hierarchy, students, grants, fellowships and all that”.

All research requires people to do it, as well as equipment and consumables, not to mention space and electricity. People need to eat and have somewhere to live, so they like to get paid for their work. As a consequence, all research takes money! So, where does funding for scientific research come from in the UK and how do you go about getting it? As an independent scientist (a PI – Principal Investigator), a lot of your time is spent finding ways to fund your research and maintaining continuity of staff in your research group. There is very little funding in the UK for long-term (i.e. to retirement age) appointments; just about everything is funded on short-term grants from one or more organisations. This presents an interesting and challenging problem for a PI, not to mention his/her staff.

There are three main sources of funding: Government “Research Councils”, Charities and Industry. I will focus on Research Council and Charity funding since these are the most common sources and the methods of applying are similar and follow an established pattern. Funding organisations offer different types of grants to support research. They include project grants that might fund a single post-doctoral researcher for three years, some equipment money, laboratory consumables and travel (so they can go to conferences, learn what else is going on in their field and tell people about what they have done) to work on a specific problem. Project grants can be bigger or longer, but 3 years and one post-doc is the norm, at least in biology-related subjects. Longer-term funding is also possible and is often referred to as a programme grant. A programme grant may fund several post-doc researchers for 5 years. This allows the PI who holds a programme grant to try more ambitious research and to develop multiple themes in their research portfolio. Most successful PIs will hold multiple grants at any one time, from multiple organisations, and will spend a fair proportion of their time juggling funds to enable people coming to the end of a contract to keep working until the next grant starts.

So, how do you get a grant? First, you have to have a good idea! Then, you identify the funding agency that is most appropriate to approach. There may be specific calls for proposals in your area, or you may apply in responsive mode. Funding is not infinite and not all good ideas can be supported, so funding agencies appoint committees that specialise in different areas of science to assess grants and decide which will get funded. You have to target one of these committees with your application. You then need to write the grant application. This will include a detailed costing for personnel, etc., as well as a detailed scientific case. The scientific case will include relevant background leading up to the proposed research as well as a description of what you are proposing to do. Space is usually limited to 5 pages for a three-year, single post-doc grant, so you have to be concise and clear in what you write. The application will also include sections to describe your scientific track record and previous relevant publications. Once everything is together, you submit the application to the funding agency in time for whatever deadline they work to. There is a lot of skill involved in writing grants – it is different to writing papers for publication. You have to present your past work and planned research in a way that is clear and appealing to someone who may not be an expert in your narrow field. This is a particularly big challenge.

What happens then? First, the office checks that you have included everything you should on the proposal and that your proposal is in the right area for their agency. Then, they send the proposal to up to 10 people for peer review. Other scientists in your field (often your competitors) read your grant, write comments about the application and give it a grading. At the next meeting of the committee that will assess your grant application, your grant will be one of many, possibly 150, that are considered in 1-2 days. The committee consists of perhaps 20 people like you who are experts in some relevant area of science, plus the administrative staff of the funding agency, and will be chaired by a scientist like you. Each committee member is given a set of grants to speak to, and each grant will have two committee members who will speak to it. The committee members will have been sent all the grant applications and the peer-review reports in advance of the meeting and will have carefully read at least the applications that they are speaking to. Bear in mind that each member of the committee will have had to read around 10 grants in detail, so if your grant is not written clearly, they may miss the point of it. Committee members may also read other grants in the set if they have a particular interest in them and time to do it!

All committees work in different ways, but one common procedure is as follows: At the committee meeting, the grants are initially ranked by the scores given by peer-reviewers. The committee quickly reviews low-scoring grants to check that the scores are fair; these grants are then eliminated. Any very high-scoring grants may also be put to one side as almost certain to be recommended for funding. The committee then spends most of its time discussing the rest of the proposals, which normally amounts to 80 or 90% of the proposals submitted! Discussion goes grant-by-grant. For each grant, the two people assigned to speak to it take it in turn to summarise the grant and what they think of it, given their understanding of the proposal and the comments of the referees. The wider committee then has the opportunity to comment, ask questions and generally discuss the merits of the proposal. At the end of the discussion, a score is assigned to the grant and it is added to a preliminary ranking of all the grants. This is often done by one of the staff on a spreadsheet that is visible on a large screen. Once all the grants have been discussed and assigned scores, the ranking is re-examined by the committee to see if, now that all grants have been considered, the ranking given to each grant is fair. Some re-organisation of scores can happen at this stage, leading to a final ranking that is put forward. The precise cut-off for funding will vary from committee to committee and from meeting to meeting depending on the amount of money the agency has available to fund grants at that time. However, many good, high-ranked grants do not get funded, simply due to lack of funds. Most scientists get used to some of their very good grants being highly ranked, but not funded.
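
The triage step described above is essentially a small algorithm, which can be sketched as follows (the scores and thresholds are invented; real committees do not work to fixed numeric cut-offs):

```python
# Sketch of committee triage: grants are ranked by peer-review score,
# the clearly-low ones eliminated, the clearly-high ones set aside as
# near-certain funds, and the large middle band sent for full discussion.
# Scores and thresholds are invented for illustration.

grants = {"G1": 9.2, "G2": 3.1, "G3": 6.5, "G4": 7.8, "G5": 5.9, "G6": 2.4}

LOW, HIGH = 4.0, 9.0  # illustrative triage thresholds

ranked = sorted(grants, key=grants.get, reverse=True)
eliminated = [g for g in ranked if grants[g] < LOW]
near_certain = [g for g in ranked if grants[g] >= HIGH]
to_discuss = [g for g in ranked if LOW <= grants[g] < HIGH]

print("near-certain:", near_certain)  # funded barring surprises
print("discuss:", to_discuss)         # most of the meeting goes here
print("eliminated:", eliminated)
```

Note how the middle band dominates: that is exactly the 80–90% of proposals on which the committee spends almost all of its time.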

Is the system fair? At the committee, anyone who has a conflict of interest with the proposal being discussed has to leave the room while it is discussed. A conflict might be that their own application is being discussed, or that of a colleague at their own institution. The main problem is that most grants are potentially fundable, so the committee has a difficult job ranking them. A key factor is who speaks to your grant: their opinion can make a grant go up in rank or down.

So, getting funding for your research is tough and competitive, but you can’t do research without funding.  Success in obtaining funding is thus one of the metrics by which individual scientists are judged.  Unfortunately, with some committees on some days, it can seem a bit of a lottery which grant applications get funded and which don’t, but it is hard to see a fairer way of choosing without injecting more money into the system.

© 2020 Geoff Barton's blog
