Thursday, 17 August 2017

DNA sequencing - the solution to recording problems?


There was a bit of controversy on the UK Hoverflies Facebook page yesterday. Debate about the ethics and/or importance of retaining specimens led to an assertion by one contributor that collecting for recording was an anachronism and that it could be replaced by DNA analysis of a leg taken from a live insect!

The concept provides rich food for thought. Are we at that stage yet? If so, is it or will it be a viable option?

As far as I am aware, we are a long way off having a full database of DNA sequences for many animals and the prospects of assembling such sequences for bigger Orders such as the Diptera are a very long way away. There are initiatives to start the process, but they are fraught with problems; not least that traditional killing agents degrade DNA, so the only viable option is to take fresh specimens and freeze them. That is relatively simple for easily recognised species, but once one enters the realm of difficult taxa it is likely to lead to the need to take and kill very large numbers of individuals to track down the missing pieces. The sheer scale of the job is immense and is not going to be achieved in the near future. It is further complicated because the specimens must be stored in close to pure ethanol – which is not readily available to anyone other than registered labs.

That starts the thread of a bigger problem


Which gene sequences are the most useful for separating particular taxa? There has been a lot of work on the CO1 gene in hoverflies, but this gene is not without its limitations. I suspect there is a lot more to do before we can reliably separate some species using DNA sequencing.

BUT, I think the most worrying complication is the degree to which identification errors are already entering the system. Dipterists in the UK have been shocked by some examples of gene sequencing from other parts of the World, with the authors describing sequences for what are clearly species within a different FAMILY let alone genus! The genie is out of the bottle and it is going to take a fair while to put it back and then release it under control.

What about DNA as a way of recording?


The idea is great. You buy your portable gene-sequencer and catch insects that go into the sequencer and out pops a record! What happens to the insect? I suspect early sequencers will be fairly invasive and the animal will suffer serious injury or death. The idea of removing a leg from a fly 3mm long whilst keeping the animal alive is going to be dependent upon the dexterity of the operator. I suspect there will be large numbers of maimed and dying insects! Why not the old system of hand lens and holding the insect in one's fingers as specialists do at the moment for moderately doable species?

I suspect what is more likely is that in time it will be possible to put an insect soup into a sequencer and get a long species list of those that can be identified, plus a tail of question marks that cannot be identified and will never be identified because the animal has been liquidised!

What is the way forward?


There is no doubt that there is a need for a major gene sequencing programme, and that existing specialists will need to engage in the process. Many of us have already done so in some capacity. It remains to be seen how fast progress is made, but the days when there is no need for the microscope and pinned specimens are some way off.

Critically, if DNA sequencing is to be anything more than a dream, we need to grow a new generation of taxonomically competent specialists. They will have to provide the technical know-how in terms of reliable species identifications to confirm what gene-sequencing tells us. Traditional taxonomists are likely to be needed for a very long way into the future! The Universities are not doing this. I'm not sure they ever did, really. The skill of the taxonomist is the result of many years' work after graduation: getting to know their subject area in intimate detail. Such skills may once have lain in Universities, but to a large extent they were the territory of museums. Those jobs have largely gone too.

In the UK perhaps as much as 80% of the technical know-how resides in the non-vocational sector (amateurs). We must therefore make sure that taxonomic skills survive until Nirvana is attained. The HRS is doing its bit by running training courses and in its use of the UK Hoverflies Facebook page to mentor new taxonomic specialists. Taxonomic expertise is at a premium and needs to be valued and nurtured if the aspiration of developing a complete DNA sequence library is ever to be achieved.

Take nothing but photographs?


The issue of retaining specimens is one that will never go away. Some believe that one should take nothing but photographs and leave nothing but footprints. This has been the mantra that has existed for a couple of decades, and has become very firmly embedded. Others, perhaps an 'old school', are more relaxed about retaining specimens of invertebrates; and then there are the specialist taxonomists whose experience points to a continuing need for retaining specimens. Who is right? Or, is there a 'right' and 'wrong' answer?

BWARS have produced their own policy on specimen retention and rightly point to the need for restraint. They also highlight the dilemma that faces the serious specialist – your subject area is fascinating, and the animals are delightful, so why kill them? In my case, I gave up moths many years ago because I no longer felt that I could justify specimen retention on the grounds that I was not adding much to science and that my collection would not be wanted by a museum. Moreover, I was confident at the time, that with a small number of critical exceptions I could cope with live specimens. Thirty years on, I find I have forgotten everything about moths and they cause me a headache! I've not got the time and energy to go back through the learning process again!

Do we need to retain specimens at all - Where is the evidence?


The view that photography alone will suffice is reinforced because people can now take a photograph and post it on one of the specialist Facebook groups or iSpot. In many cases they will get a name, either complete or partial. Whether the determination is correct is another matter! Unfortunately, there is very little in the peer-reviewed literature that quantifies the issues. I have tried to provide some basic statistics but my pathetic attempts were met with reviewer comments ranging from 'of little scientific importance' to 'grossly misleading and wrong'. One reviewer ranted that at best 10% of hoverflies could be reliably identified from photographs. I gave up trying to produce something to fill one of the gaps!

Yet, I have good data from nearly ten years of extracting records from photographs. Those data now comprise perhaps as many as 100,000 records (approximately 10% of what has been assembled by the Hoverfly Recording Scheme over 40 years). I also have a good run of personal records that have been collected consistently over 30 years (maybe 40,000 records). So some comparison can be made. Similarly, there are now several recorders who are primarily photographers, but who also retain specimens that they send to me for determination. These three models can be compared, although scientific purists would argue that one really needs to compare photographic data with data derived from a rigid trapping protocol.

Are hoverflies a useful model for evaluating the potential of photographic recording?


Hoverflies are one of those 'in-between' groups. Some are relatively straightforward to identify from photographs, providing the photograph is of sufficient resolution to evaluate form and markings. Even so, we occasionally see photographs of relatively straightforward species that cannot be firmly identified. A far greater proportion can be identified on occasions, but unless critical features are well depicted we will struggle to get any further than generic level. There are then the genera that cannot be identified from photographs at all. For example, many male Platycheirus are determined on the basis of pits on the undersides of their feet – those are not depictable in live animal photographs. Some species can only be done from the internal structures within the male genital capsule (e.g. Sphaerophoria). Others are simply fiendishly difficult without access to comparative material (and even then cause problems).

We must also remember that we have a typical 'island fauna' that is a sub-set of a bigger continental fauna. Our 284 species of hoverfly compare with over 800 species in Europe. The faunas of our near-neighbours in The Netherlands and Belgium are perhaps 20% bigger, even though their land area is much smaller. It makes our job easier, but we also forget that we may well be overlooking cryptic species amongst species that we currently believe to be one 'easily identifiable' species. Eristalis is one potential problem area.

What do the data tell us?


A post on this blog earlier this month provides some indication of the sorts of differences that can be seen when photographic data are compared with data collected by a specialist. The most significant differences were in the relative importance of Cheilosia in the specialist dataset and the much higher representation of Pipizella and Paragus in that dataset.

The overall message is that photography can, and does, generate a large number of valuable records. Photographic recorders also ensure much wider geographical coverage, and will find species that occur at very low densities that are not well represented in the specialist dataset. The data are, however, a sub-set of the overall fauna. 

Does it matter?


If you are a naturalist who simply wants to know roughly what the animal or plant you have seen is, then the quality of identification is not a huge issue. It might mean that the 'lister' achieves longer or shorter lists depending upon the level of caution used in coming to a determination.

The issues start when data are used for other purposes such as site safeguard and development of species conservation strategies. If data are skewed then it is easy for developers to undermine the confidence that can be placed on individual records and on the conservation status of species. This has always been a problem for invertebrates and they are still very much a Cinderella area. To the best of my knowledge there remain no SSSI based solely on invertebrates; yet there probably should be. In the days of NCC and English Nature it was an uphill battle to get invertebrates the recognition they deserved. When BAP was developed, a huge list of birds went on as priority species, yet invertebrates that had undergone similar levels of decline were rejected because the data were believed not to be reliable.

Thus, the message has to be, if you want to see invertebrates properly conserved, you need robust data. We just about manage this for hoverflies, but getting similar levels of coverage and detail for, say, fungus gnats or craneflies is impossible. Why? Because they rely on high magnification and often upon characters that cannot be seen in photographs. Perhaps more importantly, because there are only a handful of specialists capable of identifying them and those specialists (wisely) will not spend their lives glued to a computer screen identifying photographs.

Wednesday, 9 August 2017

Making the most of records



When biological recording first started, its principal objective was to map the distribution of plants and animals. Atlases became very important and impressed a message that submitting data to a recording scheme was about creating dots on maps. That view continues because we have substantially failed to show what else records can be used for.

The situation is changing and Birdtrack has set the pace with its real-time chart that shows how individual species are occurring in comparison to previous years. This is an approach that is really only possible when schemes get records as they are created. It depends upon high levels of memory on the server and as such is probably beyond the options available to smaller recording schemes. The HRS is moving in that direction as one of the larger schemes, but as we are self-funded the costs are starting to rise and we will need to see what we can do to cover them.

Meanwhile, Stuart is hard at work developing our new site and including lots of nice new features that will bring us a bit closer to the real-time Birdtrack approach. We are a little way off that format but he has got a system working that allows analysis of previous years' data. Hopefully, this package will be rolled out in the not too distant future, but in the meantime here are some examples of the current state of play.

At the moment there are just short of 1 million records on the database. I have just passed over approximately 18,000 records that we have for 2017. Those will be incorporated into the database and the background tables updated in the not too distant future. I've got about another week's work sorting out other data that has been submitted in the past few months, so I suspect the total will be nearer 25,000 when all data are assembled.

I have included maps and phenology plots for four common and readily identifiable species to show what is possible. The maps show that whilst we have very good coverage, there are some big gaps, and plenty of areas where the last record was made several decades ago. There is lots that even the novice can do to help change this situation! The phenology plots are really instructive and I think show just what the potential is for future real-time reporting.

Note: the blue histogram represents all records (all taxa) for dates between 2001 and 2016. The subsequent graph expresses the sixteen-year average phenology and the red line is the phenology for 2016 as a proportion of all records received for the week in question. Thus the proportions for species that occur during the winter go up as the number of species recorded declines.
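For anyone who wants to try the same sort of plot on their own records, the "proportion of all records for the week" idea can be sketched in a few lines of Python. This is only an illustration and not the code behind the HRS site: the function name `weekly_proportions`, the (species, date) record layout and the use of ISO week numbers are all my assumptions.

```python
from collections import Counter
from datetime import date


def weekly_proportions(records, species):
    """Return {ISO week number: share of that week's records that are `species`}.

    `records` is an iterable of (species_name, date) pairs covering all taxa.
    This layout is a hypothetical one chosen for the sketch.
    """
    all_counts = Counter()      # every record received in each week
    species_counts = Counter()  # records of the chosen species in each week
    for name, when in records:
        week = when.isocalendar()[1]
        all_counts[week] += 1
        if name == species:
            species_counts[week] += 1
    # Dividing by the weekly total is what makes winter species look
    # relatively more prominent: the denominator shrinks in winter.
    return {week: species_counts[week] / total
            for week, total in sorted(all_counts.items())}


# Made-up records purely to show the call; these are not HRS data.
example_records = [
    ("Eristalis tenax", date(2016, 1, 5)),
    ("Episyrphus balteatus", date(2016, 1, 6)),
    ("Eristalis tenax", date(2016, 7, 12)),
    ("Episyrphus balteatus", date(2016, 7, 12)),
    ("Rhingia campestris", date(2016, 7, 13)),
    ("Eristalis pertinax", date(2016, 7, 13)),
]
example_result = weekly_proportions(example_records, "Eristalis tenax")
```

In the made-up example the single winter record of Eristalis tenax accounts for half of that week's records, whereas the single summer record accounts for only a quarter, which is the effect described in the note.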

Episyrphus balteatus

Eristalis pertinax

Eristalis tenax

Rhingia campestris

Wednesday, 2 August 2017

Making sense of data

Over the past two years I have taken a leaf out of John Bridges' book and have attempted to get out recording every day. It is a tough challenge so I reckon John does pretty well getting out as often as he does. Some months I do better than others, but in July I managed to do something every day; partly because I was walking to the hospital every day for the first 3 weeks.

My routine has been to record everything I see, no matter how common it is. If I enter a new 1km square, or new recording site unit then a new list commences. This, I hope, is not dissimilar to some of the recording by Facebook group members such as Kevin Bandage. Thus, I have attempted to create a complete record for the month that might be used for comparison with the data emerging from the Facebook group. In strictly scientific terms the comparisons are sufficiently different to say that one cannot draw firm conclusions from the data, but they do paint some important pictures. Thus, in Tables 1 & 2, I present my own data and the data extracted from the UK Hoverflies Facebook page, iSpot and Flickr for July.
Table 1. Records at generic level generated by photographic recording in July 2017. The data include a combination of full and partial records comprising a total of 5091 records.

Table 2. Records generated by RM in July 2017 comprising a total of 1758 lines of data
I have included basic counts of numbers of species and gross numbers of records aggregated at Generic level. The only point of departure between what I record and what is recorded from Photographic data is that I don't record female Sphaerophoria. In common with the photographic dataset, I created separate lines for males and females, except where the numbers reached such proportions that it was not possible to count them.

It should be noted that a substantial number of Facebook members maintain their own spreadsheets that are submitted periodically. I have not attempted to do anything with these data as this is simply a very rough analysis. More detailed analysis is needed but will require a lot more work.

The results are pretty informative.
  •  A total of 4231 full records and 860 partial records were generated by photographic recorders. My data yielded 1757 lines for a full ID and one partial ID (a female Eumerus). When you bear in mind that most of the really assiduous recorders contributing to the HRS rarely generate more than 1,000 records in the course of a year, my efforts show what can be done but they are based on a level of effort that cannot be sustained by most recorders and would not have been sustained by me without the enforced period of hospital visiting.
  • It is clear that using a large pool of recorders is an extremely effective way of securing records from a wide range of species, including a significant number of relatively uncommon animals that a single recorder, no matter how diligent, is unlikely to see on a regular basis. Thus the species list for the photographic dataset stands at 101 species; whereas my own list was considerably shorter (70 species).
  • The same obtains at Generic level, with 50 genera reported by photographic recording as opposed to 35 by my own efforts.
  • Geographical coverage within the photographic dataset is country-wide, whereas my own data cover fewer than 5 hectads at locations in Northamptonshire and south London.
  • The ranked frequencies of the genera as represented in the two datasets (Table 3) are substantially different, as illustrated by the genus Cheilosia, which in my dataset lies second in the ranking whilst in the Photographic dataset lies at no 8. Other genera that enjoy a more prominent role in my dataset include Paragus and Pipizella. All three of these genera are difficult/impossible to do from photographs and yet are extremely abundant when recorded systematically.
  • The abundance of some genera such as Platycheirus in the photographic dataset suggests that there may be a weakness in my search techniques for these genera, although I am at a loss to understand why that may be so - not only do I make visual searches, I also sweep suitable vegetation wherever possible (hence the strong representation of Paragus in my data).
  • On this note, I suspect the answer to some of the differences in frequencies probably lies in regional variation in species' abundance. For example, I have been amazed by the numbers of Volucella inanis and V. zonaria in south London this year. Conversely, SE England is always very weak for Leucozona glaucia and L. laternaria; hence their poor showing in my data.
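The rank comparison behind Table 3 is simple enough to reproduce on any pair of datasets. What follows is a minimal Python sketch, assuming each dataset has already been reduced to a count of records per genus; the function names `rank_genera` and `compare_ranks` are hypothetical, and the counts are placeholders rather than the figures from my tables.

```python
def rank_genera(counts):
    """Return {genus: rank}, where rank 1 is the genus with most records.

    Ties are broken alphabetically so the result is repeatable.
    """
    ordered = sorted(counts, key=lambda genus: (-counts[genus], genus))
    return {genus: position for position, genus in enumerate(ordered, start=1)}


def compare_ranks(photo_counts, specialist_counts):
    """List (genus, photographic rank, specialist rank) for every genus.

    A rank of None means the genus is absent from that dataset.
    """
    photo = rank_genera(photo_counts)
    specialist = rank_genera(specialist_counts)
    return [(genus, photo.get(genus), specialist.get(genus))
            for genus in sorted(set(photo) | set(specialist))]


# Placeholder counts for illustration only; not the July 2017 data.
example_photo = {"Eristalis": 50, "Platycheirus": 30, "Cheilosia": 10}
example_specialist = {"Eristalis": 40, "Cheilosia": 35, "Paragus": 20}
example_rows = compare_ranks(example_photo, example_specialist)
```

Genera that rank much higher in one column than the other, or that appear in only one, are the ones worth a closer look.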

Table 3. Comparative positions of individual genera when organised in rank order within the two datasets.
Thus, what can we say about the data? Well, both systems have their strengths and weaknesses. I suspect that what we really need is a network of recorders who adopt similar techniques to those I employ if we are to establish the sort of contextual data we need to make full use of the photographic dataset and to understand the trends that might be conveyed in both datasets.