Thursday, 7 July 2016

Facebook vs. iRecord - a conundrum

An exchange with a member of the UK Hoverflies Facebook Page these last couple of days prompted me to put some thoughts down on the relative benefits of different approaches to biological recording and the way in which it has developed with the advent of digital photography.

When Stuart Ball and I took on the HRS in 1991 the data were mainly supplied on record cards that had to be entered into a database. The BRC at Monks Wood did that job, whilst the scheme organisers acted as the interface with recorders and checked the data to ensure they made sense. Unfortunately, BRC were not sufficiently well funded to keep pace with the volume of data and a big backlog developed. In the case of the HRS this was approximately 2 cubic metres of cards. In the five years after Stuart and I took the job on, we did the data entry ourselves and gathered machine-readable data. I think we can say we were the first scheme to do this. That effort generated 375,000 records but was organised so that data management took place in the winter and we were able to concentrate on field work in the summer.

A developing paradigm

Things changed with the advent of digital photography, improved internet access and of course such advances as the WILDGuide which made hoverflies accessible to a much wider audience. Nevertheless, until around 2012 the vast bulk of our activities centred upon traditional interactions with recorders, many of whom had been contributing for 30-40 years and were personally known to us. Over 50% of incoming data came from fewer than 25 people.

The FB page generates about 25,000 records per year and in addition has helped to develop about a dozen people who now do their own IDs and submit data as spreadsheets - maybe a further 5,000 records. Historically, the HRS attracted around 20-25,000 records a year through traditional routes - not least the maybe 2,000 records a year that I generated myself from fieldwork. That traditional resource is now advancing in age - most of the top 25 recorders (who have contributed about 50% of all records) have been involved for over 20 years and several for over 40 years. In the last few years we have lost 2 and a third is far from well and not able to record. That means we have got to grow a new generation. We are doing this through two avenues:

  • Regular training events that Stuart and I run across the length and breadth of the UK (from Lerwick and Kirkwall to Exeter, Studland and West Sussex). We run between 5 and 8 such courses per year, but probably only generate about 1 new recorder like our old guard from about every 50 trainees; so perhaps one per year.
  • Interaction with recorders on the web. Facebook has proven to be exceptionally effective in this respect. FB not only helps with ID skills but also helps to develop the wider recording skills and an understanding of what data are needed. That said, we must also accept that a very high number of contributors are (were) first and foremost photographers who wanted IDs for their shots. Importantly, the spread of involvement has widened considerably and this has made a huge difference to the recording scheme.

New challenges

Working with photographic recording brings with it a completely new set of challenges, not least expectations. Many contributors happily accept that not all photos can be identified, but occasionally they express frustration. I withdrew from one forum after getting abuse because I was not prepared to put names to most photos of the genus Syrphus, which is far from straightforward, even from specimens. The problem of Syrphus identification crops up again from time-to-time and is frustrating for everybody, especially when it is also one of the most frequently photographed genera. We can only do what is possible, and I'm afraid that there is also an issue of best use of resources.

I take the view that it is unwise to call oneself an 'expert' - one is setting oneself up for a fall. So I tend to use the term specialist and accept that I too have a great deal to learn. The difference between me and the relative novice is that I am acutely aware of the pitfalls, and have probably fallen into a good many holes of my own making! That is how we learn. But, even using the term 'expert', the number available to provide identifications is extremely small - well below 20 across the country and probably fewer than 10 who can make a reliable job of it.

Data harvesting

The big question then arises as to how data should be harvested? Should one extract data directly from FB posts or should one direct contributors to iRecord? I have probably built a rod for my own back by harvesting directly from FB, but I do have sound reasons for doing this:

  • A very substantial number of FB members started either as photographers who wish to know what their subject matter is, or who enjoy sharing their experiences with others who are interested from the perspective of getting a good shot. As such contributing to iRecord or another medium is not their highest priority - we would lose a great deal of data if I did not extract from this site.

  • There are considerable advantages to compiling a dataset that has been checked by a small group of the more reliable specialists. This improves confidence that the data are robust, provided one does not simply discard partially identified records, which give perspective; hence I extract all records.

  • I extract a great deal of additional data that often gets overlooked by recorders: the gender of the animal, morphs, abundance, behaviour and flower visits (not the plant the animal was sitting on). It is a comprehensive dataset.

  • I think the page would be a far less effective resource without the feedback that I manage to post on trends in species abundance or record numbers. If we are to generate a new cohort of recorders (and hopefully replacements for the existing team) then we must educate and mentor people.

  • The impact of FB can be seen from the attached graph (it will be bigger still in 2016 as we are dealing with about 50% more records than 2015).

Figure 1. Numbers of records held within the HRS database, separated according to origin: NBN data are held separate to the main HRS dataset.

But, what about iRecord?

This was built as part of a wider initiative to increase biological recording activity. It has an admirable objective, but starts from the principle that recording scheme organisers are there to validate records. In theory this is the case, but most RS organisers took the job on many years ago when things were less complicated - they maintained a database, did their own recording and gathered in records from a relatively small cohort of reliable recorders, most of whom they knew individually. iRecord is very impersonal and photo ID is an art that has to be developed - not everybody is willing to do this and relatively few RS organisers have signed up to iRecord - much to the frustration of the Country Agencies who want the data.

In the first year of iRecord there were 14,000 posts of hoverfly records, 11,000 of which were a single data dump from an LRC whose data we already had. I had to work through the lot to clear them, especially as quite a few had to be shifted from full species to aggregate after splits changed the status of species (lots in Platycheirus). That job took me about 100 hours.

This year the pace has dropped and there are currently 2,457 records awaiting verification. By the time the autumn arrives I reckon that number will have risen to about 6,000 records. So it is a substantial but manageable job in the winter. But it does frustrate me hugely for a number of reasons:

  • There are quite a few contributors who post a set of photos that are all of different animals that they lump under the same species name - that has to be disentangled.
  • Records often lack detail - when I extract data from this page I also log the gender, flower visits, behaviour etc. Posts on FB often lack this or say 'on rose' when they mean 'sitting on the leaves of a rose bush' and not 'visiting the flower of a rose' - there is a huge difference in the value of such data and as there is interest in pollinators my approach is providing a far more robust dataset.
  • A fair few records are misidentified - there is one regular contributor who rarely achieves 50% correct and seems not to have learned at all in the past 2 years.
  • Where records are not accompanied by photos one gets no real feel for the actual skill of the recorder. This is illustrated by people whose data cover Syrphus - lots of records without photos but the odd one with a photo that clearly cannot be taken to species (e.g. males of poor resolution). At that point one must be wary of the overall quality of the data from that person. These have to be dealt with - iRecord is not a particularly good interactive medium and FB is far better in this respect.

  • The dataset that emerges is a mish-mash of occasional records and records from one or two more advanced recorders, so there is not much chance of advancing the science of recording. It is compounded by problems with individual recorders going back through their diaries and adding records that they submitted to the RS many years ago as a record card and that I have already computerised - so I am doing a lot of repeat work for relatively little return.

Where do we go from here?

We have seen a paradigm shift in the way biological recording works. The internet and digital photography have changed the relationship between recording schemes and contributors. This has brought a plethora of benefits, but has also exposed significant weaknesses in the system. The most worrying weakness is the relatively small number of people with sufficient experience to deal with identification, coupled with raised expectations that they will provide their time in line with demand. Unfortunately, there are limits to what they can do or are willing to do. Some are not particularly computer literate, and do not have the spare time to respond in line with the immediacy of modern life. Others deal with groups of organisms that require dissection or high magnification and checking numerous characters that are difficult or impossible to depict in photographs. Others still just don't want their lives ruled by a computer: a comment that resonates is 'I like the fieldwork but I don't want to become involved in administration'.

Thus, my view is that we are witnessing a turning point in biological recording. If we want to use interactive media, then we have got to grow capacity to respond to demand. My guess is that the resident specialists on the HRS FB page jointly contribute over 2,000 hours a year to this one medium. It has achieved a huge amount but there have to be limits to what can be done with existing capacity. So, we must grow new capacity - which of course depends upon the same limited cohort of specialists! We will get there I think, but we must also ask for expectations to be tempered:

We won't ever manage to identify everything posted as a photograph, and we probably will never fulfil all the aspirations of data users. Nevertheless, the UK is in a far better position than anywhere else in the World, with the possible exception of The Netherlands where biological recording is also well served by the non-vocational ethos.


  1. This comment has been removed by the author.

  2. As the person whose comments sparked this column, I am torn between the guilt of seeing Roger spend more hours of the night writing this (please look after your health and personal needs, Roger) and hoping that raising some questions has helped to clarify and make explicit some aspects of Roger's philosophy.

    Of course, as an amateur contributor, one can feel disappointed when a set of clear images, which may have taken considerable time to capture, select, crop and post, can only be identified to genus or aggregate level.

    That disappointment/frustration could be mitigated if the message were delivered differently. If one of the key uses of the data is to prove the importance of hoverflies as pollinators, then I presume that the exact species is less important (in that respect).

    If instead of the response being just 'Syrphus sp.', it were 'Please record as Syrphus sp.' or 'Thanks - this will be recorded as Syrphus sp', the observer would still feel they were contributing to scientific knowledge. Otherwise it feels like one has wasted the time and there is no value to the submission.

    I don't think there is any way to have a pre-typed range of responses for use on Facebook (except cut and paste from a separate document) so there would be a little more to type, which multiplied over many records.... With iRecord, there is a range of verification responses built in and these could be extended if everyone agreed and resource was available.

  3. That is a fair comment Paul, but I'm afraid that the team also has a problem with the amount of time we put in: at the moment it is not unusual to be working to well after midnight and then starting again before 8 in the morning. Sometimes we have to limit what we say just to keep on top of the workload - which gets to the point of impossibility. There is precious little chance of me or others looking after our health if we are then expected to expand what we do still further. Sorry - I will see what I can do but I won't promise, as it can be a nightmare - one deals with one post only to return to the list and find that three more have arrived - often this continues well into the early hours. It is like trying to sprint for five hours solid.

  4. It seems to me that there is also an issue of the preservation of the data. There is no guarantee that FB records will be accessible or retained in a useable form into the medium to long term future but in contrast iRecord is intended as a long term data archive.

    1. Not so. Each record has been extracted and turned into a data line on a spreadsheet, which is then uploaded into the HRS database and from there goes onto the NBN. The URL is retained within this record. True, the post may disappear from FB, but this is no different to Museum beetle eating a specimen or the loved ones of the deceased entomologist putting his/her collection in a skip or on the bonfire (which has happened many times).