Saturday, June 21, 2014

Difficulties of opportunistic field data

It seems that every time I work with field data, whether I collected them myself or was given them to analyze, I come across a few common problems. There are usually transcription errors to deal with, which can be easy to handle when I have access to the original data sheets. There are outliers to detect, and some sleuthing to be done to determine whether "this tree really is 40 meters tall, or is it 4 meters tall?" There are other similar problems, mostly due to little mistakes made here and there. But a problem I've been thinking about recently is how to deal with opportunistic field data.

Since, as far as I know, I just made this term up, let me describe what I mean. Opportunistic data are data collected piecemeal while collecting the data you had planned to collect. Or, they are data that "add information" but are not consistently collected at every site you visit. For example, when I ran my first transects to collect size and reproductive trait data for glossy buckthorn, I noticed that many plants appeared to have been browsed. Based on my reading prior to my field work, I was confident that buckthorn wasn't browsed by the native fauna, so I wasn't planning on collecting this type of information. However, I thought, "hey, I should make a record when a plant looks like it's been chewed on". But I didn't make it uniform: I didn't add it to the data sheet as another column (the data sheets were already printed out and didn't have space for an extra column!). I just committed to commenting on it in the "Notes" column. At some point I also started noting whether a plant was damaged or not, also in the "Notes" column. By the time I got to transcribing the data into CSV files, I really couldn't distinguish damage from browsing, or either of them from a failure to make a note at all. I was left with data for some plants, and no way of knowing whether the plants without notes were free from damage and browsing or had simply never been checked for them.
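One way to keep that distinction at transcription time is to code each opportunistic observation with three states instead of two: noted as present, noted as absent, or not assessed. The sketch below is only an illustration in Python, not the workflow actually used for these data. The function name code_observation, the phrases "browsed" and "not browsed", and the sample rows are all hypothetical, and it assumes that an absence was written down explicitly whenever a plant was checked and found unbrowsed.

```python
import csv
import io
from typing import Optional


def code_observation(notes: str, present: str, absent: str) -> Optional[bool]:
    """Return True if noted present, False if noted absent, None if not assessed."""
    text = notes.lower()
    # Check the "absent" phrase first, since "not browsed" contains "browsed".
    if absent in text:
        return False
    if present in text:
        return True
    # No note either way: treat as unobserved, not as "unbrowsed".
    return None


# Hypothetical transcribed rows; plant IDs, heights, and notes are made up.
raw = io.StringIO(
    "plant_id,height_cm,notes\n"
    "1,120,browsed\n"
    "2,85,not browsed\n"
    "3,240,\n"
)
rows = list(csv.DictReader(raw))
browsed = {
    r["plant_id"]: code_observation(r["notes"], "browsed", "not browsed")
    for r in rows
}
```

Here plant 3 ends up as None rather than False, so a later analysis can drop or model the unassessed plants instead of silently counting them as unbrowsed.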

Perhaps the biggest problem was that when I started recording a piece of information in the way I described above, I wasn't as dedicated to it as I was to the data I originally set out to collect. As another example, I have photos of some of my seedling plots, but not of the ones where I thought, "oh, there aren't that many here. I'll note the number and that will be fine". This might seem like an action taken out of laziness, but really I think data collection involves constant triage: "I need to measure X number of plants. This extra information is not as important as achieving that goal." In the end, I collected plenty of incomplete data during my field work, mostly as observations jotted down as notes, which never saw any type of formal analysis. However, these notes were important to return to when I was thinking about my system while sitting in my office. "Oh that's right, there was an outbreak of oat crown rust in some of those areas that seemed to affect fruit counts on some of my plants. Disease may play a role in the population dynamics of this system." Now I'm working with data sets collected by others that include similar incomplete pieces of information. As someone new to the system, I'm gaining insight from some of this information, but it's hard not to look at some of it and think, "if we only had quantitative measures of soil depth for the full data set, we'd be able to say so much more!"

I'm sure I'll continue to collect opportunistic data in the future. If I didn't, I wouldn't really be doing my job of observing patterns in nature. I'd also like to delve deeper into analysis methods that can be used with these data. Based on conversations I've had with colleagues in the past, I know that we need not simply shelve data because they are not consistent. But ultimately, I do hope that the next time I'm in the field collecting and something interesting comes up, I'll think, "I should take care in collecting that information. I'm going to want it!"