The Yellow Bucket: Methods, Sampling, and the Role of AI

By Brainerd Prince

In our last article, we started the conversation about methods, describing them as the hands of research. To recap, the red bucket gives us a research question; the green bucket gives us the data to answer the research question; and the yellow bucket is the methodology.

And within methodology, we have both eyes and hands. While the eyes give us the conceptual framework and a vocabulary to voice our argument, the hands of research have a twofold purpose – collection and analysis of data. The hands of the yellow bucket particularly refer to data that needs to be collected and analyzed so that we can successfully answer the research question with which our project began. Choosing the right method is important because our method, in some sense, will determine the data set that we will gather. And the nature of the data gathered already sets in motion the kind of analysis we could possibly do with it.

But before we step into the different types of methods, we have to step back and recall the green bucket, the object of research. The object of research is the place from which we get the insights and information that will answer the research question. Once we have finalized the object of research, it will itself determine the type of methods required. For example, if the object of research is a text written by a person, then we would collect and analyze data from that text using textual analysis or some form of discourse analysis. Often a thematic analysis is the most effective method for gleaning insights from a text.

Let's take another instance. Suppose our object of research is a living community, that is, a group of people doing certain practices together, holding certain beliefs and values, and living them out in unique ways. If that is the object of research from which we choose to gather insights to answer our research question, then the method would be an anthropological or ethnographic method, which involves more specific methods like participant observation, informal interviews, open-ended interviews, or even plain observation. These methods help the researcher engage with the community and gather the necessary information while always ensuring that personal prejudice, bias, and subjectivity do not distort the research process. Since all data in this instance is mediated through the subjectivity of the researcher, the researcher will have to use strategies like triangulation to ensure that personal biases are kept in check.

In case the object of research is a natural phenomenon, particularly one that occurs at periodic or regular intervals, then the method would be to set up an experiment, ensure the right conditions for that phenomenon to take place, and have the necessary equipment to record the observations. The methods I have talked about until now are primarily methods to collect data, and collecting data, as we have mentioned, depends clearly upon what type of data is required.

While collecting data, one needs to be careful of a few things. This again goes back to the type of research question with which we have begun the project. Suppose our research question is about a certain population or a geographical entity.

For example, how do Indians prefer their food to be cooked? Now this is a very broad question, but it specifically talks about Indians, which is a huge population. And food, again, encompasses a great variety of dishes.

Therefore, if our research question is asking us for insights about the population of humans categorized as Indians and about their food, then, of course, we will not be able to ask every Indian about their food. Therefore, we have to define what we mean by Indians. Do we mean urban Indians? Do we mean rural Indians? Do we mean middle-class Indians? Do we mean working-class Indians? Who are we talking about when we use the term ‘Indians’? Likewise, what type of food are we particularly referring to? Even then, the population size would be very large; perhaps it would run into millions, if not over a billion.

If that is the population we want to represent through our research, then we need to be able to get a sample of that population that can be studied effectively. The research project will not be viable if the entire population has to be observed and studied. Therefore, what becomes of immense importance is our sampling strategy. If we have to predict something about the population, then our sample must represent the population. In other words, a sample must be a microcosm of the population.

It must have all the major characteristics of the population. Therefore, what constitutes an appropriate sample size becomes an important question. Furthermore, how do we ensure that the sample is representative? A method often used is random sampling, which ensures that we don't dip into a particular pool, but rather give every member of the population an equal chance of becoming part of the sample. The hope is that the sample will capture the diversity of the population and thus become representative.
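As a small illustrative sketch (not part of the column's argument; the population of respondent IDs and the sample size of 500 are invented for illustration), simple random sampling can be simulated in a few lines of Python:

```python
import random

# Hypothetical population: 10,000 respondent IDs (invented for illustration).
population = list(range(10_000))

# Simple random sampling: every member has an equal chance of selection.
random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=500)  # 500 draws, without replacement

print(len(sample))       # 500 respondents in the sample
print(len(set(sample)))  # 500 -> no member was chosen twice
```

Because `random.sample` draws without replacement and treats every member alike, no respondent is picked twice and none is favoured, which is exactly the "equal chances" property described above.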

Another sampling strategy, the snowball sampling technique, can be effectively used if we are interested in a certain type of data which is interspersed in the population and not explicitly visible. Suppose, for example, we want to study surgeons with experience of a particular procedure: finding the first one could lead us to others of the same type. In other words, the technique uses a system of referrals, and thus, from one, the sample size snowballs. Now, this sample may not be representative of all surgeons with regard to demographics or any other parameters. However, the sample would have the experiences that the research question is keen on exploring.
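The referral process behind snowball sampling can be sketched as a small Python function; the referral network here (people "A" through "F" and who they know) is entirely invented for illustration:

```python
# Hypothetical referral network (invented): each person names peers they know.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "F": [],
}

def snowball_sample(seed, max_size):
    """Grow a sample by following referrals from an initial contact."""
    sample, frontier = [], [seed]
    while frontier and len(sample) < max_size:
        person = frontier.pop(0)       # next person to interview
        if person not in sample:       # skip anyone already sampled
            sample.append(person)
            frontier.extend(referrals.get(person, []))  # add their referrals
    return sample

print(snowball_sample("A", 4))  # ['A', 'B', 'C', 'D']
```

Note how the final sample depends entirely on the starting contact and the referral chains, which is precisely why a snowball sample captures the right experiences but cannot claim demographic representativeness.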

Thus, we see that data collection depends hugely on the object of research as well as on the research questions that the project seeks to answer. Once the data is collected, we need to analyze it. Data analysis, again, depends on what type of data has been collected. If it was textual data, as we said, there are methods like discourse analysis and thematic analysis that will do the job.

When we collect datasets, two types of analysis can be done: quantitative and qualitative. These two types of analysis deal with data in numbers and data in words, respectively. Today, a mixed-methods approach is often used, where quantitative data, which yields statistical values, and qualitative data, which supports the predictions made by statistical inference through the insights shared by respondents, are used together.
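As a toy sketch of the mixed-methods idea (all survey responses below are invented, echoing the column's food example), each record pairs a numeric rating with a free-text answer; the numbers are summarized statistically while the words are crudely coded into themes:

```python
from collections import Counter
from statistics import mean

# Invented survey responses: (numeric rating, free-text answer).
responses = [
    (4, "prefer home-cooked food"),
    (5, "home-cooked, less oil"),
    (2, "restaurant food is convenient"),
    (4, "home-cooked with fresh spices"),
]

# Quantitative strand: a statistical summary of the ratings.
avg_rating = mean(r for r, _ in responses)

# Qualitative strand: a crude thematic coding of the free-text answers.
themes = Counter(
    "home-cooked" if "home-cooked" in text else "restaurant"
    for _, text in responses
)

print(avg_rating)             # 3.75
print(themes.most_common(1))  # [('home-cooked', 3)]
```

Here the statistical value (the mean rating) and the dominant theme reinforce each other, which is the complementarity a mixed-methods design aims for; real thematic analysis, of course, involves far richer coding than a keyword match.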

Several books give details on how to use these methods. The deeper insight that I would like to share has to do with choosing the appropriate method for a certain dataset. And I believe this is governed by the question that the research project begins with. What I mean by that is the entire journey of research is predicated upon and governed by the type of question that is raised in that project. 

However, when research is done on humans, although methods from the natural sciences have been used to find patterns and predict behaviour, all such methods must carry certain disclaimers about the fundamentally unpredictable nature of humans and human societies. Auguste Comte, Emile Durkheim, Max Weber, and many other sociologists and social scientists of the 19th century envisioned the social sciences as modelled on the natural sciences.

However, we have quickly learned that human data is quite different from the data obtained in the natural sciences. Yet even data in the natural sciences, at the microscopic or galactic level, can behave with uncertainty and ambiguity, much like humans. Perhaps we are breaking away from the shackles of the Enlightenment age, which was founded on modern conceptions of rationality and knowledge and believed strongly that we could arrive at foundational principles, truth, and certitude about the functioning of the human world just as about the natural world. As we progress as a human race, along with the journey of our planet and the universe, some of these presuppositions need to be questioned.

I want to end this section with a reference to artificial intelligence and some of the technological advances in that area. With the coming of AI and Machine Learning and the training of AI models, I want to argue that these models can become an effective means to collect data as well as to analyze it. They can find patterns that the human mind might miss or fail to fathom. They can examine large datasets, or what we call big data, and find patterns within them. So today, when we think of methods, whatever research objectives we are working with and whatever methods we choose, being versatile with AI tools will only enhance our data collection and analysis. In the next column, we will look at how, from our data analysis, we arrive at the findings and results of our research, with which we can form our argument and effectively answer our research question.

Dr Brainerd Prince is an Associate Professor and Director of the Centre for Thinking, Language, and Communication (CTLC) at Plaksha University, Mohali.


