Data Modelling & Analysing Coronavirus (COVID-19) Spread in Nagaland using Data Science & Data Analytics

Ochen Ao 



With so much “jargon” going around the state about Covid-19, and so much information and so many statistics from various sources in circulation, one is left to ponder without reaching any definitive conclusion. This article is an attempt to model and analyze the spread of the coronavirus (Orthocoronavirinae) in Nagaland purely from a data science & analytics perspective, using a publicly available dataset.


A.    Sources and credits:

There are many official, unofficial and unconfirmed data sources available. For my case study, I have used the official Covid-19 India state-wise dataset distributed by the Ministry of Health & Family Welfare (MoHFW), New Delhi, India.


1.    MoHFW -

For making the dataset available to developers, analysts, researchers, data scientists and anyone else interested in fighting this pandemic.



2.    Kaggle

Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.


3.    Jatin Chaudhary, Bangalore, Springboard India.

Paper on data-modeling, analysis and trend patterns of the spread of Covid-19 globally and the India focus.


4.    Upasana, Research Analyst.


5.    Alison Lynn Hill - Research Fellow - Mathematical biology & Viral dynamics, Harvard University, Massachusetts, USA.


Simulation on Covid-19 healthcare capacity analysis.


B.    Disclaimer

This analysis has been done using the dataset provided, and the trends and findings are based purely on the amount of data available. As the modelling was done using only about a month's worth of data, the results should not be taken as professional advice. However, the data modelling and analysis have been carried out to resemble the real world as closely as possible. Data scientists, epidemiologists and researchers all over the world are doing excellent work analysing COVID-19 data too. I encourage you to look at their work.



For this analysis, I have filtered the MoHFW dataset down to the state of Nagaland and split it into two time-series files plus one consolidated file. The columns are the same for the first two data-frames, as both contain time-series data. The nagaland_dataset is not a time series but aggregated data.
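The split described above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the exact code used for this article. The Date and State column names and every sample row are placeholders assumed for the example; they do not come from the MoHFW file.

```python
import csv
import io

# Placeholder rows standing in for the state-wise file; the values are
# made up for illustration and are NOT real case counts.
SAMPLE_CSV = """Date,State,DailyConfirmed,DailyRecovered
2020-06-01,Nagaland,3,0
2020-06-01,Kerala,10,4
2020-06-02,Nagaland,5,1
2020-06-02,Kerala,8,6
"""


def filter_state(csv_text, state):
    """Return the rows for one state, sorted by date, with integer counts."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["State"] == state:
            rows.append({
                "Date": row["Date"],
                "DailyConfirmed": int(row["DailyConfirmed"]),
                "DailyRecovered": int(row["DailyRecovered"]),
            })
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(rows, key=lambda r: r["Date"])


nagaland = filter_state(SAMPLE_CSV, "Nagaland")

# Two time series (confirmed, recovered) plus one aggregated record.
confirmed_series = [(r["Date"], r["DailyConfirmed"]) for r in nagaland]
recovered_series = [(r["Date"], r["DailyRecovered"]) for r in nagaland]
aggregate = {
    "Confirmed": sum(count for _, count in confirmed_series),
    "Recovered": sum(count for _, count in recovered_series),
}
```

With a real file, the same function could be fed the file's text instead of SAMPLE_CSV, provided the column names are adjusted to match.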

Here’s a view of the column of the nagaland_dataset_confirmed_cases.csv.


Here’s a view of the column of the nagaland_dataset_recovered_cases.csv.


And finally, the shape and columns of the nagaland_dataset.csv.



Using the dataset, we will now perform an exploratory analysis, summarize the stats and plot some trends in the existing data. We will then model the data and try to forecast the future count of cases using Artificial Intelligence and Machine Learning.

First, let’s look at where Nagaland stands in terms of the coronavirus spread to date (that is, up to the time this dataset was sourced).

Case Count Summary as on 30th June 2020


Note: The DailyConfirmed and DailyRecovered columns here are the sums of all Confirmed (Active) and Recovered cases respectively.


To understand how the virus has spread across the state over time, I have graphed the time-series data. The graph below shows the confirmed and recovered cases: confirmed (positive) cases are marked by the black line and recovered cases by the thicker green line.

Notice that cases only started surfacing on the 25th of May, after a fair period of “quietness”; there were no cases until then. The trend also indicates that the number of confirmed cases is rising steeply, roughly exponentially, though without a regular pattern. Towards the end of the graph, you will see a massive jump. This is only an assumption drawn from the data, but it suggests that the number of tests being done has not kept pace with the size of the population.
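One way to check whether a series is really growing exponentially is to fit a straight line to the logarithm of the cumulative count: a close fit on the log scale points to exponential growth, and the slope gives a doubling time of ln(2) / slope. The sketch below is only an illustration of that check. It runs on a synthetic series built to double every three days, not on the Nagaland data, and the function names are my own.

```python
import math


def log_linear_fit(counts):
    """Least-squares fit of ln(count) = intercept + slope * day_index."""
    xs = list(range(len(counts)))
    ys = [math.log(c) for c in counts]  # requires every count > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def doubling_time(slope):
    """Days needed for the count to double at the fitted growth rate."""
    return math.log(2) / slope


# Synthetic cumulative counts that double every 3 days (illustration only).
synthetic = [10 * 2 ** (day / 3) for day in range(12)]
slope, intercept = log_linear_fit(synthetic)
print(round(doubling_time(slope), 2))
```

For the synthetic series the fitted doubling time comes out at about three days, as constructed. On real counts, days with zero cases would have to be dropped before taking logarithms.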

The recovery trend is also erratic and does not exhibit any fixed pattern. At this point in time, however, it is very difficult to draw any assessment from the recovered cases so far.


Looking at the bar chart below, which shows only the last 10 days of the dataset, we can see that there has been quite a lot of activity. Note that confirmed cases are still ahead and continue to grow.


Here, we can see the cumulative trend of the spread of the virus. It started rising exponentially towards the end of May, and we are not seeing any stabilisation or decrease.
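A cumulative trend like this one is simply the running total of the daily counts. As a minimal sketch, with placeholder numbers rather than the real figures, the running total and the day-over-day growth factor can be computed like this:

```python
from itertools import accumulate

# Placeholder daily counts (illustration only, not real figures).
daily_confirmed = [0, 0, 3, 5, 2, 9]

# Running total: each entry is the sum of all daily counts up to that day.
cumulative_confirmed = list(accumulate(daily_confirmed))

# Day-over-day growth factor of the cumulative series, skipping days where
# the previous total is zero (the ratio is undefined there).
growth_factors = [
    today / yesterday
    for yesterday, today in zip(cumulative_confirmed, cumulative_confirmed[1:])
    if yesterday > 0
]
```

A growth factor that stays well above 1 from day to day is what a steeply rising cumulative curve looks like in numbers; a factor settling towards 1 would indicate the curve flattening.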


A bar chart showing the daily rise in confirmed cases. June has not been a good month!


And here’s the historical trend of the recovered cases in Nagaland so far. We can see that the recovered cases too have climbed straight up within a very short time frame.

Again, the data is highly inconsistent and we fail to see a familiar pattern.



The dataset does not have any district-wise information, but I was able to procure the figures from a local daily (which publishes daily cumulative stats). Because this data is limited, I have not attempted a district-level analysis. However, here are some stats for you to chew on. Data is as on 30th June 2020.





You can visit my personal Nagaland COVID-19 tracker and dashboard page to see the State overall recovery score and the COVID-19 hospital capacity stats – best viewed from a mobile phone.

3.    Data Modelling & Prediction/Forecasting

In this final section, we will attempt to generate data 15 days ahead and forecast future confirmed cases, recovered cases and deaths, if any. Be advised that this prediction is based entirely on a model built from the past data and trends collected. As the data is dynamic in nature, the actual number of cases will evidently differ because of factors such as season, climate, social distancing, adherence to SOP (Standard Operating Procedure) norms, and medical improvements in early case detection and treatment.
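To give a feel for what a 15-day-ahead forecast involves, here is a deliberately simple sketch. It is not the model used to produce the graphs in this article: it fits a straight-line trend to the most recent cumulative counts by least squares and extends that line 15 days forward. The counts are made-up placeholders; only the last observed date (30th June 2020) comes from the dataset described here.

```python
from datetime import date, timedelta


def linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, last_date, horizon=15):
    """Extrapolate the fitted line `horizon` days past `last_date`."""
    slope, intercept = linear_trend(values)
    n = len(values)
    return [
        (last_date + timedelta(days=step), intercept + slope * (n - 1 + step))
        for step in range(1, horizon + 1)
    ]


# Placeholder cumulative counts for the last 7 observed days (made up).
recent_cumulative = [100, 112, 125, 133, 148, 160, 171]
predictions = forecast(recent_cumulative, last_date=date(2020, 6, 30))

for day, value in predictions[:3]:
    print(day.isoformat(), round(value))
```

A straight line is the crudest possible choice and will understate growth that is genuinely exponential; it is used here only because it keeps the mechanics of fitting on past data and extrapolating forward easy to follow.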

a.    Confirmed cases forecast for the next 15 days.

After running the simulation, the graph indicates that the total number of confirmed cases will rise to 600(+) by the end of the second week of July (11th July).


b.    Death cases forecast for the next 15 days.

After running the simulation, the prediction and forecast indicate that there will be no deaths.


c.    Recovered cases forecast for the next 15 days.

After running the simulation, the graph indicates that recovered cases will continue their current pattern. In the graph, the actual values are shown by the black dots, and the predicted values by the blue wavy line and the blue shaded area around it.


4.    Conclusion

I hope this article has proved useful and that readers have now seen the impact of coronavirus in Nagaland from a “data science” perspective.

Let’s lend a hand in fighting this pandemic, at the very least by quarantining ourselves: staying indoors and protecting ourselves and those around us. Follow the instructions and directives of the government and medical officers.

In my next article, I will be talking about COVID-19 Hospital Capacity and Preparedness in Nagaland, especially Dimapur. So please stay tuned!