20th July 2017
On the 15th – 16th July, the NHS held a Hack Day in Manchester for those who are interested in healthcare technology. Nick Pettican, part of our insights and planning team, went along to the event. Here is his blog split over the two days.
When the email about the NHS Hackathon circulated around the office, I knew it was an opportunity I couldn’t miss. As a data specialist working within healthcare, I was intrigued by the challenge of working alongside other technologists and healthcare practitioners to generate ideas to improve the future of the NHS.
After a spot of caffeine the ideas started flowing, and at 10am those who had written their ideas down were given one minute to pitch to the rest of the room.
Data (both big and smart), technological awareness and medical education were all recurring themes, overarched by innovation and open sourcing, both of which are needed and critical for the future of the NHS. Data collection was an interesting discussion, as poor data collection is predicted to be costing thousands of lives and no one seems to have got it quite right yet.
The big idea
Fifteen ideas later, it was time to choose. Ultimately the NHS big data idea, pitched by Jenny, a data analyst at NHS Digital, drew me in along with other participants from the day.
The NHS publishes large amounts of publicly accessible data, which is extremely valuable both to analysts working to improve patient care and to decision makers who will use the data to improve the future of the NHS.
The first problem we needed to address was that every time a weekly or monthly dataset is published, it appears at a new URI (Uniform Resource Identifier, the address that identifies a resource, in this case a page nested several pages deep) on the NHS website. This has created hundreds of URLs that analysts need to navigate to download the data. The second problem was that the Excel document the data comes in is badly formatted, meaning it must be cleaned by selecting, copying and pasting each of the hundreds of tables to make sense of it. The third problem was that the data needs to be accessible to all, and in its raw format it is not easily understandable.
Before we got started, we split the team in two. Those with coding and data science abilities were tasked with creating an algorithm to collect, clean and store the data in an open-source database, while the other half, which included healthcare professionals (HCPs), focused on understanding the end user.
Building the algorithm
We, the dev and data science team, got to work on mapping out the pipeline. Paul worked on a spider to crawl the NHS public data website, while I wrote a parser for the datasets in Excel sheets to clean them, date them and save them as CSVs, ready to be uploaded to a single database. Joe set up the AWS S3 bucket and a MongoDB database to store the data, and Tobias worked on sending the data over to the database. Ben and Ross explored ways of displaying the data in the front end and started prototyping. By the end of the day we had a working algorithm that pulled the data from the A&E pages, cleaned it, dated it and saved it as CSVs. However, it could not yet send the data to the S3 bucket and on to the database, because the data needed extra cleaning.
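To give a flavour of the crawl, clean and date steps described above, here is a minimal Python sketch. It is not the code written at the hack day: the names `SpreadsheetLinkParser`, `clean_rows` and `to_csv`, the `header_marker` idea and the "Period" column are all assumptions made for illustration. It also assumes the sheet rows have already been read into plain lists by whichever Excel library is in use, and it uses only the Python standard library.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class SpreadsheetLinkParser(HTMLParser):
    """Collects links to Excel files found on one HTML page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().endswith((".xls", ".xlsx")):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def clean_rows(raw_rows, header_marker, period):
    """Keep only the table in a messy sheet and stamp each row with its period.

    raw_rows: rows as lists of cell values (None for empty cells).
    header_marker: a cell value that identifies the header row (an assumption).
    period: label for the reporting period, e.g. "2017-06".
    """
    cleaned = []
    header_found = False
    for row in raw_rows:
        cells = ["" if cell is None else str(cell).strip() for cell in row]
        if not any(cells):
            continue  # skip fully blank spacer rows
        if not header_found:
            if header_marker in cells:
                header_found = True
                cleaned.append(["Period"] + cells)
            continue  # ignore titles and notes above the header row
        cleaned.append([period] + cells)
    return cleaned


def to_csv(rows):
    """Serialise cleaned rows to CSV text, ready to be written to a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

In use, each crawled page would be passed to `SpreadsheetLinkParser.feed`, and the rows read from each downloaded workbook would go through `clean_rows` and `to_csv`. Real NHS workbooks contain many tables per sheet, so an actual parser would need rather more logic than this to find where each table starts and ends.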
The other half of the team mapped out the user experience, drew up wireframes for the web platform and prepared for the pitch the next day. The ideas kept expanding, and the occasional catch-up helped to keep everyone up to date and busy with new tasks that arose as we progressed. To mark the end of day one we all headed to the nearest pub for a well-deserved pint!