Daily data digest

I know routines suck at first, but you have to admit that once you get used to them they are very useful for your daily work. At the beginning of my day I try to spend some time reading blogs and websites focused on my area of specialization: data. I have put together a small compilation of the links I find interesting. You can subscribe here.

  • Open Knowledge Foundation Blog
  • OpenlyLocal news
  • tuderechoasaber.es (in Spanish)
  • Nación Data (in Spanish)
  • WSJ.com: The Numbers Guy
  • Calidonia (in Galician)
  • Calidonia Hibernia (in Galician)
  • countculture
  • Data.gov.uk Blog
  • DDJ
  • Data Miner UK
  • clairemiller.net
  • FlowingData
  • visualcomplexity.com
  • information aesthetics
  • OUseful.Info
  • civio.es (in Spanish)
  • Data Journalism Blog
  • News: Datablog | guardian.co.uk

    Live from the Specialist Media Show!

    I will be live blogging from The Specialist Media Show, at the Think Tank in Birmingham

    Live from Shoes10!

    My colleague Franziska Bährle and I will be live blogging from the Shoes10 conference “Press for Health?” in West Bromwich today.


    My critical evaluation about my community of practice

    This is a post for my first assignment on the MA Online Journalism.

    Data journalism: my induction

    It wasn’t a casual choice. From my degree I have some experience in audio and video journalism. I have made radio, podcasts, audio slideshows and even some infographics. But Computer Assisted Reporting was completely new to me.

    I decided to explore it because I’m curious, because I know that it is a growing field and because I’m a bit of a masochist.

    At the beginning I had a very obvious problem: how was I supposed to blog and comment about data journalism if I didn’t know anything about it? Well, I think I solved it in a good way: I wrote about all the things that were useful for my own learning.

    I know that my contribution to the community was very poor in terms of quality, but blogging and tweeting about this was useful for taking my first steps.

    For example, my first post was a compilation of the most interesting Twitter lists on DDJ, and thanks to this research I discovered very interesting people to follow. Moreover, I’m currently building my own list of the most useful users for my projects.

    Although I haven’t written a post about it yet, I learn a lot from reading blogs about data journalism. I’m sure that in the next few weeks I’ll improve my current RSS list, but these sites have been very useful so far. They allow me to see what the main trends are and to discover new techniques that may be useful for my projects.

    This was my starting point: people to follow and blogs to read. Only with these in place could I start talking about learning new skills.

    Know how

    If I want to work in data journalism I will need to develop some skills and techniques.

    • Look for the data. It’s true that in the UK there is a Freedom of Information Act, but you still need to know which sources are the most suitable for each case. There are many useful sites to look at: data.gov.uk, Guardian Data Store, openlylocal.com, etc. I should blog about these interesting resources.
    • Scraping. Many times you can’t find a database with the info you need, so you scrape it from the web. This technique is quite hard to understand because it uses code but, luckily, you can play with many scrapers without learning Python or Ruby. I used the tutorials and the mailing list of Scraperwiki to get started and solve my problems. It’s amazing the first time you bring all the information you need from a webpage into a spreadsheet!
    • Excel. This is an essential piece of software for Computer Assisted Reporting. I’m getting familiar with the interface and the main options, and I’m also trying out the most useful formulas. In fact, I wrote a post about it.
    • Refine. Datasets are not as clean as we would wish. In fact, many times they are really messy. That’s why Google created Refine. It’s a very interesting tool but not very intuitive, so I still need more practice with it.
    • Find the story behind the numbers. I think that my journalistic instinct is a bit rusty, because I still have problems ‘reading’ data from a professional point of view. I’ll keep working with spreadsheets to learn where I have to look.
    • Visualizations. I’ve just tried Many Eyes, and the next steps are Google Fusion Tables and Tableau. I’ve read reports about both tools and they seem very powerful.
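
    To give an idea of what a scraper actually does, here is a minimal sketch in Python using only the standard library. The HTML fragment, the supplier names and the function names are made up for illustration; a real scraper would first download the page and would have to match that page’s actual structure.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from the web.
SAMPLE_HTML = """
<table>
  <tr><th>Supplier</th><th>Amount</th></tr>
  <tr><td>Acme Ltd</td><td>1200</td></tr>
  <tr><td>Bolt plc</td><td>30000</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside table cells (e.g. whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def html_table_to_csv(html):
    """Turn the table rows found in an HTML string into CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


print(html_table_to_csv(SAMPLE_HTML))
```

    The CSV text it prints can be saved to a file and opened in Excel or Google Docs.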

    Engage with the community

    Honestly, I didn’t talk much with the community. I don’t feel confident enough. I’m a complete beginner, so I don’t have an opinion about most things related to data journalism. Basically, I just share interesting links on social networks. But there are two things I’m proud of.

    1. The interview with Caroline Beavon. We had a very interesting talk about Computer Assisted Reporting and visualizations. She taught me that the best graph is the one that strikes a balance between beauty and clarity.
    2. The blog post for the OJB about La Nación Data. It was not just an opportunity to write for a famous site but also a chance to make a good contact in Argentina. I met two of the newspaper’s main data journalists and we are still in touch via e-mail and Twitter.

    Finally, I tried to arrange an interview with Simon Rogers from The Guardian, but it wasn’t possible (yet).

    Next steps

    1. Keep working on my reading and Twitter lists.
    2. Develop my technical skills.
    3. Look at as many datasets as I can to find the stories.
    4. Engage more with the community.

    Future projects

    Currently, I’m working on a data piece about A4E funding, but I’m still waiting for some information from the SFA, so I can’t say much more. I hope that this will be my first data journalism story.

    For my final assignment I have been thinking about the impact of the rise in tuition fees on the number of applications. But I don’t want to show just the difference between two figures. I would like to go further and research which social groups or social classes are most affected. For this project I will need to understand the British university system: how it’s funded, how students pay their fees and where I can look for the data I need.


    I also explored podcasting a little. In fact, I posted three pieces. One was a collective work with my fellow students, in which we discussed audio journalism and its potential. Another was the interview with Caroline Beavon I mentioned before, and the last one was about Manuel Fraga, a very controversial Spanish politician.

    From the feedback I received, I have drawn the following conclusions:

    1. Sound quality is VERY important. Many people highlighted how good the recording of the interview was, even though we were in a café.
    2. A good narrative structure makes things easier. I need to improve the middle section of my podcasts, especially if I work with non-English speakers.
    3. I need to pay more attention to audio editing. That means volumes, fade in, fade out, etc.

    More information about my work

    Delicious with data journalism links

    Reading list

    My question in the Scraperwiki mailing list


    8 useful Excel formulas for Computer Assisted Reporting

    Any journalist who wants to work with data must learn how to use spreadsheet applications like Excel or Google Docs.

    Here is a list of 8 useful formulas for Computer Assisted Reporting:

    1) =SUM(number1, number2, numb…)

    The simplest Excel formula, but an essential one. How many stories are related to public spending or funding? Calculating the total of a column is one of the most common operations you will use in data journalism.


    2) =(number1-number2)

    Very useful for calculating the increase or decrease between two values. For example, the number of people who claimed housing benefit in two different years.

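    3) =(new number-old number)/old number

    This is related to the previous formula, and we use it when we need to find the percentage change between two figures. The result is a decimal fraction, so format the cell as a percentage or multiply it by 100. Sometimes calculating the increase or decrease is not enough to get an overall view: the same absolute difference can be a big change for a small figure and a tiny one for a large figure. If you don’t trust me, just try it.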
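
    The same calculation as a small Python sketch, in case it helps to see it outside a spreadsheet. The function name and the figures are invented for illustration.

```python
def percentage_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical figures: the same increase of 500 on two very different bases.
print(f"{percentage_change(2000, 2500):.1f}%")    # 25.0%
print(f"{percentage_change(50000, 50500):.1f}%")  # 1.0%
```

    Both cases show an increase of 500, but only the percentage tells you that the first one is a big jump and the second one is barely a change.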

    4) =AVERAGE(first cell in a range:last cell in a range)

    This formula is quite obvious. Use it when you are looking for an average. But beware that in journalism this is not always the best choice, as it may be distorted by a single very large (or very small) figure.


    5) =MEDIAN(first cell in a range:last cell in a range)

    It looks for the mid-point of the values and is often more useful than the average. Imagine you have a spreadsheet with wages and they are very unequal. It would be better to use the median, as it’s not distorted by the extreme figures.
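
    A quick way to see the difference between the two is this small Python sketch, using the standard statistics module. The wages are invented for illustration.

```python
from statistics import mean, median

# Hypothetical yearly wages: four ordinary salaries and one very large one.
wages = [18000, 20000, 22000, 24000, 250000]

# The mean is 66,800, pulled up by the single large wage.
print(mean(wages))

# The median is 22,000, the wage of the person in the middle.
print(median(wages))
```

    Four of the five people earn far less than the mean, which is why the median tells the more honest story here.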


    6) =SUMIF(range, criteria, [sum_range])

    This formula is used when you need to add up only the values that meet a certain condition. For example, to calculate the total amount of money given to just one supplier.


    7) =COUNTIF(range, criteria)

    The syntax is very similar to SUMIF, but it is used to count the number of cells that meet a certain criterion. For example, all the spending items over £25,000.
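
    Here is a rough Python equivalent of SUMIF and COUNTIF. The supplier names, the amounts and the function names are all made up for this sketch.

```python
# Hypothetical spending records: (supplier, amount in pounds).
payments = [
    ("Acme Ltd", 12000),
    ("Bolt plc", 30000),
    ("Acme Ltd", 27000),
    ("Crane & Co", 8000),
]


def sum_if(rows, supplier):
    """Like SUMIF: add up the amounts paid to one supplier."""
    return sum(amount for name, amount in rows if name == supplier)


def count_if(rows, threshold):
    """Like COUNTIF with a '>threshold' criterion: count amounts over it."""
    return sum(1 for _, amount in rows if amount > threshold)


print(sum_if(payments, "Acme Ltd"))  # 39000
print(count_if(payments, 25000))     # 2
```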


    8) =VLOOKUP(C2, A2:B300, 2, FALSE)

    This formula is a little more complex, but very useful. Let’s use an example. Imagine you have a spreadsheet with the IDs of the employees of a certain company and their wages. You want to know the names behind those numbers, but that information (the IDs and the names of the employees) is in another spreadsheet. You can use this formula to look up the ID that identifies each worker and bring the matching name into the same sheet. Here C2 is the ID you are looking for, A2:B300 is the table with IDs and names, 2 is the column of that table to return, and FALSE asks for an exact match. Follow the steps shown on this link if you want to try.
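
    The idea behind an exact-match lookup can be sketched in Python with a dictionary. The IDs, names, wages and the function name below are invented for illustration.

```python
# Hypothetical data: one sheet maps IDs to names, the other lists IDs and wages.
names_by_id = {101: "Ana", 102: "Brais", 103: "Carme"}
wages = [(101, 21000), (103, 26000), (104, 19000)]


def lookup_names(wage_rows, names):
    """Attach a name to every wage row.

    An ID with no exact match gets None, much as Excel would show #N/A.
    """
    return [(emp_id, names.get(emp_id), wage) for emp_id, wage in wage_rows]


for row in lookup_names(wages, names_by_id):
    print(row)
```

    ID 104 has no entry in the names sheet, so its name comes back empty instead of being guessed from a nearby ID. That is the behaviour you want when matching records.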


    These are just 8 formulas I found useful but there are more. Which ones do you think I should add to the list?

    When I was a beginner…

    Data journalism is a huge field. There are lots of things you need to learn and a day only has 24 hours. During the last few weeks I read as much as I could, played with the different tools I discovered and tested some visualizations. But I hadn’t completed all the phases of a project until this week. It wasn’t a proper story from a journalistic point of view, but it helped me to see the full potential of this area.

    I had a discussion with one of my relatives about a new, small Galician online newspaper called Praza Pública. He argued that most of the news they published was about the power struggles in a certain political party. I replied that this was a wrong perception and that, compared with the number of pieces on other topics, it was only a small percentage. After that I thought it would be a good idea to use some of the things I had learned about data journalism and back up my point of view with some visualizations. So I scraped all the news published by Praza Pública and downloaded the data into a spreadsheet. With that information I made two charts.

    In this one we see that the special about the power struggles (pale blue) contains more news items than any of the other specials.

    But if we have a look at the other chart we see that it’s just a small part of all the news published (brown).

    I know that this may look simple and stupid, but it was the first opportunity I had to use the new skills I’m learning. In a few months I’ll read this post and I’ll laugh.

    Why am I interested in data journalism?

    When I started my Bachelor’s Degree at the University of Santiago de Compostela, many of our lecturers asked us why we wanted to become journalists. Some people said it was because they loved writing and would like to be writers, and others gave a kind of philosophical answer about the truth as an eternal value, but most of us just had the traditional concept of the journalist as a person who is always looking for a good story and annoying the powerful. Now I think that all of us were naïve, romantic or unrealistic.

    Communication in general and journalism in particular have changed a lot in the last decade, and are still changing. The boom of the internet and ICT brought a completely new way of telling stories, so it’s reasonable to think that the way journalists work should change too. Nowadays, information is produced by millions of different sources, not just by professionals, so we must rethink the role that journalists play in this new world. I would like to quote Simon Rogers, who runs the Datablog at guardian.co.uk and who, in his book ‘Facts are Sacred’, offers a very interesting idea about data journalism and the future of our profession:


    ‘A new role for journalists as a bridge and guide between those in power who have the data (and are rubbish at explaining it) and the public who desperately want to understand the data and access it but need help. We can be that bridge’


    This is the reason why I’m interested in data journalism. It’s not just about telling stories. We need to go further and give the public something that they can’t find anywhere else, something that makes journalists indispensable.