Prize from “La Recherche” to our Transmogrifiers paper

The French science magazine “La Recherche” awards annual prizes to contributions in different areas of scientific knowledge that come (at least in part) from French laboratories. This year one of those prizes, in the category of Information Sciences, went to Transmogrifiers, the work that Christophe Hurter, John Brosz, Sheelagh Carpendale, Ricky Pusch and I carried out and published at UIST.

The work is a paper and a tool that enables very fast, intuitive non-linear spatial manipulations of existing data. You can learn more about the work from the paper (available free from my institutional website), and by downloading the tool from our transmogrifiers page.

The world’s population in 1880 people per pixel (or 4878 digits)

A few years ago I came up with the idea of FatFonts, a special kind of digit that encodes quantity both in the shape of the digit (as in regular numbers) and in the amount of ink or black pixels (the area of the glyph is proportional to the number it represents). I then worked with Uta Hinrichs and Sheelagh Carpendale to develop the idea and publish a paper.

The numbers 19, 28, 37, 46, 55, 64, 73, 82 and 91 represented in Cubica FatFont.

Although a quirky idea, FatFonts seem to have a bunch of uses. For example, they are convenient when you want to provide a table of numbers that is also a graphical representation. This allows the viewer (or the reader) to very quickly capture the overall distribution, but also to go in and read a specific number, which they can then compare to other numbers (in the FatFonts table or in their heads).

FatFonts are great in maps, and that is why Uta and I set out to create a poster that would give a picture of one of the most pressing issues of our time: world population. Thanks to SICSA (and our wonderful helpers Carson, Jed, and Michael), we got the time, money and support to develop the idea. The result is a poster that represents the population of the world using FatFonts.

An overview of the FatFonts poster of the world population.

The poster is made using an equal-area projection of the world, and it represents data collected by CIESIN and others. Each grid cell in the main map, which represents an area equivalent to 200 by 200 km, has a 2-level FatFont digit in it. That way we can know, with a precision of 100,000 people, how many humans live there. Naturally, the precision is only as good as the data (and these are projections using 2005 data, the newest available), but it gives you a really good idea of where people really are. In fact, the map is so mesmerising that I have learnt a lot from it just by spending a lot of time looking at it. It is not only the distribution, but also the numbers. Obviously I am biased, but I strongly believe that seeing the numbers gives you a lot more than just representing density with colours, in part because colour scales are very arbitrary.

Since the number of dark pixels of a FatFont digit is proportional to the number that we are representing, we can calculate how many people each black pixel represents. For an A1 poster in the main area of the map at 600 pixels per inch, each pixel represents approximately 1880 people! FatFonts with the orange background are up by an order of magnitude, so there the ink of a pixel represents approx 18,000 people.
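The arithmetic can also be run backwards as a rough check. The sketch below takes the two figures quoted in this post (1880 people per pixel, and a precision of 100,000 people per cell) and assumes that 100,000 people is the smallest step a digit can show. The function names are made up for illustration, and the resulting glyph size is inferred, not measured from the poster.

```python
# Figures quoted in the post; everything else is derived from them.
PEOPLE_PER_PIXEL = 1880   # main area of the map, A1 poster at 600 pixels per inch
SMALLEST_STEP = 100_000   # assumed smallest quantity that a digit can show


def dark_pixels(population, people_per_pixel=PEOPLE_PER_PIXEL):
    """Ink (in dark pixels) needed to show `population`.

    Relies on the FatFonts property that ink is proportional to the number.
    """
    return population / people_per_pixel


def population_shown(pixels, people_per_pixel=PEOPLE_PER_PIXEL):
    """Inverse of dark_pixels: how many people a given amount of ink stands for."""
    return pixels * people_per_pixel
```

Under these assumptions, one step of 100,000 people corresponds to roughly 53 dark pixels (`dark_pixels(SMALLEST_STEP)`).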

The South Eastern Mediterranean population is concentrated in the Nile delta (Egypt) and Palestine and Israel.

We partnered with Axis Maps, who make wonderful typographic maps of cities, and we are selling the posters here. All the profits will be reinvested in research (e.g., helping pay research internships for students). We think that they are a wonderful present and that they are really fun to look at and discuss.

To give you a better feeling of the map, and because we like to try our new stuff, we have taken some Lytro images of the poster that you can explore in this gallery.

Books for a good PhD start

A research career is a complex career. It involves many skills and knowledge that are not necessarily related to the specific topic that you choose to investigate.

In my experience, students just before or at the beginning of their PhD research (at the beginning of their research careers) are often quite disconnected from the actual skills and background that will make them successful. This is why I try to supply my students with some of the knowledge that, sooner or later, they will need to apply. To help with this, I have selected four books that all my students get at the beginning of their PhDs (to read in their free time). Here is the selection, and why I selected each book. Note: I’d love you to share other books you think are valuable at this stage (use the comments below).

1. The Craft of Research (Booth, Colomb, Williams)

Science/Engineering students often think that writing is the boring part of the job. Most of them realise that they have to do it, and some might even know that they have to do it well to be successful. However, telling a student that they have to get better at writing is not the best approach. In the best case, they already know that they have to do it, and in the worst, they might start hating it.

Instead, I like to consider writing (of academic papers and reports) as thinking on paper. It is often not until I have written the last bit of a paper (e.g., the discussion section) that I fully understand the research that I have done, the implications, and the value of it. Of course, the research is mostly already in your mind (and in your code, data, etc), but putting it on paper takes you to the next step: you can communicate it to the world and, perhaps most importantly, to yourself.

And this is what this book is about: setting up questions, understanding the problem, structuring a solution, all mediated through writing. A particular favorite of mine is the bit about making arguments; being able to make a claim and support it with evidence in a convincing way is one of those things that students think they know, but only learn after supervision, much experience and, perhaps, reading this book.

2. The Elements of Style (Strunk & White)

Once you know why you are writing, you need to know how. Although some authors think that this book might not be as good as everybody else thinks it is, it takes many students out of some of the worst habits in writing, namely:

  • Writing to look smart.
  • Writing without thinking of the reader (e.g., long sentences).
  • Writing to fill in the space (lack of brevity).

Although the grammar advice might be somewhat antiquated and not always completely correct, the rest of the book, in particular the parts about style, helped me significantly improve my writing (although I certainly don’t claim mastery!). I think this book is particularly useful for students who are not native speakers of English and who come from traditions where clarity and brevity are not as central as in the English-speaking scientific community (I’m from Spain, and I’m in shock most of the time I have to read or review a thesis in Spanish).

The most important point of the book might be summarised in a quote attributed to Blaise Pascal: If I had more time, I would have written a shorter letter. Well, a student should have the time, so the text should be shorter while keeping the crucial information. It takes time and effort, but readers (and markers) will be happier, the world will waste less paper, and the paper/dissertation will be more likely to be read and used by others.

3. A PhD is not enough! (Feibelman)

Very often students lack context. They might know that they want to do research, they might even know that they like research. But what else is involved? Why would a PhD be useful? What does it get you? Most importantly, what does it NOT get you?

This book might be a bit harsh to start with (sometimes reality is a bit hard), but it provides a nice glimpse of the world of research and highlights much of what really becomes the focus of what you do as a researcher and academic. The bad news is that there is a lot more politics, strategy, and marketing in this job than we all expect when we start. The good news is that you can be prepared for it, and might even get to enjoy some of those bits. In any case, and in my personal opinion, being in research is awesome, but it is better to be ready for what it requires from you.

Note: there are other similar/related books about research and academia that are worth mentioning and reading (e.g., this, this, and this), but they are perhaps not strictly necessary at the beginning of a PhD.

4. Getting Things Done (Allen)

So, what is really required from a PhD? Effective work and perseverance. Most people in academia know that you don’t have to be a genius to get a PhD. Gosh, you don’t even have to get the best or most novel ideas. But your ability to work hard, avoid procrastination, and persevere will determine the chances of being successful in your PhD and of being able to take your career further.

Although there is a lot of crap in the self-help and productivity literature, this does not mean that it is better to ignore it all. This book describes my favorite system, and although it is not perfect and I still work really, really long hours, it has helped me enormously. This might not be the best system for productivity that there is, or the best system for everyone, but at least it is honest, well explained, and feasible. I’m a fan.

The reality of a PhD is that, if students think they are busy during their undergrad or MSc, the demands on their time will only keep increasing. This is certainly true after you have become a doctor. If you don’t like GTD, you had better find something else!

 

Have you come across other books that you think are useful? I’d love to compile a list with your suggestions, and I might even add a book or two to my list!

7th Century Scholarships

The School of Computer Science at the University of St Andrews is offering a number of 7th century scholarships. If you are interested in working with me on any of my topics of interest (mostly within HCI and Information Visualization), drop me a line. Here is more information about the offered projects and how to apply:

The current deadline is March 31st, 2014.

What and how to log in your experimental HCI software

You have worked hard on your project. You searched the literature, learned about the methods, painstakingly designed an experiment, and have almost finished implementing the software, but… what about the logging?

Most students think that logging is easy. Just write some lines to a text file. A couple of hours on the software should do, right? I don’t think so.

Experience has shown me that logging is extremely error-prone, and that paying little attention to it results in an incredible loss of valuable information and time. Most students do not realize how important good logging is.

Before I go on, let me qualify what kind of logs I’m talking about. I’m referring mostly to the kind of experiment that you often see in HCI or Experimental Psychology, where there are many participants, and each participant performs many trials, possibly in multiple conditions. This is usually information that is suitably recorded in a simple format like comma-separated values (CSV).

Let me state then my five fundamental laws of logging:

1. Log everything you can

Disk space is cheap, your time is expensive. Probably the most common mistake here is not to record enough information in each record because you think it is redundant and a waste of space. For example, why record today’s date in each record of each trial, if they are all the same? I tend to record the same information in each trial anyway, because it is always easy to discard info, but it takes a long time to recover data from different sources (including the name of the file, the creation date that the OS stores etc.). Don’t assume that you will remember where you are storing all that information when the time for analysis comes. Things that I tend to save in each trial record: userId, all condition values for all factors, the number of the trial with respect to the condition, the number of the trial with respect to the cell, the absolute number of the trial within the experiment (and the phase) and, of course, all the dependent variables.

Perhaps the only caveat to this is that all this recording should not negatively affect the performance of your program or the accuracy of the time measurements. If performance and timing accuracy are important, a good strategy is to write first to memory, and only save to disk in between trials (or when timing is not an issue).
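As a minimal sketch of this first law, the Python class below keeps every record in memory while a trial is running and only touches the disk when `flush()` is called between trials. The field names and the class name `BufferedTrialLog` are made up for illustration; replace them with the factors and dependent variables of your own experiment.

```python
import csv
import os
from datetime import datetime

# Hypothetical columns, following the list of things to save in each record.
FIELDS = [
    "timestamp", "user_id", "technique", "target_size",
    "trial_in_condition", "trial_in_cell", "trial_in_experiment",
    "phase", "completion_time_ms", "errors",
]


class BufferedTrialLog:
    """Collects trial records in memory and writes them out on demand."""

    def __init__(self, path):
        self.path = path
        self._buffer = []

    def record(self, **values):
        """Store one complete record in memory (no disk access here)."""
        row = {"timestamp": datetime.now().isoformat(), **values}
        missing = [name for name in FIELDS if name not in row]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        self._buffer.append(row)

    def flush(self):
        """Append buffered records to the CSV file; call between trials."""
        is_new = not os.path.exists(self.path) or os.path.getsize(self.path) == 0
        with open(self.path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if is_new:
                writer.writeheader()  # variable names go at the top of the file
            writer.writerows(self._buffer)
        self._buffer.clear()
```

Refusing incomplete records is a design choice: it is cheaper to crash during a pilot than to discover a missing column at analysis time.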

2. Make your logs self-contained

Name your variables wisely, and always include the names at the top of the file. The names should be self-explanatory, and most analysis programs (e.g., SPSS) will allow you to name the variables automatically from the file. Handy and convenient. A good complementary practice is to have some comments (or a separate file) that explain how each variable is recorded, but this requires discipline to maintain, because the logger program tends to evolve. Best to keep your measures simple.
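One way to keep the names and their explanations from drifting apart is to define both in a single place and generate the header row and the separate description file from it. This is only a sketch; the variable names, descriptions and function names are hypothetical.

```python
import csv

# Hypothetical variables: name -> how the value is recorded.
VARIABLES = {
    "user_id": "Participant identifier, e.g. P01",
    "technique": "Condition: name of the interaction technique",
    "trial_in_experiment": "Absolute trial number, starting at 1",
    "completion_time_ms": "Time from target onset to selection, in milliseconds",
}


def write_log(path, rows):
    """Write the records with the variable names as the first line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(VARIABLES))
        writer.writeheader()
        writer.writerows(rows)


def write_codebook(path):
    """Write one 'name: description' line per variable to a separate file."""
    with open(path, "w") as f:
        for name, description in VARIABLES.items():
            f.write(f"{name}: {description}\n")
```

Because `csv.DictWriter` raises an error when a record contains a name that is not in `VARIABLES`, a new measure cannot be logged without also being described.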

3. Debug, debug, debug

Never assume that your code is recording properly. Simple visual inspection won’t cut it. I have experienced many problems that only became visible after all the experiments were recorded. The best way to avoid problems here is not only to debug, but also to use your pilots to gather realistic data, and analyze it in the same way that you will analyze the overall results from the finished study. This is not only good for your logs, it is also helpful to avoid possible flaws in your statistics (e.g., I do not believe in a posteriori power analysis).
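Part of this checking can be automated and run on every pilot log. The sketch below assumes a CSV file with a header row and one record per line, and hypothetical column names (`user_id`, `technique`, `completion_time_ms`). It complements a proper trial analysis of the pilot data; it does not replace it.

```python
import csv
from collections import Counter


def check_log(path, trials_per_cell, cell_fields=("user_id", "technique"),
              time_field="completion_time_ms"):
    """Return a list of human-readable problems found in a trial log."""
    problems = []
    cells = Counter()
    with open(path, newline="") as f:
        # The header is line 1, so the first record is on line 2.
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            # DictReader stores extra values under the key None and fills
            # missing values with None, so both cases are caught here.
            if None in row or any(v is None or v == "" for v in row.values()):
                problems.append(f"line {line_no}: missing or extra value")
                continue
            try:
                if float(row[time_field]) <= 0:
                    problems.append(f"line {line_no}: non-positive time")
            except ValueError:
                problems.append(f"line {line_no}: time is not a number")
            cells[tuple(row[name] for name in cell_fields)] += 1
    # Every participant/condition cell should have the planned number of trials.
    for cell, count in sorted(cells.items()):
        if count != trials_per_cell:
            problems.append(
                f"cell {cell}: {count} trials, expected {trials_per_cell}")
    return problems
```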

4. Backup, backup, backup

Don’t trust your hard drive, don’t trust your experimental software. Within your program, save the data to disk as soon as you can (but take into account the comments in point 1). This will allow you to recover from failures in your software. It is actually kind of nice to code your experimental software so that you can restart it at any given trial within the session. When the experiment is finished, the first experimenter action should be to verify that the data is in the right place, and perhaps to make a copy (or to send the data to your own Gmail account, if your data is properly anonymized, of course).
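A sketch of how saving early and restarting at a given trial could fit together: each finished trial is appended to the file and forced to disk, and on startup the program reads the file back to find out where to continue. The function names and the `trial_in_experiment` column are assumptions made for this example.

```python
import csv
import os


def append_trial(path, fieldnames, row):
    """Append one finished trial and force it onto the disk."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
        f.flush()             # empty Python's own buffer
        os.fsync(f.fileno())  # ask the OS to write its cache to the drive


def next_trial(path, trial_field="trial_in_experiment"):
    """Return the trial number to run next, based on what is already saved."""
    if not os.path.exists(path):
        return 1
    with open(path, newline="") as f:
        done = [int(row[trial_field]) for row in csv.DictReader(f)]
    return max(done, default=0) + 1
```

After a crash, restarting the software with the same file lets it call `next_trial` and carry on from the first trial that was not saved.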

5. Protect yourself against confusion

If something can go wrong during the experiment, it probably will. It is good practice to save the date and time of the recording (down to the second) in the name of the file that your program saves. This will help you prevent accidental overwrites. Similarly, try to leave as little human intervention as possible for the actual session. For example, I never trust the experimenter (often myself) to select the right name for multiple files depending on the condition. Have the software do something reasonable for you. The only thing that I often make configurable is the participant identifier, so that I can separate real trials from debugging logs.
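For instance, the file name can be built by the software from the participant identifier and the current date and time, so that the experimenter never types it. This is a minimal sketch; the naming pattern and the function names are my own choice for the example, not a standard.

```python
from datetime import datetime
from pathlib import Path


def log_path(directory, participant_id, now=None):
    """Build a file name such as P01_20140301-140509.csv."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")  # date, hour, minute and second
    return Path(directory) / f"{participant_id}_{stamp}.csv"


def open_new_log(directory, participant_id, now=None):
    """Create the log file, refusing to overwrite an existing one."""
    path = log_path(directory, participant_id, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError instead of silently overwriting.
    return open(path, "x", newline="")
```

Using a participant identifier such as `debug` during development then makes it trivial to separate test runs from real sessions by file name alone.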

Hopefully these will be useful to you some time. Write a message below if you agree/disagree or want to add some more advice!

Our mini Central European tour: Munich, Konstanz, Zurich

Last week, Uta and I had the chance to take a tour of three impressive labs in Germany and Switzerland. The German and Swiss hospitality cannot be overstated, but most impressive was the range of research.
In Munich we visited the Media Informatics and Human-Computer Interaction Group, invited by Andreas Butz. It was really nice to finally see the curve (in the picture), among lots of other excellent research, including work by Alice Thudt, a current collaborator of Uta.

In Konstanz, we visited the Human-Computer Interaction group led by Prof. Harald Reiterer. The range of research and development is very broad. A particular favorite of mine is the work on zoomable multi-display environments (the ZOIL API), and a number of other interesting experiments related to large displays.

Finally, in Zurich we had the chance to visit Dr. Elaine Huang and her ZPAC laboratory. We have strong links with this lab (including Helen, another iLab graduate), but there were many other strong research reasons to visit ZPAC. Closest to my own work is that of Gunnar Harboe, but it was also great to learn about projects on sustainability, cultural communication, and domestic ubicomp.

Naturally, I cannot do justice to everything that all these researchers do in a few lines… maybe you should just visit them too :). We would really like to thank all our hosts for wonderful and insightful visits (special thanks to Fabian, Hans-Christian, Christian, Alice and Helen for bearing with us for so long). We are looking forward to your visits!