How to run a data science journal club that your team actually engages with

by Linda Uruchurtu on Wednesday, 28 Jun 2017

At Lyst, engineering teams are cross-functional and we, as data scientists, spend a significant amount of time developing and optimising production-ready systems to help customers find the fashion they want. Many of these systems rely on research and prototypes that were developed in the context of a specific area, and for that reason, keeping up with what is new while also expanding our knowledge is a continuous challenge.

One of the ways we have found to address that challenge has been the introduction of a dedicated journal club. The original concept of a journal club goes back to the 1800s in the context of medical science, and journal clubs are now commonly held in academia as a means of keeping up to date with the literature and providing training in appraisal skills. In practice, through group discussion with peers, journal clubs create an environment that facilitates learning and encourages debate. This is key for data scientists, who have to be aware of new approaches and ideas coming from both academic and commercial settings.

Benefits of having a journal club

A journal club provides a link between research and implementation, as it encourages the practice of continuous education. However, there are many more benefits that go beyond reading and discussing a paper:

Narrowing the scope

New papers appear every day, and even keeping up with a single subject can feel overwhelming. Choosing which papers to focus on by group consensus makes the task more manageable and targeted. It also empowers people to bring forward papers that they think the group should know about.

Learning something new

It can be hard to prioritise professional development when one is focused on delivery. Even the most organised of us have a hard time working out how much time to spend improving our individual competencies. By scheduling a time and agreeing on a paper, individuals understand the scope of the task and the team can plan accordingly.

Incorporating best practices and validating approaches

“State of the art” is not limited to new algorithms or techniques; it should also include a good number of case studies, white papers or blogs. A journal club is the perfect setup to discuss these, particularly when the outputs can be incorporated into an ongoing project.

Increasing team engagement

Discussion of technical papers fosters collaboration and knowledge sharing, and improves overall team engagement.

Running a journal club

There is no ideal format for a journal club; what matters is that it serves the needs of the attendees. Depending on their expertise, the cadence of the meetings and the scope of the journal club, some formats might work better than others. However, setting the scope and the goal of the journal club is the most important step in ensuring its success.

Set a goal and a cadence

Journal clubs can have various goals - upskilling, keeping up to date, acquiring general or specialised knowledge, etc. Whatever your team’s goals, a journal club can be a tool that empowers the team to achieve them.

Setting a realistic frequency is also fundamental. How much time should be dedicated to preparing for a paper, and how often the group should meet, should be agreed in advance, taking into consideration the team’s activities and current understanding of the subject to be covered.

At Lyst, our journal club runs every two weeks, with one paper per session, and the time spent reading the paper is taken into account when we run our weekly planning sessions. Our current goal is to expand our knowledge of subjects related to the project we are working on, so we choose papers around models and techniques that align with our work.

Build up a list of papers

Before we started, we collated a list of papers that were relevant to the topic at hand, although this list was neither exhaustive nor ranked. As a team, we decided on a paper to start the journal club with, and at the end of each session we decide on the following paper. Additionally, every time a member learns of a new paper they feel strongly about, the paper gets added to the list and prioritised accordingly.

Some types of papers that we have covered include:

  • Classic papers around the subject of interest
  • “State of the art” papers
  • Review papers or book chapters
  • Case studies

The papers that end up on our reading list often fall into one of two camps: those that are of general interest, and those more relevant to our day-to-day work. General interest papers are often classic papers (e.g. VGG, ResNet) and/or case studies (e.g. Deep Neural Networks for YouTube Recommendations). At present, our reading list is heavily concentrated on state-of-the-art papers on NLP for short text, as we are currently working on search query interpretation.

Choose a format

We have experimented both with having a moderator presenting the material and with participants taking turns to weigh in on various points, and we find that the former works best when the moderator is particularly well-versed in the subject of discussion.

Members are expected to have read the paper and, where possible, to have studied some material directly related to it. We do not assume that everyone has understood the paper as a whole, only that they have done some work to understand its key concepts, motivation and main points.

On the day

The moderator introduces the paper and gives a brief summary of the points, the context and how this paper relates to our work. We then take turns to discuss:

  • What was interesting / important about the paper
  • What are the extensions of the ideas to our own project / domain
  • What points were not clear or are deserving of criticism

Points that require clarification call for additional discussion, and we find that having a whiteboard and/or some scrap paper handy is always a good idea.

At the end of the meeting, we go over our reading list and choose a paper for the next session.

Keeping it running

Possibly the biggest challenge when having a journal club is to keep it running, as people can forget to read the paper ahead of time, or deprioritise the activity if something else becomes important. These risks can be minimised if the task is given the necessary importance at the planning stage, and as long as all members remain engaged and buy into the benefits of having it. Our current journal club has been running for over six months and we have continuously adapted it to suit our needs.

What we have found

Having a journal club has given us regular time to spend learning about new approaches and techniques, while also adding value in terms of critical discussion, outside thinking and keeping us up to date. We consider this to be a key activity within the Data Science Chapter at Lyst, and one that we believe helps us become better at our jobs. Also, journal clubs are fun!