Replit following Spring 2021

Dave Musicant

The Carleton CS department is planning to go back to in-person classes this fall. This seemed like a good time to revisit my previous blog entry on using Replit in Spring 2020, and describe how the approach evolved over the last year, and what worked well and what didn't.

Courses taught

I taught four sections (two different courses) using Replit this last academic year:

Programming Language Design and Implementation ("PL"): Fall 2020 and Spring 2021
Introduction to Computer Science ("Intro"): Winter 2021 and Spring 2021

I'll talk first below about the generalities that applied to both classes, and then get more specific for each class later on.

Our organizational approach: Teams for Education

It was important to us that our students be able to store their code in private repls, so that solutions to our assignments did not end up online. To that end, we transitioned to the Replit Teams for Education product. We purchased a departmental license which enabled us to create a "team" for each of our classes, and so I used Teams to manage Intro in the winter and the spring, and PL in the spring.

(In the fall of 2020, Teams for Education was still missing a feature that we needed, and so we stumbled along with a Hacker plan upgrade instead. PL in the fall of 2020 used that approach, but we transitioned to Teams at the end of fall 2020.)

Teams for Education enables you to create a "team," which is better thought of as a course section. You can enroll students yourself, or give your students a link with which they can enroll themselves. Once enrolled, they can then access assignments that you create.

Within Teams for Education, you can create individual assignments or group assignments. In either case, you first create a repl as a template, with any starter code you want in it. By default, it is an individual assignment. This means that when students click on the link that you provide them for that assignment, a private fork of that template is created for them to work in. If you configure it as a group assignment, there's an interface in Replit where you can enter in the student groups. When a student clicks the link for the assignment in this case, they are taken to a fork of the template that is created for the entire group, and all students in that group have shared access to the project. I used the group functionality for all of my paired-programming assignments.

On the whole, Teams for Education did the job I needed it to. It allowed me to construct assignments, put in starter code, and assign programming partners. There's a dashboard for instructors where you can see a list of all students working on a project, with a link to the repl itself so that you can join it yourself in multiplayer mode. So from a support perspective, it worked very well. If a student showed up in Zoom office hours and asked for help, I could just jump over to the dashboard, find the assignment, find the student, and click the link to join their repl in multiplayer mode.

The interface for actually adding groups of students into group projects is clunky. This has improved notably over the last year, which is great and much appreciated, but it could still use more work. One main source of frustration I faced is that I assign pairings for my students which hold over multiple assignments. Unfortunately, the Replit interface for assigning groups requires that I manually enter the groups for each new assignment. There is no functionality for uploading a CSV file or for importing a set of groups from a previous assignment. So it was fairly cumbersome to have to re-enter the same sets of pairs repeatedly for multiple assignments. When all is said and done it only took me a few minutes to do this for each assignment, so it was mostly just an annoyance. One real problem that did result, though, is that this was more error prone than doing it automatically, and once or twice a term in each class I accidentally paired the wrong students together. On the occasions where I had to rearrange partners after the fact, either due to error on my part or because of issues that the students had, this was cumbersome to figure out. I couldn't add a student to more than one project at a time, which would have been useful at times. I had a few situations where I needed to spin off a student to a new project of their own after work had been started. It would have been helpful for that student to be able to access the original repl while copying code over.
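Since the groups have to be typed in by hand, one way to catch pairing mistakes before they reach Replit is to keep the pairings in a small CSV file and check it against the class roster first. The following is only a sketch of that idea. The two-column "group,student" layout, the function names, and the student names are all my own assumptions for illustration; none of this talks to Replit in any way.

```python
import csv
import io
from collections import Counter


def load_pairs(csv_text):
    """Parse rows of the (assumed) form 'group,student' into {group: [students]}."""
    groups = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        group, student = row[0].strip(), row[1].strip()
        groups.setdefault(group, []).append(student)
    return groups


def find_problems(groups, roster):
    """Return readable problems: students in several groups, unknown, or unassigned."""
    counts = Counter(s for members in groups.values() for s in members)
    problems = []
    for student, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"{student} appears in {n} groups")
        if student not in roster:
            problems.append(f"{student} is not on the roster")
    for student in sorted(set(roster) - set(counts)):
        problems.append(f"{student} has no group")
    return problems


if __name__ == "__main__":
    # Hypothetical pairings: alice was accidentally placed in two groups.
    pairs = "g1,alice\ng1,bob\ng2,carol\ng2,alice\n"
    for problem in find_problems(load_pairs(pairs), ["alice", "bob", "carol", "dave"]):
        print(problem)
```

Running a check like this before each round of manual entry would not remove the retyping, but it would at least flag a student who has been paired twice or left out.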

Another challenge we faced with Teams for Education was with regard to students getting help from our lab assistants. We have a collection of students who work as lab assistants, and during the last year they were assigned hours to be available on a Discord server. Students in classes would log into the Discord server and ask for help. The challenge we faced was that due to the privacy model in Teams for Education, the lab assistants couldn't join the student repls (in order to use multiplayer functionality), and the students ended up just having to share their screens with the lab assistants. That worked fine, but it took away one of the key benefits of Replit. The Teams for Education permission model didn't have any way for us to allow lab assistants to visit student repls on request by the student. Instead, the only thing we could have done was to give the student lab assistants complete administrative access to our courses. It would be great if we could designate a set of students as lab assistants, whom students in our classes could then invite to join via multiplayer.

Successes with Replit

Low barrier to entry

Across both of my classes, Replit generally performed well. Most of what I said in my previous blog entry applies, so I'll just summarize the relevant points here. Most of the time, students could very easily click a link to fork a project template, and just get started right away. This is fantastic. There are no local installs, there's no confusion about what's on their local machine and what isn't, and there's no need for multiple sets of directions for different operating systems. This is a major win, and it seems like it deserves more text and accolades than this paragraph would indicate. It just doesn't take that much text to describe how good this all is. One of Replit's greatest strengths is the extremely low barrier to entry that it provides.

Multiplayer for supporting students

Likewise, multiplayer capability was amazing for me being able to provide remote support to students. I talked in my previous blog entry about how it allowed me to virtually share a keyboard in a non-invasive way with my students, and this continued to be the case. Likewise, students could collaborate seamlessly with each other. I did end up learning that this is something of a double-edged sword; I'll say more about that below in the section on challenges.

Version history

Replit has its own automatic version history mechanism for all files. (This is different from the Git integration they provide.) We only used it occasionally, but it was a complete life-saver when a student accidentally lost their code for one reason or another. It was also useful as evidence when a student was suspected of an academic integrity violation.

Challenges with Replit

File synchronization problems

Replit gives the user the illusion of working on an isolated and individually owned remote computer, and mostly does this pretty well. This is merely an illusion, however, and sometimes it breaks down badly. The lack of persistence of the home directory, which I talked about in my last blog posting, is one example of this.

When working in Replit, it is apparently the case that there isn't a single virtual machine (VM) that is dedicated to your repl. Presumably that would require too many resources on their servers. Instead, when you connect to Replit, my understanding is that you land in a pre-existing VM, or a new VM is spun up for you on the fly. Your files are stored in some sort of persistent storage, and then mounted into the VM so that it appears that your files merely live in a subdirectory of your home directory. This is similar to what my own school does in our computing labs; when students log into our campus lab machines, they enter their username and password, but they really aren't logging into their own account; instead, they log into a shared account, and then a network drive for their files can be mounted.

So, here's the problem. Replit does a great job at keeping your files synchronized with persistent storage only if you edit those files with the on-screen editor. On the other hand, if you create or modify files at the command line in the Replit console window, those changes may not persist at all. Sometimes they do, and my understanding of the technology is not deep enough to explain when it will work and when it won't, but I can give concrete repeatable examples.

Here's one example: in a Bash repl, one might write a Python program to print data to a file. We do this regularly. In Intro, I had my students do a data analysis project, where they had to process some data and output a CSV file. They could then bring that CSV file into a Google Sheet in order to produce some graphs. So in Replit, the student can run the program and generate the CSV file, and the CSV file generated by their code will appear in the list of files in the browser on the left. But then when the student tries to download that CSV by using the "Download files" option in Replit, the file is missing from the zip file they receive. It's just not there. That's because there's some strange lack of synchronization going on between the local storage and the persistent storage. In order to resolve this problem, the student can click on the CSV file in the file browser to open the file in the Replit editor, and make an innocuous change. For example, the student can put a newline at the end, then take it out again. This is enough to wake up the synchronization procedure, and the student can then download the CSV file. I don't know whether this happens every time a Python program generates output, but I know it happens sometimes. I've had to walk students through this exact procedure, which is confusing and teaches bad habits such as editing automatically generated files.
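Because the missing file is easy to overlook, it can help to check the downloaded zip before assuming it is complete. Here is a minimal sketch using Python's standard zipfile module, meant to be run on the student's own machine after downloading. The function name and the file names are hypothetical, and I am assuming only that the download is an ordinary zip archive, possibly with a top-level project folder inside it.

```python
import zipfile


def missing_from_zip(zip_path, expected_names):
    """Return the expected file names that do not appear anywhere in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare on base names so a top-level folder inside the zip doesn't matter.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in expected_names if name not in present]


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python check_zip.py download.zip results.csv
    missing = missing_from_zip(sys.argv[1], sys.argv[2:])
    print("Missing from download:", missing if missing else "nothing")
```

This does not fix the synchronization problem; it only tells the student that the editor workaround described above is needed before they download again.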

But it sadly gets worse. In my PL class, I had students cloning repositories at GitHub using git at the command line in a Bash repl. (Yes, Replit has a lovely GUI Git interface, which I chose not to use. I'll come back to that below.) When a student decides to push changes to GitHub, they go through the usual process of doing a git commit, followed by a git push. Both of these operations, especially git commit, make considerable changes to the local git repository. A git repository has a .git directory within, and both of these commands make changes to files within that directory. But since these are changes made within the console window and not in the Replit GUI editor, these changes may not persist. This gets exceedingly ugly. Here is a scenario I saw play out many times:

A student commits and pushes their work, then leaves for dinner. The student returns later to keep on working, and ends up refreshing their browser because the connection with Replit has been dropped. As far as they can tell, everything looks great, so they keep on working. But they have no idea that the git repository has entirely forgotten about the commit and push that they previously did, because those changes to the .git directory didn't stick. At some later point, the student decides they want to commit and push to GitHub. The commit works fine, since as far as Git is concerned, the new commit includes all changes made since the dinner break but ALSO from before that, since Git forgot about the previous commit. But when they go to push, they get the classic "failed to push" error because GitHub has a more recent commit that the local repository has forgotten about. And then the student ends up needing to do a git pull followed by resolving a merge conflict, which nearly always results in me needing to assist the student because they don't understand why this happened.
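One way a student could notice this situation before committing on top of it is to fetch from GitHub and compare the local branch with the remote one. The sketch below wraps two real git commands, git fetch and git rev-list --left-right --count, in a small Python helper. The function names are my own, and the branch name origin/master is an assumption that matches the log output shown later in this post; a repository whose default branch is main would need origin/main instead.

```python
import subprocess


def describe_divergence(rev_list_output):
    """Interpret the output of `git rev-list --left-right --count <remote>...HEAD`.

    That command prints two counts: commits only on the remote branch,
    then commits only on the local branch.
    """
    behind, ahead = (int(n) for n in rev_list_output.split())
    if behind and ahead:
        return "diverged"
    if behind:
        return "behind"
    if ahead:
        return "ahead"
    return "in sync"


def check_repo(remote_branch="origin/master"):
    """Fetch, then report how the local branch relates to the remote branch."""
    subprocess.run(["git", "fetch"], check=True)
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{remote_branch}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return describe_divergence(result.stdout)


if __name__ == "__main__":
    print(check_repo())
```

In the scenario above, a report of "behind" right after the dinner break, when the student believes everything has been pushed, would be the warning sign that the local repository has forgotten a commit. Pulling at that point is much simpler than untangling a merge conflict later.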

But wait, there's more. Try executing the following commands in a Replit Bash repl:

mkdir temp
cd temp
git init .
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
touch temp.txt
git add temp.txt
git commit -am "startup"

This all works great as expected, and if you type git log, you see something like

commit 7c942b1aab5cee56aeb8a3a95216d16ab31af29e (HEAD -> master)
Author: Your Name <you@example.com>
Date:   Thu Apr 8 22:18:54 2021 +0000

    startup

... which is great.

But then if you just reload the Replit web page, and cd temp, followed by ls, the .git directory is gone! This is very scary.

However, if after you do the above operations you create any file in the GUI editor at all, and make an innocuous update to it (blank line, whatever), then the .git directory sticks. So this ended up being the workaround I had to instruct my students to do. "Whenever you issue a git command, make sure you also change a file somewhere in the editor afterwards." Yuck. That mostly solved the problem, but students didn't always remember to do so, and then we had to clean up the mess.
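For students who forget the workaround, a quick check after reloading the page can at least reveal that the repository metadata has vanished before any more work is done. This is a minimal sketch and the function name is my own invention. It only tests for a .git directory containing a HEAD file, which every Git repository has; it cannot tell whether individual commits were lost.

```python
from pathlib import Path


def git_metadata_present(project_dir="."):
    """True if project_dir still has a .git directory containing a HEAD file."""
    git_dir = Path(project_dir) / ".git"
    return git_dir.is_dir() and (git_dir / "HEAD").is_file()


if __name__ == "__main__":
    if git_metadata_present():
        print(".git is still here")
    else:
        print(".git is missing: stop and ask for help before committing")
```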

This only matters if students are writing code that outputs to files, or if they are using command-line tools that make changes to files that need to persist. I think all of this is outside of the standard use case envisioned by Replit, which is why they probably haven't focused attention on it. But if you are doing this sort of thing with your students, this gets really challenging to work with.

Slowdowns, resets, and browser sync issues

The Replit team has been working heavily on this problem over the last year, and they have made some progress in making Replit more stable than it was. But it still remains the case that sometimes the repl just hangs, and the student (or me) has to reload the page. This seems at first to be just a minor annoyance, but there are some synchronization problems that cause deeper frustrations. I have seen situations where the student is running code in the terminal window, but it's running an outdated version of the code because the editor hasn't synced to the server properly. This also happens in multiplayer mode, where a partner makes a change but that change doesn't get synced properly to the other student, and so a student is seeing program execution behavior that doesn't match what they see in their editor. This does not happen frequently, but it happens reliably occasionally. Many of my colleagues have expressed concern about how this affects a student's mental model of a computer. For beginning programmers, it is important for them to at least be able to trust that the computer is running the code they entered as opposed to having to simultaneously worry about the possibility that the programming environment may be broken. With Replit, this sync issue happens frequently enough that we are concerned about the effect it has on how both we and our students view debugging.

To again put this in perspective, this issue just goes away when the student reloads the page in their browser. The problems arise when a student doesn't know that they are out of sync and should just reload.

User interface issues

I talked in my last blog posting about the fact that the terminal window has a dramatic accessibility problem in that you can't change the font size or the color settings. You can zoom in with the browser, but that causes other issues. This still has not been resolved.

Pair vs. Parallel programming issues

I use pair programming a lot in my courses. If I have an assignment that's supposed to be done in pairs, it's important to me that students do it in something approximating the traditional pair-programming model where they are working simultaneously on the same line of code. I'll contrast this with a style of programming I'll call parallel programming (is there a better name for this?) where students are coding at the same time, but on different aspects of the same program. Coding in parallel is a great way to get coding done faster, because now two people are entering code in different parts of the program, but it entirely defeats my goals in pair programming. My goals aren't for students to divide a project in half and get it done faster, but rather to collaborate and have conversations about the code that they're writing. It's my belief, and research seems to confirm this, that at least in some cases this helps students learn. The divide and conquer approach, on the other hand, may help in building a larger project but doesn't offer the learning opportunities I want them to get out of pair programming.

It was my hope when we started this adventure last spring that multiplayer mode in Replit would facilitate pair programming, remotely. It allows students to collaborate simultaneously on code and see each other's changes in real-time, while they talk on Zoom or another platform. And more often than not, this works really well.

The smack-to-the-head surprising problem is that Replit's interface can have the unintended side-effect of encouraging students to code in parallel, rather than doing pair-programming, just because it's so easy to do.

When doing pair-programming with the traditional approach in person, there's one screen (and usually only one keyboard and mouse). This makes parallel programming not really possible. Since students are staring at a single screen, they are forced into a mode where they can really only work on the same piece of code at the same time, likely with only one student "driving" at a time. In Replit multiplayer, on the other hand, there are no barriers at all to keep students from editing different files or different parts of the same file at the same time. Students are very busy, and they want to be productive like the rest of us. Some of them quickly transition from pair-programming to a parallel programming approach, which defeats the entire purpose for me of having a pair-programming assignment in the first place.

One could ask if this is really a Replit problem in particular. With the in-person approach, could students just choose to work on two separate computers, and combine their results afterwards? Sure. But it's pretty obvious that this would be a deviation from what I asked them to do, since I give a detailed description of what pair-programming is supposed to look like. Starting up a second computer is pretty clearly a direction I don't intend for them to take. But in Replit, the slide to parallel programming is so slippery. They can start by parallel editing the same line of code, such as when one student puts in a semicolon that the other is missing. (That's great, in my opinion.) But then they can start editing nearby lines of code, and before they know it, they're working in parallel. More to the point, it is dramatically less clear to the students that they're going against my intended working strategy. I can explain repeatedly what pair-programming is and isn't, but it's very challenging to keep them locked into an approach when the same tool offers a smooth continuum between one approach and the other.

In my course evaluations and in some focus groups we did, I asked students about how much of the time they spent doing pair-programming vs parallel coding. I'm aggregating over multiple surveys and a small focus group, so this number has to be taken with a massive grain of salt, but I'm going to wildly estimate that students end up using parallel programming rather than pair-programming around 30% of the time. That number is not a majority, but it's still rather big. When we do traditional pair-programming in person, my informal guesstimate is that it's much closer to 5% of the time.

Many students even volunteered that they thought the ability to parallel program was a massive improvement over pair-programming because they got their work done more efficiently. (That may be true, but finishing the prompt efficiently wasn't my goal!) It was clear that despite my best efforts, they didn't really understand the point of pair-programming. This may have always been true, but Replit made it very easy for them to drift away from it.

So this is a long-winded point, but it's a really important one. For a sizeable minority of the coding that gets done in my classes, Replit doesn't actually facilitate pair-programming: it does the opposite and encourages students to avoid it. I am quite concerned about this, and I think that in future assignments if I continue to use Replit I will need to scale back on the quantity of paired assignments that I do outside of class. I think being even more explicit with students on my expectations would help, but it was also clear from conversations that some students were quite committed to the idea that if they were using Replit, they would continue coding in parallel.

Git and GitHub integration issues

Replit offers some nice features for working both with Git, and with GitHub. Within Replit, there's a nice looking and functional GUI interface to Git that allows the user to do basic Git operations. Likewise, GitHub Classroom offers the ability to add a "Work in Replit" button to GitHub Classroom projects. This allows the instructor to create a project in GitHub Classroom, and then the student can very easily bring it into Replit with the Git integration already wired up. The student can then use the Replit Git GUI operations with no other setup.

I use Git in my PL class, and so the above setup was what I used in Fall 2020. It was frustrating, however, for multiple reasons. The Git GUI in Replit works well, but it is not robust to Git errors. When the Git operation doesn't go as expected (e.g. one of the wacky odd Git situations we've all experienced), the GUI doesn't share the command-line error message with the user. In fact, it doesn't even indicate that anything went wrong. The student thinks everything worked, except for the fact that the commit/push/whatever didn't happen. I found that multiple times through Fall 2020 I needed to walk students through using Git at the command-line instead to diagnose problems that the GUI approach wouldn't show. After this experience, these students nearly all abandoned the Git GUI, since they found that the command-line approach gave them more information, and they lost faith in the GUI version.

The second major problem was that in Fall 2020, GitHub Classroom (not Replit, I'm talking about GitHub Classroom now) was somewhat unreliable and somewhat unstable. The interface kept changing, and there were server issues on the GitHub side where the "Work in Replit" button wouldn't appear in a student repository for long periods of time after it was created. Apparently when GitHub Classroom forks a project it needs to add extra commits to a project in order to add the "Work in Replit" button, and sometimes those automated commits could take minutes, or even an hour in some extreme cases, to occur. GitHub kept changing the user interface that the students saw to try to make it more clear what was going on, but I found this very challenging to work with. My instructions to students kept breaking, and there were just a variety of ways in which GitHub Classroom kept failing for us. Based on this and a variety of other problems we experienced in GitHub Classroom last fall, I've since switched to RepoBee for managing course Git repositories, which has been fabulous. (I'd strongly recommend RepoBee to anyone; it's a great tool and the developers have been fantastically responsive.) I used it very successfully in another non-Replit course that I taught in Winter 2021.

So for spring of 2021, I used RepoBee to manage the GitHub repositories for the students, and had students clone those repositories to Replit to work with. This was sadly messy. As mentioned earlier, I had the choice as to whether to encourage students to use the Replit Git GUI, or whether they should use Git at the command line in Replit. In the end, I could not make the GUI work. This was because the Replit Git GUI would only clone a private repository from GitHub if there were OAuth credentials established between the user's Replit account and GitHub. This is reasonable, and it's exactly what the mystical "Work in Replit" button established when I had been using GitHub Classroom. But without that magic button, I could not find any way to grant students OAuth authentication to their private repositories without giving them access to all of the repositories in the class, not only their own. This was obviously unacceptable, so I fell back to using the command line approach instead. That seemed to work, but I later discovered that this incurred all of the console-based file synchronization problems that I mentioned above.

So in short, my own experience is that Replit is very messy if you want to work with GitHub private repositories.

If your goal is to clone into Replit a public repository from GitHub, that works great with their Git GUI.
If your goal is to clone into Replit a private repository created with GitHub Classroom, for which you've enabled the "Work in Replit" button on the GitHub side, that also works great on the Replit side. But then you're stuck with the instabilities of GitHub Classroom.
If your goal is to clone into Replit a private repository that you've made yourself in GitHub, either manually or with another tool (e.g. RepoBee), Replit's authentication model either doesn't work or is confusing enough that I have been unable to succeed with it.

Wrapup and conclusions

The above is very long winded, though I hope some of the details are helpful to someone out there! Please let me know if they have been.

Where does this leave me going forward? I don't know. My section on Replit successes above is quite short regarding number of words, but that's largely because it's hard to give a long and detailed response on something that works. There really were some massive wins for me and my students in using Replit. The low barrier to entry, the fact that all my students were on the same platform, and the ability to use multiplayer for supporting the students were all pretty amazing. If I don't continue using Replit, I'm really going to miss those features, and I'm having a very hard time imagining giving those up.

On the other hand, the technical flaws that I address in detail are real, and resulted in many hours of pain and frustration for me and my students. What's most concerning to me is that I still don't really understand what's going on behind the scenes regarding the file synchronization issues, and that worries me a lot. I don't know if the next script I write or the next workflow I'll invent will fail for reasons out of my control. It was really frustrating to discover two weeks into spring term that the results from Git commands weren't persisting, and I really do worry about students not trusting the system that they work on.

We'll see where I end up!