An Interview with Josh Aresty, Tech Lead at Braintree

We talked about pairing, story acceptance, process change, adapting to remote work, and a neat technique I’d never heard of before for weekly planning.

Photo by Annie Spratt / Unsplash

Welcome back to Simpler Machines, a weekly newsletter about how to survive making software. I'm Nat Bennett – amateur anthropologist, insufferable hipster, and occasional software consultant.

I've got a real Halloween treat for you lined up today, but first, a housekeeping note: At the end of the month the first "season" of Simpler Machines will end, and the newsletter will go on hiatus for about ten weeks. I've been writing weekly for about six months, and it's time to take a break, take a breather, and figure out where this thing is going next.

You don't need to do anything to keep reading Simpler Machines. You'll get the next e-mail when it relaunches, sometime around the end of this year or the beginning of the next.

I want to keep writing, and a newsletter has been a good tool for keeping me doing that, so I'm starting a new one. This one's tentatively titled Orlando Furioso. It'll be a little more literary, a little weirder, and a lot less career-focused. There will be at least one review of a review of a Safeway deli counter, and at least one issue on folk art featuring our 44th President. It'll come out at least once a month and no more than once a week. Expect some photography, some media commentary, some introspection, and some surprises.

If that sounds like something you want in your inbox, hand over the ol' email address in the signup link below.

It's remarkably hard to get a handle on how real software product teams work. How does the team decide what to work on? How does the team define “done”? What stages does the work go through? What gets written down? What never gets written down? Is the team responsible for design? For deployment and operation? For measuring whether the work it’s doing is achieving the desired result?

When I ask questions like these, people often look at me the way the Enterprise crew looks at Data when he asks why something is funny.

“What do you mean, ‘how do we decide what to work on?’ We don’t do anything special.”

Sometimes, especially if they’re aware of my Pivotal background, they’ll add something like, “We don’t want to add too much process.”

Basically—“what the hell is water?”

I caught up with Josh Aresty the other day to ask him how the water is on his team at Braintree. We talked about pairing, story acceptance, process change, adapting to remote work, and a neat technique I’d never heard of before for weekly planning. This interview has been edited and condensed, but it's still nearly 10,000 words long, so get a fresh cup of coffee and a comfortable place to sit before you dive in.

Oh, and his team is hiring, so if the water he describes sounds like the right temperature for you, get in touch with him or his manager on LinkedIn.

Tell me a little bit about your team. What do you do? What’s interesting about it?

Sure, yeah. So my team owns the Braintree 3D Secure System. It’s a bit like 2-factor authentication for credit cards.

The interesting thing for me is not the technology itself, but the way our team works on it, which I think will be pretty familiar to people who have worked at Cloud Foundry actually, because I had a big hand in, you know, adapting the culture of the team.

We receive requests from front end SDKs that merchants integrate with, so we're not working on the outer edge. There's another team responsible primarily for the SDKs: Android, iOS, and web. Sometimes we do make changes to them to add features. But generally, those front end SDKs are maintained by another team.

Then the requests come into the gateway, which is a Ruby application, which adapts the requests, and sometimes rejects them, you know, does our own business logic, and then sends those requests to our vendor to get a response and then sends it back in a nice format for the merchant.

So what does a typical story or work item look like? What are some of the things that are in your backlog?

So the biggest thing in our backlog right now is a really major undertaking to adapt from one version of 3DS, 3DS 1, to another version of 3DS, 3DS 2. This transition has been 10 years coming or something like that. There are regulations that drive our timing, and the compliance deadlines keep getting pushed back, but they're finally deciding October this year is the deadline.

The biggest problem in moving to 3DS 2 is that 3DS 2 requires more configuration than 3DS 1. When our team initially designed the system, we didn't do a very good job of future proofing our design. And part of that was we required configuration at our vendor for every merchant. So in order to meet the deadline, we prioritized a bunch of work to basically smoothly transition from one style of relying on the data configured at the vendor, to sending the data that we need with every request. We had to do that transition without downloading the data from them. And without having any impact on our merchants. So yeah, those two things. Those two requirements have driven a lot of the backlog in the past couple of months.

You're doing what I think of as the classic back end software application engineering of like, you’ve got two systems, you need to get data from one system into another system. And you’ve got to do stuff in between and it's got to be correct and it's got to happen at a certain speed.

Get data from here to there. Yeah. And try to make the most money we can for merchants using us.

Yeah, yeah. So, we wanted to talk about practices. How do you work together? What's the same as with Cloud Foundry? What's different?

So I'd say a lot is gonna feel really familiar to someone who worked on Cloud Foundry. It does vary from team to team. On our team, we’re using VMs in the cloud with a Braintree customized configuration. Our team mostly uses just vim and tmux for pretty much everything that we do.

We use JIRA, but we use it in a way kind of making it feel like Tracker. So stories, we have a single backlog for the team prioritized, we move things from “to do” to “in progress” to “done” -- well -- to acceptable and then to complete. We do use pull requests, but mostly we use them to run tests because they will automatically run when you do the PR. So there's a blocked column for things that are in PR, or merged but not deployed yet.

We have a twice-weekly deploy, but our team doesn't really usually have to be involved in the deploy process. There are other people in Braintree who do that for us.

And then we do a daily stand up, and we use Pairist. We rotate pairs daily. And we pair basically full time. We do allow for people to request if they want to solo, or pair with a specific person, every day. And to say, “I want to hold context on a story,” or what have you.

One thing that is pretty different here is that, because the product manager and the engineering manager are in a lot of meetings, the engineers mostly do the acceptance. Because we want to accept stories, I feel like that's really important to catch deployment bugs and make sure that we didn't make too many assumptions in you know, in mocking and things like that, because we are a Ruby code base, so you can make everything fake. And then sometimes you discover bugs when you try to use it.

And for the stories, we mostly are using gherkin. So the “given, when, then,” kind of stories. I'm still teaching people how to write stories that are behavior based. So right now, a lot of our stories I've written.
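To make the “given, when, then” format concrete, here is a minimal sketch in plain Ruby. The story text, the FakeGateway class, and the field names are all hypothetical, invented for illustration; they are not Braintree's actual stories or code. The point is only that the check describes observable behavior and says nothing about how that behavior is implemented.

```ruby
# Hypothetical story, written from the merchant's point of view:
#   Given a merchant with 3DS enabled
#   When the merchant sends a lookup request without a card number
#   Then the gateway rejects the request with a validation error

# Hypothetical stand-in for a gateway; not a real API.
class FakeGateway
  def lookup(request)
    if request[:card_number].to_s.empty?
      return { status: :rejected, error: "card_number is required" }
    end

    { status: :accepted }
  end
end

# Acceptance check: exercises the behavior from the outside only.
def story_accepted?(gateway)
  # Given / When: a request that is missing its card number
  response = gateway.lookup(merchant_id: "merchant_123", card_number: nil)
  # Then: the request is rejected with an error naming the problem
  response[:status] == :rejected && response[:error].include?("required")
end

puts story_accepted?(FakeGateway.new)
```

Because the check never looks inside FakeGateway, the implementation can change freely as long as the rejection behavior stays the same.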

What's the value of story acceptance to you?

Because when you write a story that way, by specifying the behavior you want and how you’re going to accept it, there's a few things that it allows.

One is it allows the engineers to be creative about the how, without affecting the what. I may not know the best way to do something. I might have a certain idea of the way it should be done, but when I'm writing that story, I'm not really deep in it. I'm trying to think of it from a customer perspective, or maybe it's not always a customer, it could be a system, like the external system needs this change.

It's really easy to describe the change that you want to make, if you think of it from that perspective. But when you start to try and describe the details, it can be really hard.

The other thing it allows is that when you go to accept a story, if you've written it from a behavioral perspective, you can actually see the change. When you write from a lower level perspective, a lot of times it's impossible to even verify that it was done, because at that point it might just be reading like, “Okay, well, does this call that? I think so?” It's basically a code review. It’s not the same level of verification.

Okay, yeah, I'm so rarely on that side of story writing that I'm not even thinking about that. But what you're basically saying is that behavioral stories make the product management work itself easier. It makes it easier to describe what you want, and it makes it easier to check that you've gotten what you want, because you're not also thinking about implementation. And then on the implementer side, you get higher quality work because people are bringing their own complexity and knowledge and creativity. And it makes the story, honestly, more fun and more interesting to work on because there's still some stuff to figure out.

Yeah, and then the last point is that when you write a story that way, you show people that you don't care about how they implement it, which does require some time, right? To teach people. But if you do those two things, people stop asking questions. You no longer need the meeting to say, “this is what I want.” Because you've already said what you want, and they trust it.

What kinds of questions do people ask, in the absence of that?

Sometimes they're trying to understand what you want, and sometimes they're trying to understand how you want them to do it. I really wanted to run the team in a collaborative way.

And so for about two months, when I first started as the tech lead, I was running the team in a very democratic way, trying to get everybody to participate in all the planning, but then I was also the final word on pretty much every question. And during that time, I was asked a lot of questions about, you know, everything.

Because I think people were used to more involvement in the implementation details from the previous tech lead. And I was really trying to not be the same kind of tech lead. I didn't want to run things that way. I wanted people to feel free to do things the way that made sense to them, but also make sure that we got the behavior that we wanted.

I've done a lot of my best work, going, “Oh, I don't want to be that.” Like, “I don't know how to do this. But I know how I don't want to do it. So I'm just gonna try something else.” That can be a really strong motivation.

Absolutely.

I have more questions about your practices. How fast do the tests run? Or-- let me ask it a different way. How long does it take for your tests to run?

So that's a good question. I don't have a great answer for that. So our tests are, I'd say, about half an hour to 40 minutes to run on a PR from beginning to end. This is an artifact of the fact that we have a really big legacy kind of application that we maintain.

So that’s the full test suite, all the bells and whistles?

That's the longest tests, which is basically only unit tests. The integration tests are not as long actually. They don’t run on every PR though, so you have to actually request them.

I tend to think, not in terms of a single test suite run, but in terms of the feedback loop for a whole feature. I'm always trying to make it faster. It's not a single test run, it's actually from writing the code to accepting a story.

So right now I'm really working on how to get the team to accept stories as fast as they are delivered. One of the challenges is, when we started doing acceptance, we made a decision to put acceptance in a separate column. And so people were in the habit of pulling off of “to do,” not the acceptance column. And I have to kind of keep on making sure people accept stories as soon as we can. So that's where my focus has been recently, not as much on the test suite, specifically, that and writing stories in a way to elicit the feedback that we need.

Some of the most productive discoveries we’ve had have been really interesting explorations for me, but not always writing code. It can be exploring the edges of the API that you're working with, like I treat our vendor as kind of a black box. Their documentation is not, it's not perfect.

So some of the more interesting work that we did to figure out what would happen when we requested a specific 3DS version was basically just taking all of the possible scenarios, configuration possibilities, from the UI on that side, and then sending a request and seeing what we got as a result. And we documented that on a wiki page. And then that kind of exploration is how we have confidence in moving to this 3DS version.
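That kind of black-box exploration can be sketched in a few lines of Ruby. Everything here is an assumption made for illustration: the option names, their values, and the fake_vendor_lookup stand-in are hypothetical, since the interview does not name the real vendor settings or responses. A real exploration would send an actual request where the stand-in is called.

```ruby
# Hypothetical configuration options; the real vendor settings are not
# named in the interview.
OPTIONS = {
  requested_version: ["1", "2"],
  merchant_enrolled: [true, false]
}.freeze

# Stand-in for the vendor call so the sketch runs offline. A real
# exploration would send a request here and record whatever came back.
def fake_vendor_lookup(scenario)
  return "error" unless scenario[:merchant_enrolled]

  "3DS #{scenario[:requested_version]}"
end

# Build every combination of option values (the Cartesian product).
def all_scenarios(options)
  keys = options.keys
  first, *rest = options.values
  first.product(*rest).map { |values| keys.zip(values).to_h }
end

# Pair each scenario with the observed result, ready to copy into a wiki table.
def explore(options)
  all_scenarios(options).map do |scenario|
    scenario.merge(result: fake_vendor_lookup(scenario))
  end
end

explore(OPTIONS).each { |row| puts row.values.join(" | ") }
```

Enumerating the combinations mechanically, rather than by hand, makes it harder to skip a scenario by accident.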

Last week, we flipped something like 75% of our traffic, from one style of doing a lookup request to sending the data on every request, with no merchant impact. And no observable change. That was a huge win. Everybody, you know, it was all hands on deck. Everybody went to the meeting and watched. We were all anticipating something going wrong. And nothing happened. And that felt pretty good.

Okay, so before we leave tests, is there anybody on your team who's practicing test driven development? Is that possible on your team?

Oh, yeah. So we generally do try to practice test driven development. We will sometimes spike out the change we want to make, and then comment everything out and then write the tests. We try to, to the extent that we can make sure that we have a failing test for every feature, for every condition that we add to the code.

So how do you-- you run some subset of the tests? The tests take like 40 minutes to run if you're running everything. What’s your fast loop look like?

Basically, we will identify one test file, or one spec, that will elicit exactly the change that we're trying to make. So we tend to loop on running that test or subset of tests. And a lot of times, I'll use, rspec has a flag “next failure,” which will stop on the first failure. I'll use that sometimes to try and get a really tight feedback loop when I'm working on a feature. I also use a tool called entr, which can basically monitor for files being saved and then run a command whenever the files change.

And so those two in combination, make it pretty fast. I've also done some experimentation with-- our tests take a while to load because it's a really big codebase. I've experimented with running tests in the pry console, which can sometimes make things a little faster, but it also has the issue that it's a little more finicky than running it from scratch.
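The loop described above can be approximated in a few lines of Ruby. This is not how entr is implemented, and it is not the team's actual setup. It is a polling sketch of the same idea: watch some files, and rerun one focused command when any of them is saved. The spec path and the default command are hypothetical, and rspec's --next-failure flag only works once example status persistence is configured.

```ruby
# Record the last-modified time of each watched file.
def snapshot(paths)
  paths.to_h { |path| [path, File.mtime(path)] }
end

# Return the paths whose modification time differs between two snapshots.
def changed_paths(before, after)
  after.select { |path, mtime| before[path] != mtime }.keys
end

# Poll the files and run the command whenever one of them is saved.
# The default command is an assumption: a hypothetical spec path, plus
# rspec's --next-failure flag to stop at the first failing example.
def watch(paths, command: "bundle exec rspec spec/lookup_spec.rb --next-failure", interval: 0.5)
  before = snapshot(paths)
  loop do
    sleep interval
    after = snapshot(paths)
    system(command) unless changed_paths(before, after).empty?
    before = after
  end
end

# Usage (runs until interrupted):
#   watch(Dir.glob("{lib,spec}/**/*.rb"))
```

With entr itself, the equivalent is to pipe a list of files into it, followed by the command to run. The Ruby version just makes the moving parts visible.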

So another question that I know that some of the folks that I will be showing this to are going to ask, when do y'all solo? Like, how do you decide? You mostly pair, so how do you decide when not to pair?

Soloing is really decided at the individual level. On a given day, somebody says, “I really want to solo today,” or “I want to solo this week.” That usually happens because of personal reasons. People live in different spaces, they're working from home, sometimes there'll be construction going on in their apartment. And sometimes you have appointments, contractors, doctor's appointments, whatever, if for whatever reason, you feel like you're going to be bad as a pair, sometimes people ask to solo. I tend to just pair even if I do have those things, because as a tech lead, I have a lot of interruptions anyway.

Did you introduce pairing on the team? Were they pairing before? What did that process look like?

So they were pairing before, but when I started they were pairing for a week. And they used an Excel spreadsheet to track pairing so that you could plan. So I introduced a faster rotation, using Pairist. And I introduced the practice of asking to solo, versus asking to pair, in order to introduce the fast pair rotation. It was kind of a requirement that people could ask for what they want. People weren't sure that fast pair rotation would actually work. They were worried about that. I also introduced the story format that we have, the gherkin format, I introduced the single backlog, and I introduced acceptance, which were things the team wasn't really doing before. I think that's it.

So what were the concerns about switching to the faster pair rotation?

There was a concern that people wouldn't be able to build the necessary context to complete a story. Which can be true if you have big stories, right? If you have a really large story, it’s hard to hand that context off.

Yeah. Yeah, I've definitely been on stories where it's like, the story runs for like three, four weeks. And every time you change the pair, you're just redoing that first initial startup work of like, “What is happening here? Why is it doing this?”

Yeah, especially when you don't have an anchor. Someone anchoring that story can be tricky when those stories are long running. People couldn't really imagine stories that were small enough, where somebody could hand it off, and then carry it for the next day. And I think that that's taken them a little bit of time to demonstrate that it works.

Yeah, so those two practices are interrelated, getting the story size down and rotating pairs regularly.

Yeah, and also keeping really good notes in stories about what has been done is another part of that. I think, at this point, most of our stories are small enough. But we're also trying to make it so-- the edge of the process is making it so that I'm not the only one who can write stories. I'm trying to start to get the rest of the team up to speed on how I think about story writing. I’m trying to democratize that, to give more of that to our PM who isn’t used to writing stories that way. And also to the other engineers, who sometimes participate in the story of the project planning.

We’ve introduced that also fairly recently, the project planning for engineers, which we've introduced as a role in Pairist, to try and give an opportunity for engineers to work on story breakdown, writing stories, writing documents. We are in an enterprise, so they sometimes need documents. And then to give the PM someone to talk to, to ask about that stuff. Those are things we're still evolving.

One thing that I want to make sure that we have some time to talk about is, how does remote work on your team? What were you doing before the pandemic? You’re in Sacramento, so were you like remote before?

Yeah, so I was remote from the beginning at Braintree. There were some people who were in the office at AT&T Park. At the beginning, the plan was I was gonna go in once a quarter and stay in San Francisco for a week. That stopped after two quarters. Because the second one was right before the pandemic shut everything down. And then everyone was remote.

And I had sort of the opposite experience from what I think a lot of people had. Most people felt disconnected. After the pandemic started, I felt like it connected me to everybody, because now everybody's in the same boat as me. So since then, our team has actually gotten more distributed. For the most part I think that’s happened just because we got different people, not because the people moved. But now we have one person, our manager is in, I forget the name of the town but near Oakley, so far East Bay. We have one person in San Francisco, one person in San Jose, one person near Hayward. I'm in Sacramento. And then one person in Seattle, and one person in Ireland, and then two contractors in Poland.

We're proud of being completely distributed. We're also pretty international. One person moved here from India and another from Nepal. You know, the two folks from Poland. We had someone from Japan here as well, for a while. We used to pair in Japanese, which was fun.

So how has the team changed? How did the team’s practices change now that people are all fully remote?

I can't speak too much to how it has changed. Because I really just observed the transition, not whatever it was like before. But we've definitely had to adapt more, and more to the things you can't take for granted, right?

We use Slack a lot. We don't rely on email too much on my team, although people ask us stuff on email. Because of pairing we do a lot of video conferencing, or just audio conferencing. Not everybody likes to turn their camera on. I try to encourage it. But some people live in small apartments where it's not reasonable to do that all the time.

But the biggest thing I do is, I created a vanity URL for my teams. And that is something the team relies on heavily to just join an ad hoc discussion. [So there’s a video conference there that the team can join if we need to talk about something.]

And then the other big thing that I've done is that I use a virtual camera.

So I can kind of switch my screen around and do things like, I have a fake window in the background if I want it to look nice. But the nice thing about the virtual camera is that you can ad hoc share things. And it feels a lot like being in person. It's the same tool that a lot of game streamers use, OBS Studio.

So it lets you get that like, “Oh, hey, come over here and look at something” experience. That’s this trivial thing to do in person that you've had to figure out how to do a little bit more deliberately. That turned into a technical problem that you had to solve.

Yes, yeah, I solved a lot of things with technical solutions. But the white-boarding experience is a little harder to do. I use a lot of lightweight things, and I focus on facilitation over drawing.

We have one meeting, which I'm calling pre-IPM, it's focused on the why, and the what, what we should work on. And rather than starting from a spreadsheet that's already defined, I start from an empty vim buffer every time and I elicit from the group you know, what are all the projects that we should be aware of? And what’s the next step for each one of them. And then I order them.

I do that intentionally, because I want it to be a blank sheet. I want people not to come in and think, “okay, I already know what I'm working on, I don't need to think.”

So you're kind of refreshing everyone's awareness of what the team has in the backlog every week. I like that.

Yeah, that was a little controversial. There were some people who were like, why don’t we just use the spreadsheet?

How did you overcome that?

Well, it's funny, I kind of... I was in power?

Okay. [laughs] Yeah.

Yeah. It was kind of like, I don't have an alternative. I mean, I explained pretty much exactly what I just said to you about why I thought this was important to me and when the new EM and the new PM joined we were already doing that. They’ve never seen anything different.

I think really good facilitation goes a long way. When I tried to hand the facilitation of that meeting off to other people, that was when most of the controversy came because other people didn't facilitate it the same way. So I’m not sure how to hand that one off because it feels like it's delicate. It’s hard to facilitate well.

Yeah. Okay, so there's one more question that I want to make sure that I ask because this is a little bit like the pre-interview conversation, like, “why would I join this team?” And there’s one question I always ask.

What's the difference between somebody who's good on your team and somebody who's really great? What makes the difference between “they're a solid team member,” and then, “this is my ideal hire.”

Yeah, that's a really good question. So for me the most important quality for someone on my team would be... I guess I can think of two.

First, it’s patience. And the reason I say patience is because we work in finance, right? So there’s a lot of legacy systems that we have to integrate with, that we have to deal with. There are people working in companies which just don't have the experience of what it's like to work in a really fast paced company that is very modern. And so for them, a lot of times it's meeting them where they are at. It’s saying, “Okay, I understand that you're not used to delivering stuff every week. But you may not understand why I'm trying to get this feedback loop closed in this way.” So you try to find what's important, be patient, ask for it, and be kind. And that's to me, that's the most important thing.

And of course, when you’re pairing, empathy is really important as well.

Yeah, patience is an underrated quality when you're doing the kind of work that you're doing.

Yeah, patience and resilience are kind of, you know, peers. Maybe resilience is the cooler way to talk about that these days.

Sometimes the most effective thing is not to make your tests faster. It's like, you got to be creative about what is the most effective thing to do in the whole system that you're in. The whole process, including the externalities, including the vendors, including the merchants.

Right, you have to consider the whole system and not just the little piece of it that is right in front of you, or the easiest for you personally to change. All right. I want to be conscious of time. I could talk about this forever. Is there anything else that you definitely want to make sure to cover? Like, is there anything I haven't asked that you really wanted to talk about?

Oh, yeah, this is the question that probably a lot of people are gonna have. Which is about, what has changed? You know, as far as like, Braintree and PayPal.

I've actually worked on both sides, as you probably remember, when I was at Pivotal I was really at Dell, right? So I've been in the big company sort of acquiring the smaller company. Now I've been in a small company being acquired by a big company, which gave me some empathy for a person who joined our team coming from PayPal. At first, I was wondering how he might fit into the team. For example, would he like pairing, which we do all the time?

Right? Are they gonna come in and change everything?

Yeah, I was a bit suspicious and worried at first.

He turned out to be-- he's great, great pair, great engineer. I’m very happy that he joined our team.

I think that there's naturally a lot of suspicion of a big company, when you're in a small company that has been acquired, and when things start to change around you.

But one of the good things that I can say about working for PayPal is that, on the whole, they seem to be trying to do good things for entrepreneurs around the world.

So I’ve seen the CEO of the company appear in a lot of different things ranging from like TED talks to just the company meetings, where he's shown that he really is trying to value every employee, from the highest paid people to the lowest paid people, and make sure that they get into a good place. The company is trying to offer programs to help social causes that people care about.

There’s security, working for a big company, it seems pretty stable. There's good work-life balance on our team, we work nine to five. We basically don't work outside of those hours, and if I do get on Slack outside of those hours, people say, “hey, you shouldn't do that.” So that's a nice thing.

We have seen a lot of people who had been at Braintree for a really long time decide to move on in the past year and a half, something like that. But I think that that was sort of inevitable, I think that that was going to happen. Because at some point, those people wanted to move on and try something different.

PayPal keeps a pretty hands off approach to the teams working here. And my manager actually came from PayPal. He has been there for a while. So he understands a lot more about the company than I do. And he seems to be pretty confident that we're going to be able to maintain the culture on our team that we want. So that's one of the reasons-- I hired him. So I actually picked the manager.

Nice. What were you looking for in a manager?

I was looking for someone who was, was not like, not super technical? Mostly looking for good social qualities. Kindness, sensitivity, listening, ability to be tough when he needs to be. I was looking for those things as well.

Someone who would help protect us from pressure, outside pressure, when we need that, and I think we found someone who can do that. But also, someone who can help us understand and integrate with the bigger company, because our team is going to have to be involved with PayPal. I mean, we are part of the big company. And so we have to find ways to work with them, and address cross cutting concerns.

Jobs

The team at Braintree that Josh works on is hiring. Instead of applying for the generic Braintree job posting, message him or his manager on LinkedIn.

Code for America has a union now! They continue to deserve the top spot on my jobs page.

If you enjoyed this interview, forward it to a friend, or share it on your social media of choice. I appreciate it. And if you'd like to sit down with me for a similar conversation about how your team works, send me an e-mail.