Knowledge recall and popularity based search

This is a preliminary draft - see avaitla16.com/how-i-write

You join a new company. It’s a “startup”. There’s no documentation. Your job is to get stuff done asap since the clock is ticking (money doesn’t grow on trees - and right now it’s on fire).

This is a common scenario folks may encounter. With a small team things move lightning quick. Everyone knows everything - alignment and prioritization happen almost telepathically. There’s no need for management overhead since the focus is crystal clear and there’s simply work to be done - pick up a shovel and dig. Ship it. This agility is what allows the startup to beat the incumbent.

Then the bill comes due and things begin to slow down. This happens for many reasons. One reason is a flood of demand and the requirement for stability by customers - congrats, you are now the incumbent. Another is the communication overhead from growing a team. The team is the core of innovation - they conceive and build new things for customers as well as nurture and sustain existing lines of business. However, in order to be successful, a team needs to understand and be onboarded to the hard-earned secrets that a founding team has accumulated over years of listening to customers themselves.

You can only delay the agility slowdown, and there are a variety of ways to do so. Growing a team slowly or not at all is one way. This is viable for some businesses but not all - certainly not those where customers demand continuous innovation. When growing, then, we can embrace the reality that process and effective onboarding are requirements and face this need head on. Our goal is to pass on the knowledge of the past to those who are new, so that when they come to a critical juncture of the business they make the right decisions.

This requires documentation. We document the expectations in the workplace. How the code is run. What service calls which. Important acronyms and terminology. Coding conventions. Who is responsible for what. The names we call our services.

And then it strikes us - there is simply too much documentation. Some of it is stale. Some of it is duplicated and only half accurate. Some of this inaccurate information is held as gospel because newcomers are too shy to ask if this is still the way things work. Some of it is written in business-specific jargon that requires a Wikipedia rabbit hole of research to decipher. We decide we need to throw it all away and start over.

Documentation is only one side of the problem for successful knowledge transfer. The other side is one of recall. Do you still remember where that document on test users was located? Which nested folder contained that contact information for the vendor? How did we load customers into Salesforce again? Where was that video on uploading a replacement verification file? Who do I talk to again about a document that doesn’t link anywhere?

All of this information exists, but if you can’t find it, it isn’t much use. If it’s out of date and inaccurate, it may even be harmful as it is passed along. We begin to realize that documentation doesn’t live on its own and we need someone to help us wade through this nuance and recall where things are.

We may address this in some of the following ways:

  • Hire a dedicated knowledge steward (this may be the head of people) whose role is to ensure an effective onboarding curriculum is established and documentation is kept up to date

  • Keep existing documentation in the systems it currently lives in (Google Docs, Confluence, Slack), but have a way to index and monitor it. It’s an uphill battle to ask everyone to stop what they’re doing and engage in a documentation crusade.

  • Have a way to observe popularity, deprecate old items, and periodically verify items in a knowledge base.

There are now many tools out there that can help with this, like Glean or Guru. They use this popularity based idea so you can see which documents are important based on who has accessed them and which were edited most recently, and they have owners verify and deprecate information periodically.
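To make the idea concrete, here is a minimal sketch of popularity plus periodic verification. It is not how Glean or Guru work internally; the Doc fields, the 180 day threshold, and the function names are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical fields; a real knowledge base would expose its own.
    title: str
    views_last_90_days: int
    last_verified: date
    deprecated: bool = False


def needs_review(doc, today, max_age_days=180):
    """True if the owner has not re-verified the doc within max_age_days."""
    return (today - doc.last_verified).days > max_age_days


def rank_docs(docs):
    """Drop deprecated docs, then sort the rest by views, most viewed first."""
    live = [d for d in docs if not d.deprecated]
    return sorted(live, key=lambda d: d.views_last_90_days, reverse=True)


def review_queue(docs, today, max_age_days=180):
    """Popular docs that are overdue for verification, most viewed first."""
    return [d for d in rank_docs(docs) if needs_review(d, today, max_age_days)]


docs = [
    Doc("Test users", 120, date(2023, 1, 10)),
    Doc("Vendor contacts", 15, date(2024, 5, 1)),
    Doc("Old deploy guide", 300, date(2021, 3, 2), deprecated=True),
]
today = date(2024, 6, 1)

print([d.title for d in rank_docs(docs)])
print([d.title for d in review_queue(docs, today)])
```

The review queue is sorted by views on purpose: a stale document that many people read does more damage than a stale one nobody opens.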

Popularity based knowledge and onboarding is a powerful idea. Let’s take it one step further. Imagine seeing a codebase with millions of lines and thousands of files. You don’t understand this by studying file by file. You learn it in two ways: by doing a task that undergoes code review (having a buddy coach you based on your attempts to learn the system) and by studying the code paths that are “hot” - those that have been updated recently. The scripts/old_migration.py file is less important than a core login endpoint. Imagine a tool that visualized files based on the number of git commits touching them over time. Moreover, imagine being able to see, based on live traffic, which endpoints were the most active. That gives you a quick way to focus on the most important aspects of the code first.
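A rough version of the "hot files" view needs nothing more than git itself. The sketch below counts how many commits touched each file in a recent window; the function names and the six month default are my own choices, and it assumes git is installed and the directory is a repository.

```python
import subprocess
from collections import Counter


def count_file_changes(log_text):
    """Count how often each path appears in `git log --name-only` output.

    With an empty --pretty format, the output is just the changed paths,
    one per line, with blank lines between commits.
    """
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:
            counts[path] += 1
    return counts


def hot_files(repo_dir, since="6 months ago", top=10):
    """Return the `top` most frequently changed files as (path, commits) pairs."""
    log_text = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_file_changes(log_text).most_common(top)


# Made-up log output for three commits, to show the counting step on its own.
sample = (
    "app/login.py\napp/models.py\n\n"
    "app/login.py\n\n"
    "scripts/old_migration.py\napp/login.py\n"
)
print(count_file_changes(sample).most_common(1))  # [('app/login.py', 3)]
```

Calling hot_files(".") inside a checkout gives a first reading list for a newcomer. Commit counts are a crude signal (renames and generated files skew them), so treat the result as a starting point rather than a ranking of importance.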

Let’s now look at how you might come to understand a new domain in a company. Say you work in engineering but want to know more about how support teams work. You can reach out and ask for an overview and onboarding guides. In addition, you can examine the most commonly used and viewed files in their Google Drive to catch all the details they may have forgotten to mention. It’s the closest thing to shadowing them to learn what the real day to day looks like, and it clarifies the nuance that an explanation alone would miss.

This concept of popularity also exists in the data discovery world. In the beginning, data engineers would document tables/columns/queries in a static data catalog. This works at first but suffers from staleness, since people forget to keep it up to date as the business changes (columns get deprecated, usage patterns change, new dashboards are built by other teams). Automated discovery addresses this by using live usage patterns in BigQuery access logs to pinpoint which tables are crucial based on the reporting dashboards they power. If the CEO is looking at something every morning at 9am, those tables are probably important and have the information you’re looking for.
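As a sketch of what usage based discovery boils down to, the snippet below ranks tables by how many distinct readers they have, then by total reads. The event shape (a dict with "table" and "user" keys) and the table names are hypothetical; this is not the BigQuery audit log schema, and a real pipeline would first extract such records from the access logs.

```python
from collections import defaultdict


def table_popularity(access_events):
    """Return table names sorted by distinct readers, then by total reads."""
    readers = defaultdict(set)
    reads = defaultdict(int)
    for event in access_events:
        readers[event["table"]].add(event["user"])
        reads[event["table"]] += 1
    return sorted(
        reads,
        key=lambda table: (len(readers[table]), reads[table]),
        reverse=True,
    )


# Made-up access events for illustration only.
events = [
    {"table": "sales.daily_revenue", "user": "ceo"},
    {"table": "sales.daily_revenue", "user": "finance_dashboard"},
    {"table": "tmp.backfill_2019", "user": "data_eng"},
    {"table": "sales.daily_revenue", "user": "ceo"},
]
print(table_popularity(events))
```

Counting distinct readers before raw reads keeps a single noisy job from outranking a table that many people and dashboards depend on.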

Another, more relatable everyday example is cooking a meal. You have all the ingredients and tools in the kitchen (all the documentation exists), but do you ever forget where exactly they all are? Where was the oven mitt again? How about the shears? And where on earth is my favorite knife? You can solve this with meticulous organization and labeling. To solve it with popularity instead, you could imagine arranging drawers by how often things are used. When you use something you put it in the “most frequently used” drawer, which naturally pushes other things down toward the “least frequently used” one. But I don’t know anyone who does that. There are those I know who simply assume the item is gone and buy a new one (the burn it all down and start over approach to knowledge bases).

The best example of solving the recall problem through popularity (or relevancy) is Google Search itself. All the world’s knowledge is documented - but that doesn’t matter if you can’t find / recall it. Search helps you recall that documentation based on what’s popular (backlinks), verified, and manually curated by a caring team of Googlers.

Hopefully these thoughts on onboarding are helpful to you! Good luck!