Today I am launching Multumesc.org, a non-profit site which informs the community about the absence rate and absence patterns of the members of the Romanian Parliament. This project took 7-8 weekends for me and some other contributors, including research, data analysis, coding, testing, automation and UI.
As this was a personal pet project for us, we could not afford to invest a whole lot of time and resources. On the other hand, the way the public and social media would respond to our initiative was highly unpredictable: we could very well have very little traffic (1000 users per day) and then burst, briefly and immediately, to 10K users for an hour or two, should a newspaper or online magazine pick up the news.
Additionally, we had two pieces of information which helped in our design:
- The information from Multumesc.org would be updated automatically (by the data analysis component)
- The information would be updated infrequently (once every 24-48h)
Thus we decided to make all of the website content static, with the attendance data for the deputies updated in the background, in a periodic batch task. This decision allowed us to deliver an automatically scalable minimal viable product quickly, without spending another 4-6 weekends thinking about automatically deploying and scaling web servers.
Find below the diagram of our system:
For the dynamic parts of the website (user feedback, comments, messages and analytics), we decided to use external components from Facebook and Google Analytics.
To make the experience interactive for the users, we used jQuery components for dynamic tables and charts, as well as for displaying deputy profiles. This way, all the necessary data is downloaded statically into the browser when the page loads, subsequently serving as a small, lightning-fast in-memory database for search queries.
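The batch-plus-static-data idea can be sketched roughly as follows, in Python. This is only an illustration under assumptions: the record fields (`name`, `sessions`, `absences`), the `deputies.json` file name and the function names are hypothetical and not taken from the actual Multumesc.org code. The `search` function mimics what the in-browser tables would do over the downloaded data.

```python
import json
import tempfile
from pathlib import Path


def build_static_data(records):
    """Compute an absence rate per deputy and sort by it, worst first."""
    rows = []
    for rec in records:
        sessions = rec["sessions"]
        rate = rec["absences"] / sessions if sessions else 0.0
        rows.append({"name": rec["name"], "absence_rate": round(rate, 3)})
    return sorted(rows, key=lambda row: row["absence_rate"], reverse=True)


def publish(rows, out_dir):
    """Write the rows as one static JSON file (hypothetical name) for any web host to serve."""
    out_path = Path(out_dir) / "deputies.json"
    out_path.write_text(json.dumps(rows, ensure_ascii=False), encoding="utf-8")
    return out_path


def search(rows, text):
    """What the page would do after download: filter the in-memory rows by name."""
    needle = text.lower()
    return [row for row in rows if needle in row["name"].lower()]


if __name__ == "__main__":
    sample = [  # made-up records, for illustration only
        {"name": "Deputy A", "sessions": 40, "absences": 10},
        {"name": "Deputy B", "sessions": 40, "absences": 2},
    ]
    rows = build_static_data(sample)
    with tempfile.TemporaryDirectory() as out_dir:
        print(publish(rows, out_dir).name)
    print(search(rows, "deputy b"))
```

Since the batch task is the only writer and it runs once every 24-48h, the web tier never needs to talk to a database at request time.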
In conclusion, considering the actual constraints and opportunities of a project (e.g. infrequently updated data) can significantly reduce its complexity. Also, with current cloud services, designing a scalable MVP over a handful of weekends becomes significantly more manageable.
Last night I started watching a TV series called Mr. Robot. It’s centered around this introverted hacker who bands together with a group of digital anarchists to break the stranglehold that the corporate-financial system has on the world. Money means slavery, so lack thereof must mean freedom. Seems legit, right?
I have seen a similar concept in the movie called In Time, only there the currency is not the dollar, euro, pound or yen, but the second. Upon coming of age, everyone got a time budget which they got to invest, spend, and save. Going under no longer meant one could file for personal bankruptcy, but that one became worm food. Of course, the rich were practically immortal and the poor died at 25 (not unlike the real world). And unsurprisingly, Justin Timberlake and 13 save the day by busting into time banks and redistributing wealth equally. How predictably Robin Hood-ish!
From the time of the Great Depression and Bonnie and Clyde, the concept of wealth redistribution has become repeatedly, and arguably increasingly, seductive to the capitalist world, especially in times of recession when the lower and middle classes struggle to stay afloat or even alive. And it becomes a popular belief that partial or complete wealth and/or income redistribution is the answer to society’s problems. Of course, the blind majority who become fond of these concepts give no thought to their disastrous previous implementations in Soviet Russia, Nazi Germany and throughout the communist bloc.
There is no argument that income inequality in the world and even in the so-called civilized world has reached unprecedented and borderline dysfunctional heights.
However, no amount of frustration with the status quo of the financial system or the economy justifies the toppling of corporations or the meltdown of financial markets as the right solution. The problem with TV or cinema productions such as Mr. Robot or In Time is that they encourage the simplistic, almost idiotic belief that the opposite or the absence of a sub-optimal, corrupted system (such as state-controlled currency) is necessarily better or even the best possible solution.
Injecting such ideas into the public mind serves as some sort of psychological pressure-valve or mental pain-killer:
“Oh those corporate fat cats hold the world enslaved in debt for their perpetual profit! The solution is to take them down, together with the credit markets and currency exchanges.”
Now I understand why it is tempting to start thinking that some Robin Hood hacker will cyber-attack financial data centers and wipe away everyone’s debt. It’s tempting because it is an easy fantasy. It’s a happy place to imagine for the masses maxed out on their credit cards.
It is also stupid.
Because wiping away someone’s debt also means wiping away someone else’s asset or investment. And the indiscriminate nature of the word “wiping” makes no guarantee that the super-rich would suffer and the less fortunate would profit. Quite the opposite: wiping away your debt would also mean wiping away someone’s uncle’s deposit or some poor couple’s savings for a postponed honeymoon.
The exact kind of shortsightedness that makes people think that taking down the banking system means wealth for the poor, punishment for the wealthy and justice for all is the mentality that makes the rich want to be richer at any cost. The truth is that each of us will be biased in favor of the economic or political system which favors our class the most at this moment. But lacking a long-run perspective, and the perspective to see other people’s situations and context, is the origin of both some people sinking in debt and others being suffocated by greed.
The opposite of being controlled (i.e. by the government, financial systems, big corporations and so on) is not being free. It’s chaos.
The opposite of income inequality isn’t sudden wealth redistribution. It’s improving the dynamics of future income, so that the financial bloodstream is directed to the vital parts of the system, which create long-term value and well-being.
And most definitely, the answer to a corrupt system isn’t taking down the system and rolling society back to the Wild West or the Middle Ages. That would be like saying the treatment for a sick patient is euthanasia. The less thrilling, unexciting, somewhat boring, hard, yet correct solution is diagnosing the system and improving it from where it stands, with lessons learned from where it has been.
Concretely, in spite of the evil corporations and corrupt politicians and incompetent governments, on an individual scale we can try:
- Learning how the world actually works
- Learning discipline (of savings, managing our costs and limiting spending)
- Teaching others discipline and alternative perspectives
- Not wasting too much time with worthless entertainment
- Focusing at least once in a while on what we can change about ourselves instead of complaining about others
- Optimizing what we give back to society (as a long term investment, as opposed to thinking about what we can profit from on the short term)
- Leaving some buffers/reserves for later (time buffers, money buffers, attention span buffers)
- Understanding how and why things work like they do (including banks, currency, governments)
- Avoiding debt instead of fantasizing about having some divine force relieve us from it
Of course, most of these hands-on, step-by-step, do-it-yourself, suffer-now-profit-later approaches are nowhere near interesting enough for a movie, let alone 12 episodes of a TV series. Hacking banking servers, running denial-of-service attacks on the stock markets, being a modern day Robin Hood armed with a Mac and shielded by a hoodie, plunging the world back into chaos one stock at a time – YEAH!, that’s almost as entertaining as this reality show.
In conclusion, I reserve the right not to be impressed by the whole FSociety concept. Because usually, “screw everything” is not a better solution than any solution.
Also, optima are rarely found at extremes.
In the world of startups and MVPs, you’ll often hear the question “Is your start-up a vitamin pill or a painkiller?” The argument goes that whatever you’re working on should alleviate a real world pain, thus it should be a painkiller, rather than a less effective vitamin. That metaphor, however, is fundamentally flawed and shallow.
Just like a lot of start-ups, painkillers do not treat the cause, but the effect. They don’t really solve a problem, they just hide it – creating undesired side-effects in the process. And even some of the most successful startups don’t really improve the quality of our lives. Most just attempt riding the wave of trend and hype to reach the next round of financing or the mirage of that big, fat exit paycheck.
The variety of data stores available today is huge and the landscape can be confusing. To get a grip on all the various solutions out there, I find the map below (courtesy of 451research.com) very helpful when making the long list of potential candidates for a project. It lets you make sure that you don’t leave out anything relevant from your selection process and that you are, quite literally, on the right track.
For the very high level purpose of your project, look up the line and list all the data stores on that line. Research each one against your project-specific requirements. Finally, make a short list with the ones which seem to be fit for purpose.
As discussed in the previous articles, the first step in the process of finding the right solution for a data store is having an in-depth understanding of the fundamental problem at hand and of the business scenario(s) which it will serve. In what follows, I will lay out some of the questions that I consider an enterprise/solutions architect should know the answers to before proposing either a new data store or a data store replacement. After all, as stated by the CAP theorem, you can’t have all qualities in a data store, so you need to carefully pick the right tool for the job.
- Volumetry: What is the total size of the data store? It’s not important to get an exact value, but rather an order of magnitude (10GB, 200GB, 1TB, 20TB). Instead of concerning yourself with the exact value, it’s better to focus on the growth factor you expect year-over-year (is it 10%, 50%, 200% or 1000%?). Depending on the volumetry and on its growth, you might be forced to opt for a scalable/distributed data store (which runs on several nodes).
- Atomic size: How many records (items, objects) are processed/retrieved/in any way touched by one query? Again, you need to focus on the order of magnitude, not on the exact value. Are you planning to retrieve/process/compute over up to 10 records in a query (this would be the case for transactional workloads, like updating customer data records), is it more like 10K records (usually in short-term reporting and analytics workloads) or is it more like 10-100M records touched by each query (characteristic of an analytical data store or data warehouse, used for building more complex long-term reports)?
- Load: How many queries do you expect per second, on average and in spikes? Are we talking 10 operations/second, 1K operations/second or 100K operations/second? Depending on the load and on its growth, you might be forced to opt for a scalable/distributed data store (which runs on several nodes).
- Responsiveness: How fast do you expect those queries to run? In some instances, you may need 1-2 ms response time (real time systems), other scenarios might be OK with 50-500 ms (displaying, generating content) and for others it might be satisfactory to run in 1-60 seconds (usually analytical workloads, such as generating a complex report of all items sold in the last six months per geographic region and line of business).
- Immutability. Does your data ever change after you add it? For instance, if you’re storing a log of events (page views, user actions, application errors/warnings), it’s unlikely you will ever want to change a particular record. And this assumption does wonders in terms of allowing you to choose a class of data stores that are fast, scalable, capable of running complex queries, but which are pretty averse to changes: column (or columnar) data stores / data warehouses. Of course, this does not mean that you cannot change data once it’s stored – it just means that changing comes with a big performance hit (i.e. not what the tool is built for). Note that for columnar data stores there are a lot of strategies for selectively deleting old data without denting performance (e.g. destroying data which is older than 24 months).
- Strict consistency. There are cases when you want all queries to the data store to receive the exact same result (assuming nothing changed between queries). In case you are running your data store on a single layer and on a single node (like, you know, MySQL), this is almost never an issue. If you are running the database on distributed nodes (and all or some of the data is replicated), some nodes may get the updated version later than others, so they might give out different answers than the master node, at least until they get the update. Therefore, you may want the data store to be able to guarantee that all replicas have been updated before confirming the change (consensus) or that at least a given number of the replicas (2, 3, n/2 or n/2+1: a quorum) received the update.
- Data freshness/staleness: How fresh do you expect the data to be? In order to scale, you may want to maintain copies of the master data. This means that when something is added or when something changes, it takes some time for all the copies (replicas) to be updated. Is this acceptable? And if so, would 10 ms be OK, would you be OK with 1 second? Or would even 1-5 minutes be satisfactory? For instance, when reading and writing banking transactions, any sort of staleness is unacceptable (since it can raise risks of double spending). However, if running a content site, having an article or a picture refresh from the user’s perspective 5 or 10 seconds after it was updated by the content manager is pretty much OK. Going further, if you create a sales report for the last 6 months, it may even be acceptable that the data from the last hour (or even the last day) is not included (or is not guaranteed to be entirely accurate).
- Transactional ACID compliance. ACID stands for Atomicity-Consistency-Isolation-Durability. It basically refers to the fact that transactions (groups of separate changes) either succeed together or fail together (while preserving the previous state in case of failure). This might be needed for bank statements, customer orders and online payments, but transactional compliance is most probably NOT needed for reporting, content delivery, ad delivery and tracking analytics.
- Query accuracy. For certain analytics tasks (e.g. number of unique users), especially for real time queries, having the absolute exact value is not an absolute necessity. If having 1-2% error is acceptable, you can consider using sketch techniques for approximate query processing, which make your systems run faster, with fewer resources and lower costs, while only guaranteeing the results within a specified error threshold (of course, less error -> more processing -> more time/more resources). Simple examples of approximate query processing include linear counters and LogLog counters. These methods of doing fast estimates for problems which are expensive to evaluate accurately rely on the less popular probabilistic data structures. Data stores usually don’t have built-in support for this, but you can implement it in the application layer to make your life a lot easier when precision is not mandatory.
- Persistence and durability. Do you want to keep the data in case of adding/removing or replacing a node or in case of an application restart or power failure? In most cases, the answer is “of course I do! what are you, crazy?!”, but there are some use cases (such as caching or periodic recomputing) where wiping out the whole data store in case of node failure/cluster failure or maintenance work is acceptable. Imagine a memcached cluster used to cache database query results for up to 1 minute (i.e. to prevent congestion on the underlying database) – in this case, wiping out the cache, starting from scratch and then refilling it (known as cache warming) is acceptable, as it would only entail a small performance degradation for about 10 minutes (this negative effect can be further reduced by performing the cache wipe during maintenance hours, i.e. during the night).
- High availability (fault tolerance or partition tolerance). In some cases, it is important that a data store is never, never down (well, almost). Nobody likes downtime, but in some cases it’s more acceptable than in others (the way you can assess this is by looking at the business impact per hour: revenue loss and legal risk – you know, like people not paying or suing to ask for their money back plus damages). Assuming you are under strict (maybe even legal) high availability requirements, you want to make sure that your data store can take a hit or two; that is, it can go on functioning even if a few nodes go down. This way, you can reduce the probability of the data store failing if a node fails and you can make sure that the service(s) it provides or supports do not suffer interruptions while you repair or replace the damaged node. So if you truly need to offer such a guarantee, make sure you go for a data store which is fault tolerant.
- Note: As an exercise, try to compute the average failure rate of a cluster composed of three fully redundant nodes (they all store the same data), assuming each individual node has a failure rate of 1% per year (in the first year of operations) and that failures are isolated (i.e. not accounting for failures that affect all nodes simultaneously, like power outages or a meteor hitting your data center)
- Backups and disaster recovery. Sooooo, you remember I mentioned a meteor hitting your data center? Yeah, it just hit your data center. Head on, full on. There’s nothing left. Every bit wiped out of existence in 2.47 seconds. Do you want to be prepared for this scenario? If so, add backup and disaster recovery to your data store’s requirements. Remember that for a data store to be considered disaster recoverable, it needs to have an exact/almost exact replica in a geographically separate data center (different continent, different country, different city). Furthermore, you may even require that the replica is hot-swappable (passive backup) or load-balanced (active backup) with the master version, so that in case of disaster the downtime is non-existent or minimal.
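Coming back to the query accuracy item in the list above, here is a minimal sketch of a linear counter implemented in the application layer, in Python. The class name, the bitmap size and the choice of SHA-256 as the hash are assumptions made for this illustration, not a reference implementation. The idea: hash every item to one bit of a fixed-size bitmap, then estimate the number of distinct items from the fraction of bits that are still empty.

```python
import hashlib
import math


class LinearCounter:
    """Approximate distinct counter backed by a fixed-size bitmap."""

    def __init__(self, num_bits=65536):
        self.num_bits = num_bits
        self.bitmap = bytearray((num_bits + 7) // 8)

    def add(self, item):
        # Hash the item to a bit position; duplicates always hit the same bit.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bitmap[position // 8] |= 1 << (position % 8)

    def estimate(self):
        # n ~= -m * ln(V), where V is the fraction of bits still empty.
        set_bits = sum(bin(byte).count("1") for byte in self.bitmap)
        empty = self.num_bits - set_bits
        if empty == 0:
            raise ValueError("bitmap is full; use more bits")
        return -self.num_bits * math.log(empty / self.num_bits)


if __name__ == "__main__":
    counter = LinearCounter()
    for visit in range(30_000):
        counter.add(f"user-{visit % 10_000}")  # 10,000 distinct users, 3 visits each
    print("exact: 10000, estimated:", round(counter.estimate()))
```

The bitmap here takes 8 KB no matter how many visits are processed, whereas an exact count has to remember every distinct user it has seen. The price is that a bitmap which is too small for the real cardinality fills up and the estimate degrades, so the size has to be chosen for the expected order of magnitude.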
There are other non-technical constraints which you need to have in mind, as some of them might prove to be show-stoppers for some of the candidate data stores you will consider:
- Infrastructure preference: on-premise / private cloud, public cloud / SaaS (software-as-a-service).
- Capital expenses (up-front investment).
- Operational expenses (recurring costs).
- Deployment complexity.
- Operating/maintenance complexity.
- Team’s knowledge/willingness and opportunity to expand that knowledge.
- Licensing concerns.
Pick those requirements which apply to your project/scenario and write them down as the header of a table.
That table will become the compliance matrix for your candidate solutions, which we will use and evaluate in the next article.
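As a rough sketch of what such a table could look like in code, the Python below keeps the requirements as the header and one row of answers per candidate. The requirement names, the candidates "Store A" and "Store B" and the use of `None` for a cell that still needs research are all assumptions made for this illustration.

```python
REQUIREMENTS = ["strict consistency", "high availability", "ACID transactions"]

# True = compliant, False = not compliant, None = not researched yet.
MATRIX = {
    "Store A": {"strict consistency": True, "high availability": True,
                "ACID transactions": True},
    "Store B": {"strict consistency": False, "high availability": True,
                "ACID transactions": None},
}


def render(requirements, matrix):
    """Return the matrix as plain-text rows, header first."""
    symbols = {True: "yes", False: "no", None: "?"}
    lines = [" | ".join(["candidate"] + requirements)]
    for name, answers in matrix.items():
        cells = [symbols[answers.get(req)] for req in requirements]
        lines.append(" | ".join([name] + cells))
    return "\n".join(lines)


def fully_compliant(requirements, matrix):
    """Candidates for which every requirement is confirmed as met."""
    return [name for name, answers in matrix.items()
            if all(answers.get(req) is True for req in requirements)]


def open_questions(requirements, matrix):
    """(candidate, requirement) pairs that still need research."""
    return [(name, req) for name, answers in matrix.items()
            for req in requirements if answers.get(req) is None]


if __name__ == "__main__":
    print(render(REQUIREMENTS, MATRIX))
    print("fully compliant:", fully_compliant(REQUIREMENTS, MATRIX))
    print("to research:", open_questions(REQUIREMENTS, MATRIX))
```

A spreadsheet does the same job; the point is only that the header comes from the requirements, not from the features the candidates happen to advertise.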
When proposing a data store solution, just going with the flow (or hype, for that matter) is not a very safe approach. As a solutions architect, one needs to make sure one has a clear overview of the usage scenarios and business needs served by the data store, research and inventory potential candidates, benchmark the fully compliant ones, examine the results and then make an objective, argument-bound proposal.
Just saying “we’ll implement a cutting edge NoSQL data store” might earn you extra points in front of stakeholders at first, but it is clearly not enough to deliver a mature, robust solution, which is fit for use and fit for purpose and which the development and operations teams can feel in control of.
Let’s start with the diagram below.
- Analyze usage and load scenarios and general requirements for the data store. At this stage you list the features and capabilities you wish to have from your data store, such as scalability (partition tolerance), responsiveness (fast queries) or indexing. You should NOT put items on this list just because “it’s good to have it there”, “all the cool kids have it”, “I heard it’s important” or “I read about it in a magazine”. You should ONLY put items on this list because they serve a business purpose and a real usage scenario. You can mark each requirement as “mandatory” (must have) or “optional” (nice-to-have). Also, it’s a good idea to mark the business impact of not having such a requirement, as this will make your discussions with business resources and stakeholders a lot easier. Finally, make sure you validate your assumptions with your team and with the beneficiary (client, stakeholders). All these requirements will serve as the header of the compliance matrix for candidate data stores. The compliance matrix can contain technical requirements or business requirements (“the data store should be open source” OR “the data store is commercially supported” OR “the data store is offered as a cloud service on infrastructure provider Amazon Web Services/Microsoft Azure/Google Cloud”).
- Rank candidate data stores on the compliance matrix. Do your research, make a list of all data stores you would like to consider. Typically, this list should contain 5 candidates; having more than 7 would mean you’re losing focus; having fewer than 3 would mean you’re jumping to conclusions too soon. Go in-depth for each one of the candidates and see if they are compliant with each of the requirements. Mark “I don’t know” where you are not sure or where further research is needed.
- Decision gate: is there any data store which is fully compliant with all of your requirements? This step is critical in your process: either you have found one (but preferably 2-3) candidates for which “the shoe seems to fit” or you need to go back to the drawing board. Having too many requirements typically means you want to use the same solution to solve several problems at once (there are few data stores which are both transactional and analytical at the same time; SAP HANA would be an example and it is not cheap). So what you can do is split your problem into two smaller ones (divide et impera in solutions architecture is known as separation of concerns), which can be solved independently and more efficiently. Only do this split if absolutely necessary. Remember: the more pieces you split into, the more integration work you’ll have to handle.
- Benchmark and execute proof-of-concept. Yeah, it looks great on paper, the open source community thinks it’s great, the vendor says it’s great (especially our very persuasive account manager). So let’s test it. Pick a scenario. Let’s say 10K transactions per second. Match the number of columns in each record and their approximate types with what you imagine you’ll have in production. It doesn’t have to be an exact replica of the real scenarios – when you’re not sure what you will need, round up the requirement and benchmark something more aggressive. “We might need between 30 and 40 million records per table” translates to “Let’s benchmark it with 100 million”. When executing a benchmark, understand that it is not important to match the functionality of the feature; rather, it is important to match (and outmatch) its aggressiveness in terms of performance. Make 50 concurrent requests from different machines altering the same record. Drop in 100 million records sequentially, read them randomly, delete some of them, and then read randomly again – is there any degradation in performance? Shut down n/2-1 nodes during a load test – is the data store still holding? And if so, with what kind of performance degradation? If you turn back on one of the dead nodes, does it start to take in some of the load? Does it reprovision with the data it lost? And so on… Use your imagination when you benchmark, spiced with a pinch of sadism. The main purpose of the benchmark is to confirm that the non-functional requirements of the scenarios you identified are met.
- Rank candidate data stores based on the results of the benchmark. This is the first real world validation of your proposals. This will help you distinguish the candidates which say they’re good from those which actually are good. This will also weed out any invalid assumptions you have made.
- Evaluate operational cost based on benchmark. Bombarding a data store with requests will make you aware of just how much hardware and how many resources you will need for the real life production scenario. Based on the results, you can make a more educated guess about costs. Be sure to put it in writing in your proposal.
- Consult with business resources on cost, benefit, risk. So far, this has been pretty much a technology exercise. Now it’s time to share your findings, inform stakeholders of any potential risk and tell them what kind of invoice they can expect for this, including capital expenses (hardware, licenses) and operational expenses (using cloud PaaS or SaaS, renting infrastructure).
- Split scenarios by type so as to allow the usage of two or more integrated, purpose-specific data stores. Let’s say you want to build a data store which processes transactions with millisecond delay (OLTP), but is also able to produce complex reports on hundreds of millions of transactions in a few seconds (OLAP). While there are a few data stores able to do both at once (and they are probably over your budget anyway), what you can do is propose an OLTP solution which periodically (every hour, let’s say) batch-provisions data into the OLAP solution. This way, you can have the best of both worlds, if you are able to accept some delay between them (i.e. the reports will not contain the last hour of data, they will not be real time).
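To make the benchmarking step above a bit more concrete, here is a scaled-down sketch in Python of the "insert sequentially, read randomly, delete some, read randomly again" pattern. It uses an in-memory SQLite database purely as a stand-in for a candidate data store; the table layout, record counts and function names are assumptions made for this example, and a real benchmark would run against the actual candidate, from several machines, at production-like volumes.

```python
import math
import random
import sqlite3
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def timed_reads(conn, keys):
    """Read each key once and return the individual latencies in seconds."""
    latencies = []
    for key in keys:
        start = time.perf_counter()
        conn.execute("SELECT payload FROM items WHERE id = ?", (key,)).fetchone()
        latencies.append(time.perf_counter() - start)
    return latencies


def run_benchmark(num_records=20_000, num_reads=2_000, seed=42):
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    # Step 1: drop in the records sequentially.
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     ((i, f"record-{i}") for i in range(num_records)))
    conn.commit()
    # Step 2: read them randomly.
    keys = [rng.randrange(num_records) for _ in range(num_reads)]
    before = timed_reads(conn, keys)
    # Step 3: delete every third record, then read randomly again.
    conn.execute("DELETE FROM items WHERE id % 3 = 0")
    conn.commit()
    after = timed_reads(conn, keys)
    remaining = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return {
        "remaining": remaining,
        "p50_before": percentile(before, 50), "p99_before": percentile(before, 99),
        "p50_after": percentile(after, 50), "p99_after": percentile(after, 99),
    }


if __name__ == "__main__":
    for name, value in run_benchmark().items():
        print(name, value)
```

Comparing the 99th percentile before and after the deletes, rather than the average, is a deliberate choice in this sketch: degradation tends to show up in the slowest requests first.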
It might seem like a bit of overkill, but following this process will make sure you don’t end up with loose ends and with things you discover 2 weeks before or 3 months after the go-live of the final project.
In the following articles, I will drill down into the first step (defining requirements) and the second step (ranking candidate data stores).
Before deciding to transition the apps that your company builds to Big Data / NoSQL solutions, there are a few things one needs to understand:
- The CAP theorem, which states that a distributed system cannot be strictly consistent, highly available and partition tolerant at the same time. Figure out what you need first. Tip: you might need several separate data stores for different purposes.
- “NoSQL” is just a marketing buzzword, it is not a concrete solution. There are several types of non-relational and of scalable data stores which are labeled NoSQL, although they are very different in capability and performance.
- There is no silver bullet. “One [data store] to rule them all” is something that only a Lord of the Rings fan would believe; and even they (most of them, anyway) know it’s fiction.
- There is no free lunch (or “there ain’t no such thing as a free lunch“), which means that a data store will perform wonderfully under the conditions for which it was engineered and it will be a disaster for other scenarios. It’s your responsibility to pick the right tool for the job.
- Don’t do it just because it’s cool. Technology must serve a practical, objective-bound, business purpose. “Our company has to transition to Big Data (because everybody else is doing it)” does NOT constitute a valid reason.
- Is your data really that BIG? Rule of thumb: if you don’t have at least 1TB of data, you don’t really need big data. We all like to think that our department deploys and manages big data, we all like to think that our company needs big data. You want to be one of the cool kids who are riding high on the new big data wave. But give serious thought to whether you actually are. Before you jump in the Big Data pool, you might want to check out current and future data storage needs (are they really growing that fast?) and ways to improve the performance of your current MySQL solution (Google the following: “master-slave replication”, “query result caching”, “memcached query caching”, “database partitions” and “sharding” – see if any ideas light up). Also, you might want to consider a hardware upgrade (servers with SSD drives can do magic, I’m told).
- Performance, capability and low cost: pick two. You can’t have all (see “There is no free lunch” above). Maybe you are a small organization which is not that data intensive. Maybe you need all the query flexibility of SQL and don’t have a huge budget to get into a data warehouse BI solution. Understand your business needs, priorities and budget before you start blurting out words like “NoSQL”, “big data”, “lambda architecture”, “unlimited scalability” and “data driven business”.
- Training and support. Fine, let’s say you build the goddamn thing. It works. Passes all the tests. Goes live. The business cheers, the tech guys cheer, everyone’s happy. The OPS/DEVOPS/infrastructure guys: maybe not so much. You see, knowledge of MySQL and Tomcat is ubiquitous, so if you run into a production problem either the team has the experience or Google and StackOverflow have a lot of things that can help. However, you won’t find a lot of 10 step tutorials on how to recover from a multiple Hadoop (HDFS) node failure that occurs during an HBase compaction. For that, you need to make sure your team is either well trained (unlikely if you’re just adopting this tech stack in the company) or that you at least have a satisfactory level of support (with SLAs, not just best effort) from your software vendor, from your service provider or from a third party (that specializes in support for open source).
- Not paying up-front ends up being more expensive over time. Every business guy is super-excited that all this big data magic is free, right? Cause it’s open source, right? I’m not going to get into the “free speech vs. free beer” argument. I’m just saying that if you factor in loss of revenue due to downtime, maintenance, operation and support costs – an open source solution might end up being a lot more expensive than paying for licensing, training and support. Whoever says too easily that using open source is cheaper clearly doesn’t understand the concept of TCO (Total Cost of Ownership). Make sure your team either has the knowledge and the practical experience of managing the solution you adopt or that you have a solution or support vendor whose SLAs are acceptable.
- Do your homework, stay in control, don’t buy the bullshit. Big Data is not a solution to all your problems. It won’t make your business bloom overnight. And it’s a lot of knowledge for the technical team to take in. “Transitioning to big data is a key objective for our company. That’s why we hired this big data consultant.” Congrats, you just hired a guy who doesn’t know your apps, your business processes or your team and who is probably charging you $400-2000/day for Googling “how to install HBase on my laptop” – great investment, much successes.
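For the "is your data really that BIG?" question in the list above, a quick back-of-the-envelope projection can help before anyone commits to anything. The Python sketch below assumes a constant year-over-year growth rate and treats 1TB as 1024 GB; both function names and the sample numbers are made up for this example.

```python
import math


def projected_size_gb(current_gb, yearly_growth, years):
    """Size after a number of years at a constant yearly growth (0.5 = 50%)."""
    return current_gb * (1 + yearly_growth) ** years


def years_until(current_gb, yearly_growth, threshold_gb=1024):
    """Whole years until the threshold is reached: 0 if already there,
    None if the data is not growing at all."""
    if current_gb >= threshold_gb:
        return 0
    if current_gb <= 0 or yearly_growth <= 0:
        return None
    exact = math.log(threshold_gb / current_gb) / math.log(1 + yearly_growth)
    return math.ceil(exact)


if __name__ == "__main__":
    # Hypothetical scenario: 200 GB today, under three growth assumptions.
    for growth in (0.10, 0.50, 2.00):
        print(f"{growth:.0%} per year -> 1TB in {years_until(200, growth)} year(s)")
```

If the answer comes out as "many years from now", the tuning and hardware options for the existing database probably deserve a look first.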
Big Data, scalable data stores and cloud infrastructure are no longer an “if” for IT; it’s just a matter of “when?”. All I’m saying is that maybe for your business the answer might be “not this year”. And I’m also saying that if the answer is “right now!”, you should make sure you cover all the angles laid out above.
On a less serious note, you can always check out NoSQLBane for some consistency and fault tolerance humor. And for a mix of distributed computing insight and stand up comedy, do watch James Mickens’ speech on big data, NoSQL, cloud, virtual infrastructure and bullshit.
Traditionally, management theory has been based on hierarchy, structure and delegation of activities. However, in today’s business landscape, which is increasingly marked by the flow of information and by changing processes, there is a growing gap between the power of knowledge and the power of decision. In other words, the organizational (hierarchical) distance between the point where the relevant information originates and the point where such information is used to make a decision is large enough for relevant information to be lost.
Of course, old-school managers will tell you that as long as reporting lines are defined and KPIs/objectives are cascaded correctly, there is no problem in delegating efficiently. That would be true, except that in a business world that increasingly revolves around technology and services, “information” isn’t only about predefined metrics that are pivoted and rolled up in spreadsheets and reports. The information has become the change that happens to those spreadsheets and reports.
Ultimately, the gap between knowledge and decision affects an organization to the extent to which changes to the business processes become business as usual (i.e. a regular activity, that occurs more than once during a financial lifecycle).
Let us take an example
In a classical business world, the bottom-up reporting format (sold units, best selling items, items with the best margin) and the top-down decision format (targets, commissioning scheme) are pretty straightforward. But let’s imagine the following:
- Christine decides to partner up with a local partner/affiliate, which directly impacts the commissioning scheme and sales volume
- Blaine would also like to leverage online lead generation to boost sales, which impacts cost and customer visibility
- Claire thinks she could improve sales volume by engaging in a profit-sharing scheme by partnering up with a local services provider
Considering all of that, Jim has to prioritize the strategy for next year. To him, all projects seem like a good idea, because ultimately all of them boost sales and customer visibility. And to Jim, it’s all about the bottom line.
Furthermore, let’s assume there is enough time and budget to carry out each project in its respective region. And even if Blaine’s idea (online lead generation) could be easily implemented across the other two regions, the affiliate and profit sharing schemes Christine and Claire have in the pipeline are pretty particular to their respective regions.
What none of them imagines is that implementing these projects (with end dates at several times throughout the year) will impact the reporting structure. You can’t directly compare in-house sales with affiliate sales. And online lead generation also involves additional cost, not just additional sales. All of these non-uniform changes in the reporting structure will also change the way budgeting is done for the following year, and Jim has to take all that into account.
Moreover, Carla, Alma and Blythe (the persons in charge of the three respective projects in each region) have a deeper understanding of the details of their own project, but they are not aware of each other’s projects and so cannot foresee the impact the projects will have on each other.
You cannot drive a car by just looking at speed and fuel gauge
However obvious that might be for cars, a lot of companies are governed by just looking at profit and capital / operational expenses. Even though budgets and project priorities are laid out in endless Excel sheets, few companies have a truly process-centric approach that allows them to see how different processes and lines of business influence each other.
A lot of people in management positions think, speak and act in lists. But in a dynamic business landscape where automation is business as usual, lists just don’t cut it anymore. Relationships (between projects, processes, features and requirements) are graphs, and mappings between changing entities are hash tables.
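To make the list-versus-graph point concrete, here is a minimal Python sketch. The project names and the reports they touch are illustrative assumptions loosely based on the example above, not real data. A plain list of projects hides the fact that several of them change the same report; a hash table keyed by report makes that interaction visible.

```python
from collections import defaultdict

# Hypothetical mapping: project -> reports it changes (illustrative names only).
project_affects = {
    "affiliate_partnership": ["commission_scheme", "sales_volume"],
    "online_lead_generation": ["cost", "customer_visibility", "sales_volume"],
    "profit_sharing": ["sales_volume"],
}

def invert(edges):
    """Turn project -> reports into report -> set of projects (a hash table)."""
    touched_by = defaultdict(set)
    for project, reports in edges.items():
        for report in reports:
            touched_by[report].add(project)
    return touched_by

def shared_reports(edges):
    """Return the reports changed by more than one project, i.e. where projects interact."""
    return {
        report: sorted(projects)
        for report, projects in invert(edges).items()
        if len(projects) > 1
    }

print(shared_reports(project_affects))
```

In this toy setup only `sales_volume` is touched by all three projects, which is exactly the kind of cross-project impact a flat list would not show.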
One thing that cannot be automated is the process of changing processes
A lot of the work that can be automated (rolling up sales reports, balancing accounts and taking orders) has been or will soon be automated. There is less and less room for workers who execute a simple process, day in and day out. Which means that the workload itself tends to become increasingly unstructured, highly variable and less predictable. One of the things that cannot be automated (at least for now, if you believe Searle’s Chinese Room argument) is the process of changing other processes in order to achieve certain objectives.
This means that knowledge work is less about punching in numbers while being on the phone and more about exploring the implications, ramifications and impact of change on the work that is already being done (for the most part by machines). However, the workforce is for the most part unprepared for this mindset, and so is management.
For instance, the worker may not be willing or prepared to propose a change to a process that might (in the short term) negatively impact his/her KPIs, objectives or personal revenue. The manager or the executive, on the other hand, might have a zero-risk policy.
Global and local optimization
Let’s say you have a company with three departments: sales, tech and operations. The hierarchical structure present in most companies encourages the three respective managers/VPs of sales, tech and operations to seek the optimum for their silo/department. The fact of the matter is that seeking local optima (what’s good for my department) may often yield a strategy that is deeply sub-optimal for the organization. Of course, it would be the job of the CEO to balance the view and build a global optimum from the local optima, but the reality is that s/he often lacks the information required: partly because it was filtered out at a lower level as “not relevant to our department”, partly because s/he doesn’t have the patience to challenge things at a lower level. By taking the safe path towards local objectives, global objectives can be missed at a higher level.
In programming, choosing the solution that seems to fit best locally and/or in the short term is called a greedy algorithm. Although it might work for simple problems, which model linear relationships, under certainty and following simple restrictions, it may often produce deeply sub-optimal results. You see, this class of algorithms is not called “greedy” by chance: they are called so because they seek immediate maximization of the outcome/benefit/revenue. Which brings me to my next point …
Global optima take time to achieve. Which is just the opposite of the current business landscape, which seeks immediate gratification. Bigger stock price, bigger sales, bigger bonus. When? By the end of the financial year! Heck, let’s have it this quarter, as a stretch target. We are all greedy. We want pay-offs now. The promotion, the raise, the stock price increase. We put pressure on ourselves, on our peers, on our direct reports. We put pressure to achieve things now. And we keep ignoring the complexity, the impact and ultimately the fact that achieving the best possible outcome every week of the year is not the same as achieving the best possible outcome this year.
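A toy Python sketch of the difference between a greedy choice and a global optimum follows. The model is an assumption made up for illustration: each quarter you either "sell" (earn the current capacity) or "invest" (earn nothing now, double the capacity), over a hypothetical horizon of four quarters.

```python
from itertools import product

QUARTERS = 4  # hypothetical planning horizon

def total_payoff(plan, capacity=1):
    """Sum the payoff of a plan: 'sell' earns the current capacity, 'invest' doubles it."""
    total = 0
    for action in plan:
        if action == "sell":
            total += capacity
        else:  # "invest": no payoff this quarter, more capacity later
            capacity *= 2
    return total

def greedy_plan(quarters=QUARTERS):
    """Each quarter, pick the action with the best immediate payoff."""
    plan, capacity = [], 1
    for _ in range(quarters):
        immediate = {"sell": capacity, "invest": 0}
        plan.append(max(immediate, key=immediate.get))
    return plan

def best_plan(quarters=QUARTERS):
    """Exhaustively search every plan for the global optimum (fine for tiny horizons)."""
    return max(product(["sell", "invest"], repeat=quarters), key=total_payoff)

print("greedy:", greedy_plan(), total_payoff(greedy_plan()))
print("global:", best_plan(), total_payoff(best_plan()))
```

Under these assumptions the greedy plan sells every quarter and totals 4, while the exhaustive search finds a plan worth 8 by accepting two quarters with no payoff first.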
We tell ourselves that achieving the best in each department every month will make the company achieve the best this year or for the next three years. And that might have been true when labor was manual and the market and processes and the opportunities changed infrequently. But that is no longer true.
As our world gets more complex and uncertain, riddled with change and deceptive local optima, we are becoming increasingly like the kids in the Stanford Marshmallow Experiment: surrounded by the temptation to get our “fix” now and drained of the discipline to pursue long-term goals.
People who have the information don’t get to decide; people who decide don’t always have the relevant information
You’ll probably think that if relevant objectives are cascaded from executive to management to worker, nothing can go wrong. Right?
Well, that used to be right. But nowadays, the complexity and inter-dependency of processes (especially automated ones) put the knowledge worker in the position to be the only one to spot or define what is “relevant” in some cases. As you’d expect, “cascading” this information upwards goes against the flow and oftentimes gets a lot of resistance – especially if you have to get through 7 layers of red tape until you can get to someone who has the authority to make a change.
Even if the hands-on guy at the bottom of the food chain who spotted a problem in the process or an opportunity for improvement somehow manages to get his point across to his manager’s manager’s manager, this will have taken 3 months. It will take another 6 months of meetings with people who have no knowledge or competency in the matter to get the project pushed through, budgeted, approved and scheduled. Most of this red tape will not improve the original idea, but it will riddle it with compromise. The guy on top won’t be willing to vouch for the idea (even if it’s a good one) out of fear of alienating his other direct reports.
In a shifting business context, the core idea of relevance (the “key” in Key Performance Indicator) is one that requires effort and input from throughout the organization. And most organizations are still severely top-down.
Flatten or shard: why hierarchy is dead
We used machines to speed up our processes, to scale them to high volumes of decisions and events, to make them more reliable and cheaper. We did this to such an extent that the bottleneck in organizations has become people’s ability to understand, plan and follow up on change. Part of that is because our educational system still instills the “assembly line” mentality in our minds; the other part is that both workers and managers prefer short-term (and short-sighted) gains and a risk-averse attitude.
The modern workplace needs to extend its mental toolbox and means of interaction beyond lists and tables (towards charts, graphs, analytics and more scientifically founded decisions) to deal with increasing uncertainty and complexity.
Some ideas for improvement may include:
- Removing unnecessary overhead and flattening organizational structures.
- Rotating people between similar positions before promoting them.
- Creating cross-functional knowledge roles rather than cross-functional management roles.
- Making sure managers have hands-on experience.
There are two main trade-offs between flattening hierarchy (reducing subordination) and sharding business lines:
- Flattening reduces overhead, but it may also blur accountability
- Sharding clarifies boundaries, but reduces opportunities for cooperation and creativity
Ultimately, organizations have a choice between reduced risk and increased cooperation/innovation/creativity. And in today’s landscape, there is less and less of a clear recipe.
Instead of a conclusion
Organizations face the great challenge of transitioning from traditional hierarchical/command-and-control setups to flatter or matrix-like structures. In this transition, confusion is the highest risk. So ultimately, the best tool for keeping things under control is keeping organizational process knowledge closer to the point of decision, not only towards the place of execution.
Delegation works great for activities, but it fails miserably for knowledge tasks.
This summer I decided it was high time for slimming down, mostly because I got tired from the simplest things, like climbing 4 flights of stairs. So, in a way worthy of a project manager with an engineering background, I set an objective, I made a plan and I started tracking the metrics. After all …
You cannot manage what you do not measure.
Start weight: 108 kg
Target weight: 93 kg
Delta: 15 kg
Budgeted time: 4 months
Targeted loss/month: about 4 kg/month (15 kg / 4 months = 3.75 kg/month)
Below, you can find the charts. I did not use any real-time apps, gadgets or wearables. I’m old-fashioned like that: analog scale and Google Spreadsheets.
Above: Real measured weight is painted in blue, while the (linearly) planned target weight is painted in red.
Above: the “ahead-of-plan” metric (also called “buffer”), computed as planned weight minus real measured weight.
Note that the points in the chart are not equally-spaced.
Above: the average daily loss. Note that the points in the chart are not equally-spaced.
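For readers who want to reproduce the three metrics, here is a minimal Python sketch. The start and target weights come from the plan above; the 120-day plan length and the readings (including their dates) are made-up assumptions, not my actual measurements. Because the readings are not equally spaced, the average daily loss divides by the actual number of days between two readings.

```python
from datetime import date

START_WEIGHT, TARGET_WEIGHT = 108.0, 93.0  # kg, from the plan above
PLAN_DAYS = 120  # assumption: 4 months taken as 120 days

def planned_weight(day):
    """Linearly planned target weight after `day` days."""
    return START_WEIGHT - (START_WEIGHT - TARGET_WEIGHT) * day / PLAN_DAYS

def metrics(readings):
    """Return (date, buffer in kg, average daily loss in kg/day) for each reading after the first."""
    start_date, _ = readings[0]
    rows = []
    for (prev_date, prev_kg), (cur_date, cur_kg) in zip(readings, readings[1:]):
        elapsed = (cur_date - start_date).days   # days since the plan started
        gap = (cur_date - prev_date).days        # days since the previous reading
        buffer_kg = planned_weight(elapsed) - cur_kg  # positive means ahead of plan
        daily_loss = (prev_kg - cur_kg) / gap
        rows.append((cur_date, buffer_kg, daily_loss))
    return rows

# Made-up, unevenly spaced readings (hypothetical dates and weights).
readings = [
    (date(2015, 6, 1), 108.0),
    (date(2015, 6, 9), 106.5),
    (date(2015, 6, 21), 105.5),
]

for day, buffer_kg, daily_loss in metrics(readings):
    print(day, f"buffer={buffer_kg:+.2f} kg", f"daily loss={daily_loss:.3f} kg/day")
```

A positive buffer means the measured weight is below the planned line on that day.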
My conclusions from this 3+ month experience:
- Measuring relatively often keeps you focused, as it allows you to reinforce a concrete, practical small target every few days (or once a week), rather than a big monthly target.
- No matter how disciplined one is, weight loss does not occur at a constant pace. Some weeks you exceed your target, some weeks you miss it. See the average daily loss chart.
- Weekly targets don’t matter that much in the long term, but they can motivate and drive your actions and choices (e.g. salad instead of pizza, freshly squeezed orange juice instead of Cola) in the short term. Missing a target every once in a while is good if and only if it motivates you.
- Don’t obsess over daily micro-measurement. Some days you are better hydrated before you measure yourself and some days you are less so. Therefore, it can seem you suddenly gained 1 kg, when there is no actual change. Always assume there is an inherent daily “noise” in your measurement which evens out in the long term (weeks, months). To even out the noise, try to do all measurements at the same time of day, using the same scale. Note: I have not kept daily measurements, so that noise is not visible on my charts.
All in all, now I feel much better. And I’m very proud of my analytics.