
Among my greatly respected global colleagues and friends on LinkedIn, there seem to be persistent, circular debates concerning the nature and viability of the data science profession. Opinions seem to run the gamut from ‘data science is the best thing since sliced bread and will take over the world’ to ‘data science is a weak, transient fad wrapped in odoriferous charlatanism’.
Between these extremes, there are regular volleys and variants of ‘I am the one and only REAL data scientist (and you are not)’, leading to seemingly devolved playground pushing matches taking the elegant form of ‘no you aren’t!’, ‘yes I am!’, and the inevitable ‘I’m taking my fancy data science toys and going home!’
Clearly better can be done, especially in the interests of the many, many people who hold this job title, and the many, many companies who have invested optimistically and generously in data science as a credible project. The foundations and premises of the domain are actually quite optimistic and avowedly humanistic, namely recognizing that humans in large groups often make pretty crappy decisions, and that if we could simply introduce a modicum of rigour, we – companies, governments, society, the planet, etc. – might just be a little better off. In fact, part of the assertion is that we could likely not do any worse than some of the ways we are making impactful decisions at this present historical moment. So, in essence: ‘yeah, science!’
Data science certainly meets one superficial criterion for being a profession, in that there are many named professionals operating under the job title. The simple truth is that data science is a de facto profession according to the pragmatic criterion that there are seemingly tens of thousands of people globally employed with this job title who are being paid to do… something!
However, many debates concerning consistency and standards of practice persist. At times these take on the aspect of existential professional dread, hand-wringing, and questioning. Part of this is avowedly positive (motivational self-improvement): we can do better and improve things! Part of this is self-defeating (zero-sum): I ‘am’, and you ‘are not’ – competitive game playing.
In the interests of understanding the resident gaps, with a view to addressing them in the future, a formal assessment approach is suggested. Bear with me, for things are about to get rocky. My suggestion is to defer to a fairly robust and long-standing research tradition in studying professions and professionalization. For ‘real’ data scientists, this may force you out of your comfort zone, as we are largely invoking… gasp!… sociology (yes, I know – calm down, calm down).
Let’s get down to the bitter medicine. Deriving from formal academic theory on professionalization, one more or less tolerated (which is stupendous advocacy in the bitter and contentious realm of social theory) definition for a profession offered by E. Greenwood (1957) specifies the following criteria:
- Embodies a systematic body of theory: pretty self-explanatory, but theory in a bubble is not enough – a body of theory explains and is tied to ‘real world’ dynamics…
- Authority is accepted by clients: people who employ the professionals respect the results these professionals advocate
- Community sanctions authority: there are formal and informal institutional mechanisms for asserting the authority of professionals
- Ethical codes are enforced: there are controls in place for bad behaviour, charlatanism, etc.
- Culture supported by professional societies: professionals come together in shared bodies to come to agreement on important topics and to drink coffee and/or beer
According to these criteria, and compared to mature professions such as medicine, law, or mechanical engineering, it would seem that data science superficially addresses some of them, but substantial gaps exist.
Let’s take a closer look, with some grades offered to suggest how we are doing – namely:
- Systematic body of theory – 75%: The data science umbrella hosts a highly sophisticated and diverse set of interdisciplinary methods spanning a range of academic fields. However, critiques linger concerning the ability to rigorously validate methods (e.g. correlation not causation, inherent opacity of more complex methods). Difficulties substantiating rigour, whether due to lack of overhead and/or inherent model opacity, especially in commercial domains, create doubts concerning efficacy and ethical integrity. In general, theoretical integrity in the domain is methodologically diverse, yet often practically shaky due to issues surrounding rigour and ad hoc standards for organizational adoption of guidance.
- Authority is accepted by clients – 50%: Data science currently evidences a great deal of titular respect and authority due to popularity. However, this may very well be transient and superficial. Management research attests to the practical difficulties of promulgating methodological data science-derived guidance into organizational decision making. Technical decision guidance can be gamed or perverted by vested interests, with a lack of consensus concerning formal organizational governance principles. The degree to which decision guidance resulting from applied data science practice is incorporated in organizations is anecdotally variable: management can decide to ignore, reframe, or modify results to meet its own needs or interpretations. A substantial critique is that in commercial settings time and resource pressures do not easily facilitate rigorous scientific validation. Detaching data science from scientific rigour undercuts the authority of recommendations, relegating assertions to informed opinions from technocratic experts rather than objective outcomes resulting from rigorous scientific inquiry.
- Community sanctions authority – 50%: Data science, being largely a free-market movement and typically detached from formal strictures, lacks traditional bodies for enforcing ethical and methodological rigour. The exceptions are highly specific to regulated industries, such as pharmaceuticals, financial services, and public services, and in these cases strictures typically pre-date the data science movement. Further, there is a lack of broadly accepted, universally agreed, and broadly socialized professional certification programs. Although a rough consensus exists that STEM-associated advanced academic degrees are well-regarded, there is a lack of concomitant standards regarding the application of the diverse fields composing data science to applied practice, wherefrom formal, rather than collegial, authority is derived (i.e. independent certification bodies maintaining methodological and ethical standards). Universities worldwide have responded to the great interest in the domain with a raft of new advanced degree offerings as well as certificate programs. However, standards for such degrees and certifications still lack universal consensus, and debates persist concerning the discrete nature and disciplinary focus of the domain.
- Ethical codes are enforced – 25%: Policing of standards is poorly supported to non-existent. Ethical stewardship seems mainly to be bound to a loose agreement to apply scientific methods. However, given the conflicting pressures for timely results (versus the inherent costs of academic-level validation), such claims are dubious, especially in commercial settings. This is a common critique concerning the invocation of the term ‘science’ as associated with the profession – there are loose and dubious grounds for maintaining scientific rigour in the profession, apart from self-policing. Self-policing is threatened when commercial pressures supersede attempts at rigour. Currently, censure amounts to public finger-wagging and hand-wringing.
- Culture supported by professional societies – 50%: Several professional societies have responded to the swelling interest in this domain by accommodating and adopting analytics and data science into their purview. Examples include INFORMS and The OR Society (operations research), IEEE (engineering), and ACM (computing / IT). However, there is room for improvement and expansion.
So, thank you for bearing with me… where are we? Well, not too great – an average grade of 50%. Kind of a ‘fail’, actually?
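For anyone who wants to check the arithmetic, here is a minimal sketch of how that average falls out of the five grades above. The dictionary and variable names are purely illustrative, and an unweighted mean is an assumption; nothing here suggests the criteria should carry equal weight in any formal assessment.

```python
from statistics import mean

# Grades (in percent) as given for each of Greenwood's five criteria.
# An unweighted mean is assumed here purely for illustration.
grades = {
    "Systematic body of theory": 75,
    "Authority is accepted by clients": 50,
    "Community sanctions authority": 50,
    "Ethical codes are enforced": 25,
    "Culture supported by professional societies": 50,
}

average = mean(grades.values())
print(f"Average grade: {average:.0f}%")
```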
However, the point here is not to be a ‘Debbie Downer’, but to be optimistic. The intent is to suggest that there is work to be done, and that much of the work likely resides in more formal approaches to strengthening professional rigour and understanding. A modest suggestion might be better efforts to create standards for training and educational curricula, followed perhaps by a code of ethics and standards maintained by a professional body.
Well, thank you for listening in any case. The point here was to observe that in order to advance the profession, we are not going to get anywhere by quibbling about technical methods (something I see almost every day). The future of the profession rests in finding better mechanisms to formalize the social contracts surrounding data science, and to promulgate these through academic and professional institutions. A modest view – take it as you will!
P.S. There is a much more robust description underway – I’m working on a book!
REFERENCES
Blenko, M. W., Mankins, M. C., & Rogers, P. (2010). The Decision-Driven Organization. Harvard Business Review.
Boyd, D., & Crawford, K. (2012). Critical Questions for Big Data. Information, Communication & Society, 15(5), 17.
Greenwood, E. (1957). Attributes of a Profession. Social Work, 2, 11.
Kiron, D., Prentice, P. K., & Ferguson, R. B. (2014). The Analytics Mandate.
LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2011). Big Data, Analytics and the Path From Insights to Value. MIT Sloan Management Review, 52(2), 13.
April 28, 2019