
Creating reality – Explorations of a technical and societal nature in Generative Machine Learning Research
This blog post by AICoreSpot delves into the exciting world of creating new realities. At its core lies the question of what exactly we mean when we use the term generative machine learning. Take note of the word generative: it can refer to many things, but this post uses it primarily in two ways. There is a technical sense, and there is a sociotechnical sense.
Part I: Algorithm, Inference, and Model
To dive deeper into generative machine learning, it is especially useful to take a structural perspective: to understand the technical structures and the elements that combine into what we today call generative machine learning. It is worth noting that not everything within machine learning can call itself generative; as a community we impart meaning and interpretation onto any definition, accepting some and rejecting others. This conjunction of technical structures and the interpretive communities through which we understand our field is the pulse of machine learning.
To be generative implies to imagine, simulate, sample, develop, confabulate, emulate, synthesise, or fake: to produce aspects of the world that we represent as data. We can ask to produce data of the kind we may have observed or measured, or to produce realities that have not been observed but might plausibly occur. Given this description, generative machine learning is this pursuit, studied using the language of probability.
At its foundation, generative learning relies on systems of consistent and plausible reasoning: statistically gathering data and evidence, and using probability as the natural generalisation of logic for an uncertain world. Using the language of probability, we can unpack generative machine learning as a triple of model, inference, and algorithm. In this view, what we do in machine learning is best thought of as having two key parts: a model that describes the world and the data we study, and a set of inferential strategies that enable probabilistic, plausible reasoning within that model. Since we have many models and many inferential strategies, any particular choice of model and inference is then brought together in a particular algorithm.
Let's make this concrete by considering the famous variational autoencoder, or VAE. It is common to hear and read the mistaken phrase "VAE model". What is wrong here is the use of the word model: the VAE is really an algorithm, with particular choices for optimisation and efficiency, that arises from combining a latent variable model with an inferential procedure based on variational inference.
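To make this model-inference-algorithm split concrete, here is a minimal numpy sketch of the ingredients behind a VAE for a toy two-dimensional dataset. All shapes, parameter values, and function names are illustrative assumptions for exposition, not any particular library's API.

```python
# A minimal sketch of the model / inference / algorithm decomposition behind a VAE.
import numpy as np

rng = np.random.default_rng(0)

# --- Model: a latent variable model p(z) p(x | z) --------------------------
W = rng.normal(size=(2, 1))          # toy "decoder" weights (2-d data, 1-d latent)

def log_prior(z):
    return -0.5 * (z**2 + np.log(2 * np.pi))                 # log N(z; 0, 1)

def log_likelihood(x, z):
    mean = W @ z                                              # decoder mean
    return -0.5 * np.sum((x - mean)**2 + np.log(2 * np.pi))  # log N(x; Wz, I)

# --- Inference: a variational posterior q(z | x) ----------------------------
mu, log_sigma = 0.3, -1.0            # toy "encoder" outputs for a single x

def log_q(z):
    sigma = np.exp(log_sigma)
    return -0.5 * (((z - mu) / sigma)**2 + 2 * log_sigma + np.log(2 * np.pi))

# --- Algorithm: a Monte Carlo ELBO via the reparameterisation trick ---------
x = np.array([[0.5], [-0.2]])
eps = rng.normal(size=(1, 1))
z = mu + np.exp(log_sigma) * eps     # reparameterised sample from q(z | x)

elbo = log_likelihood(x, z) + log_prior(z).sum() - log_q(z).sum()
print("single-sample ELBO estimate:", elbo)
```

In a full VAE the encoder and decoder would be neural networks and the ELBO would be maximised by stochastic gradient ascent; the point of the sketch is only that the model, the inference scheme, and the resulting algorithm are separable choices.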
Models are the first part of this model-inference-algorithm triple.
What types of statistical models are available for generative learning? One useful distinction is between prescribed and implicit generative models.
- Prescribed models are models that use likelihood functions and aim to directly represent the probability of events in the world. Most of the models we encounter in machine learning and statistics textbooks are of this kind, whether the foundational generative models such as PCA and factor analysis, state space models such as those used in HMMs or the Kalman filter, or the latest deep generative models.
- Implicit generative models are data simulators. They describe a probability distribution implicitly rather than directly. Many models are of this kind, including simulators of physical systems, samplers of phylogenetic trees, neural samplers, and graphics engines. (A minimal sketch contrasting the two kinds of model follows this list.)
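The sketch below contrasts the two families on toy examples; the Gaussian likelihood and the simulator are illustrative assumptions chosen only to show the distinction.

```python
# Prescribed vs implicit generative models, in miniature.
import numpy as np

rng = np.random.default_rng(0)

# Prescribed model: we can write down and evaluate a likelihood p(x | theta).
def gaussian_log_likelihood(x, mean, std):
    return -0.5 * (((x - mean) / std)**2 + 2 * np.log(std) + np.log(2 * np.pi))

print(gaussian_log_likelihood(x=1.2, mean=0.0, std=1.0))

# Implicit model: a simulator we can only sample from; no tractable density.
def simulate(theta, n):
    noise = rng.normal(size=n)
    return np.tanh(theta * noise) + 0.1 * rng.normal(size=n)   # nonlinear push-forward

samples = simulate(theta=2.0, n=5)
print(samples)   # we can compare samples to data, but cannot evaluate their density
```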
The kind of model we choose is critical in determining what can and cannot be accomplished with it. Crucially, the choice of one kind of model over the other makes different types of inferential methods possible.
It is just as useful to divide approaches to inference into two kinds: direct and indirect inferential methods. For clarity, inference is not the forward evaluation of a trained model; in probabilistic models, inference is the computation and evaluation of unknown probability distributions.
- Direct inferential strategies reduce all computation to estimating or knowing the data probability p(x), possibly through a bound on or an approximation of it. The conventional strategies such as maximum likelihood, variational inference, and Markov chain Monte Carlo are all direct inference methods.
- Indirect inference methods instead take the view that targeting the data probability directly is too difficult. Rather, we can learn by tracking how the data probability behaves relative to something else. Strategies such as approximate Bayesian computation, the method of moments, and adversarial and density-ratio methods fall into this category. (A toy sketch comparing direct and indirect inference follows this list.)
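Here is a toy illustration of the direct/indirect distinction for estimating a Gaussian mean. The simulator, grid search, and discrepancy measure are deliberately crude, ABC-style assumptions made for the sake of the contrast.

```python
# Direct vs indirect inference for the mean of a Gaussian.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=1000)

# Direct inference: work with the data likelihood itself.
# For a Gaussian with known variance, maximum likelihood has a closed form.
mle_mean = data.mean()

# Indirect inference: never evaluate the likelihood; instead compare the
# behaviour of simulated data to the observed data (a crude moment-matching,
# ABC-style search over the parameter).
def simulate(theta, n):
    return theta + rng.normal(size=n)

candidates = np.linspace(-3, 3, 601)
discrepancy = [abs(simulate(theta, 1000).mean() - data.mean()) for theta in candidates]
abc_mean = candidates[int(np.argmin(discrepancy))]

print(mle_mean, abc_mean)   # both recover roughly the true mean of 1.5
```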
The questions of inference are among the most fascinating in machine learning. Two topics in probabilistic inference arise repeatedly in our study of generative models, and when we try to describe them, both become more fundamental questions of how to manipulate probability distributions.
The first topic is how to represent a distribution. Distributions can be represented analytically if they are simple enough, by a collection of samples, or by a sampling program. For the kinds of complex data we encounter today, relying on simple analytic forms is not a robust methodology, and relying on samples alone imposes a potentially heavy memory and computational burden on top of being merely clunky. Normalising flows occupy the middle ground: start with a known, simple distribution and then transform it through repeated operations to represent more complex distributions, relying on the rules for change of variables in probability. If we can do this, then any distribution can be represented as long as we know how to sample the base distribution and how the program transforms those samples. This procedure makes both sampling and entropy evaluation straightforward. Today there are sophisticated normalising flows for almost any kind of data we can think of. Yet we have not come close to exhausting the important questions of how to represent distributions: there remain open questions of efficient computation, handling high dimensionality, representations of discrete data, new flows for specific applied problems, the important role of probabilistic programming, and theory relating to performance and efficiency.
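A minimal sketch of the change-of-variables idea behind flows is given below, using a single affine transformation of a standard Gaussian; the layer, its parameters, and the function names are illustrative assumptions.

```python
# The change-of-variables idea behind normalising flows, with one affine layer.
import numpy as np

rng = np.random.default_rng(0)

def base_log_prob(z):                      # simple base distribution: N(0, 1)
    return -0.5 * (z**2 + np.log(2 * np.pi))

scale, shift = 2.0, -1.0                   # parameters of one invertible layer

def forward(z):                            # z -> x
    return scale * z + shift

def inverse(x):                            # x -> z
    return (x - shift) / scale

def flow_log_prob(x):
    z = inverse(x)
    log_det_jacobian = np.log(abs(scale))  # |d forward / dz| for the affine map
    return base_log_prob(z) - log_det_jacobian

# Sampling is easy: push base samples through the flow.
x_samples = forward(rng.normal(size=5))
print(x_samples)
print(flow_log_prob(x_samples))            # and densities remain exactly computable
```

Real flows stack many such invertible layers (with learned parameters and tractable Jacobians), but the bookkeeping is exactly this: invert the transformation, evaluate the base density, and subtract the log-determinant.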
The second topic is how to represent the gradient of a distribution. This problem appears in both variational inference and adversarial learning, but it is also an age-old problem in statistics. Known as sensitivity analysis, it asks how we compute the gradient of an expectation of a function when the gradient is taken with respect to the parameters of the measure. We do not know this integral in closed form, it may be high-dimensional and therefore hard to compute, and we may have only incomplete knowledge of the system. Moreover, first computing the integral and then taking the gradient seems wasteful. Looking at the problem as an equation, there are only a few objects of importance: the measure, the cost, and the gradient. So to compute the gradient there seem to be only two things we can do: manipulate the measure, or manipulate the cost.
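Written out as an equation, with generic placeholder symbols (theta for the parameters of the measure, f for the cost), the object in question is:

```latex
\[
  \eta \;=\; \nabla_{\theta}\, \mathbb{E}_{p(x;\theta)}\big[f(x)\big]
       \;=\; \nabla_{\theta} \int p(x;\theta)\, f(x)\, \mathrm{d}x .
\]
% Manipulating the measure uses, e.g., the score-function identity
% \nabla_\theta p(x;\theta) = p(x;\theta)\, \nabla_\theta \log p(x;\theta);
% manipulating the cost reparameterises x = g(\epsilon;\theta) so that the
% gradient passes onto f instead.
```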
By manipulating these probabilities, as we did before, we can find at least three ways of doing this, giving rise to three families of gradient estimators: the score-function estimator, the pathwise derivative estimator, and the weak-derivative estimator. This is an apparently anodyne problem, but it pops up everywhere you look: in logistics and queues, in algorithmic game frameworks, and with us as we enter new eras of scientific progress. And again, much more research remains to extend this thinking: to continuous-time systems, to combinations of these estimation strategies, to deeper theories of estimator variance, and to understanding how choices of smoothness and other properties of our models affect performance.
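The sketch below compares two of these estimators on a toy problem where the answer is known in closed form; the Gaussian cost and sample sizes are illustrative assumptions.

```python
# Score-function vs pathwise gradient estimators on a problem with a known answer:
# d/d_mu E_{N(x; mu, 1)}[x^2] = 2 * mu.
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.5, 200_000
cost = lambda x: x**2

# Score-function estimator: E[ f(x) * d/d_mu log N(x; mu, 1) ], score = (x - mu).
x = mu + rng.normal(size=n)
score_function_grad = np.mean(cost(x) * (x - mu))

# Pathwise estimator: reparameterise x = mu + eps and differentiate the cost,
# giving E[ f'(mu + eps) ] with f'(x) = 2x here.
eps = rng.normal(size=n)
pathwise_grad = np.mean(2 * (mu + eps))

print(score_function_grad, pathwise_grad, 2 * mu)   # both approach 2 * mu = 3.0
```

Both estimators are unbiased here, but they differ markedly in variance and in what they require of the cost and the measure, which is exactly where the theory mentioned above comes in.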
It would also be worth describing the role of memory systems and amortised inference in sharing statistical strength across inference strategies, and the forgotten role of hypothesis testing as a basis for inference by comparison; we will save these topics for another day. And as in any field, the role of evaluation remains crucial, an area where we must keep deepening our study and for which we still lack the broad set of tools that is increasingly required.
Now, if generative modelling were to remain solely a field of methodology and theory, that would be a poor outcome. Thankfully, generative modelling today is not only a statistical field; it is also an engineering one. We need not detail all the many areas in which generative models have found application, ranging from technologies for large-scale image recognition, natural language processing, and image-to-text and text-to-image conversion, to scientific domains from protein understanding to galactic exploration. And in a convenient reinforcing loop, our engineering efforts raise new statistical questions, those statistical questions in turn let us build more robust systems, and so the wheel turns.
One area of significant commitment is the role of machine learning in supporting our urgent responses to environmental change driven by the global climate crisis. One capability of growing importance is forecasting environmental variables. Nowcasting, in which we make short-term forecasts of environmental variables such as rainfall, is one area where the clear advantage of generative models has been demonstrated: generative approaches can produce considerably more accurate forecasts than the current state of the art and provide genuine decision-making value to experts.
More generally, generative approaches are a natural fit for building digital twins: digital simulations of physical infrastructure such as buildings, roads, rail systems, and energy networks. Investment in digital twins is expected to grow dramatically over the next decade as part of our climate response and net-zero strategy. These efforts are very much part of ongoing policy discussions to which we, as technical designers, can also contribute.
As we close this first part, the emphasis should fall on the foundational philosophy at the heart of generative machine learning. The principle, always at work, is that there is an underlying data-generating process that we are trying to discover and mimic. Our task, using the language of probability, is to capture the actual reality that produces the experiences, events, and actions that we record as data and use to build our models. To make our models useful, we also accept that we simplify and abstract away many parts of that true underlying generative process. An ongoing commitment to generative thinking is then to ask ever deeper questions about data-generating processes.
This kind of deep generative commitment naturally encourages us to raise questions that go beyond the immediate technical scope of the problems we have discussed. It allows us to see that our data comes not only from technical domains, questions, and processes, but also from a wider social world. We understand that the social world is an inherent and integral part of any generative process, and that some of our attention must go towards understanding that wider reality. Let's move on to the next part.
Part II: Critical Practice
Once we recognise that generative processes are drawn from and embedded within a social reality, they should be understood not as purely technical systems but as sociotechnical systems: algorithms, models, and processes are shaped by the social world and in turn shape that social world. The expression often used to describe this is that society and technology are mutually constitutive. This is unlikely to be news to anyone here, since researchers in NLP and language systems have always understood the place of language and its workings in relation to culture and society.
Many applications of generative models fall under the broader heading of synthetic media, spanning everything from the apparently benign to the dystopian. We can all think of examples of face-swap apps, art generators, or speech and language generators in this category. As a field, we believe, we now accept that we cannot continue to argue for the benefits of our research without also acknowledging that our work can cause, or be implicated in, harm.
To speak of benefits without speaking of risks is to be naïve about the place of our research. To speak of risks without asking whether those risks are shared evenly is to make the same mistake again.
The harms that have been documented from AI applications around the world, whether in facial recognition, predictive policing, misinformation, resource allocation, shifts in labour practice, or medical diagnostics, did not arise by chance. They are the result of long-term systematic mistreatment and inadequate legislative and economic safeguards. Any wish to use our research to support a more flourishing society will have to confront this legacy, and in particular at least three distinct forms of algorithmic harm: algorithmic oppression, algorithmic exploitation, and algorithmic dispossession.
Algorithmic oppression describes the unjust privileging of one social group at the expense of others, maintained through automated, data-driven, and predictive systems. From facial recognition to predictive policing, such systems are often built on unrepresentative datasets and reflect historical social injustices in the data used to develop them. During the COVID-19 pandemic, unrepresentative datasets led to bias in resource allocation, and predictive models further amplified health inequalities that already fell disproportionately on underserved populations. Most of the prevailing discussion of algorithmic bias concerns this first category of harm.
Algorithmic dispossession is, at its core, the centralisation of power, resources, and rights in the hands of a minority. In the algorithmic sphere, this can show itself in technologies that "manage" or suppress particular forms of expression, communication, and identity (such as content moderation that flags queer slang as toxic), or in the institutions that shape regulatory policy. Similar dispossession dynamics are evident in global climate policy, which has been shaped largely by the environmental agendas of the Global North, the prime beneficiary of centuries of climate-altering economic practices. The same pattern holds for artificial intelligence ethics guidance, despite our technology's international reach.
These three harms are part of a larger exercise in memory and analysis of the social data-generating process. What are the prevailing attitudes, bodies of knowledge, and ways of doing research that have contributed such serious harms to our world? Arguably, these dispositions are the residue of an older way of life and thought inherited from our shared cultural past: colonialism. Colonialism was among the last and largest missions undertaken with the intention of doing "good", supposedly conferring civilisation, modernity, democracy, and technology on those who did not have them.
Colonialism once reached every part of our earth. Its influence persists today, and that influence over us is apparent:
- Physically, through the ways in which our borders are shaped
- Mentally, in how we perceive ourselves and each other
- Linguistically, in the role of English today as the language of science and exchange
- Through racism and the racialisation devised in the colonial era to establish hierarchies and ways of dividing people
- Economically, in how labour is extracted in one place and profit produced elsewhere
- And politically, in the structures of governance, legislation, and international relations that still fall along colonialism's fault lines.
We refer to the remnants of colonialism and its continuing influence on knowledge and understanding today using the term coloniality. Part of the exercise, then, is to contend with the coloniality of NLP: to enumerate all the ways in which we have inherited and reproduced, through technology, a colonial disposition towards the world.
Let's step back a little to tackle one final, but connected, topic. There was a post on Twitter by Dr. Sindhi. The scene is a health check-up with a 65-year-old patient, and it goes like this. She asks:
Me: "Mama, are you in a relationship?"
Mama: "No, my husband died 15 years ago."
Me: "Do you have a 'friend'?"
Mama: "Yes, I do have a friend."
Me: "How often do you 'visit' your friend?"
Mama: "Mostly month end."
Me: "Do you use condoms? Yes? No? Sometimes?"
Mama: "Sometimes."
There is a lot to digest in this scene, but the takeaway is the role that silence plays in language. One needs to know that elderly women in South Africa don't have boyfriends or "sides"; they have friends. Relying on this context, on the silences and in-betweens, the conversation can address a patient's sexual activity without ever mentioning it directly.
We are biased towards speech and biased against silence. One bias in language work can be over-listening: a subtle, persistent, and overpowering kind of benevolence and overconfidence in our technical work. Yet what technical system today, even one trained on all the language data available, could actually use cultural difference and context to do what we observed in that small conversation?
The decolonisation of language is a subject with a rich history. A lot has been done, and a lot remains. So how do we build a new field of Decolonial NLP?
Part III: Generative Practice
Part II of our discussion took us into the terrain of critique. We covered many concepts and theories: technological harms, algorithmic exploitation, oppression and dispossession, coloniality and colonialism, and the cultural context of silence and listening. Let's now shift from taking a critical stance to an effort at being generative: an attempt to find new ways of addressing the concerns we examined. This is another way of creating reality.
Our starting point is to examine the foundations on which our approach to research is built. Many of our implicit biases and unquestioned beliefs about knowledge reveal themselves in the attitudes and perspectives we sometimes take when carrying out research and deployment. Let's look at a few of these attitudes:
- Knowledge transfer. The scientific projects we collaborate on explicitly or implicitly assume that knowledge and expertise are unequally distributed in the world. For research to benefit people, part of our work then supposedly becomes helping knowledge migrate from centres of power (such as our research laboratories) to places where it is lacking.
- Benevolence. An implicit attitude that arises is that, where data, know-how, or technology is found wanting, technical progress ought to be instituted by those in the know, or the powerful, on behalf of the others who will be affected or changed by it.
- Portability. It is easy to fall into the belief that concepts and strategies developed in one place apply one-to-one in a different scenario or place. We wrongly believe that knowledge gained anywhere will work just as well somewhere else, though this is rarely the case.
- The standard of excellence. As a final attitudinal consideration, do we assume that the standards, forms, and worlds contained in our research labs and our nations, that is, within our own scope of knowledge and technical capability, are to be the models of tomorrow for other places?
These attitudes crop up throughout our work. As one example, most current approaches to algorithmic fairness assume that the target attributes for fairness, most often legal gender and race, can be observed or recorded, and that fairness evaluations are made with this data. Yet many attributes remain unobserved: they are often missing, unknown, or fundamentally unquantifiable. Gender identity and sexual orientation are examples of such unquantifiable attributes.
Queer fairness presents a kind of silence that is worth examining, and navigating this silence is a basic part of life for all of us with queer identities. To understand this silence, recall a core tenet of queer life: we do not out people; we just don't do it. Prevailing attitudes of quantification and portability lead us as scientists to reach for the "gather more data" solution to many problems, asking about these attributes through more self-identification exercises and surveys, and this pushes us against that fundamental element of respect. Even when such exercises are well intentioned, they will, as always, need additional reflection, analysis, and a keen ear.
Queer life, if we are willing to include it, can offer insight into the hard problems we face in our respective fields. Queer fairness highlights the need for new directions in fairness research that span many considerations, from protecting privacy, context sensitivity, and process fairness, to understanding sociotechnical impacts and the increasingly critical role of inclusive and participatory research. By bringing the expertise and experience of queer individuals and communities into the science, by recognising that there is knowledge we do not possess, and that not all knowledge must be encoded in the language of science to be valuable, we can together problematise and open up these hard questions.
A researcher shares the story of being part of a collective that built a new organisation, the Deep Learning Indaba, whose goal was to strengthen machine learning across the African continent. Over time, the Indaba has built new communities, developed leadership, and recognised excellence in the development and use of artificial intelligence across Africa. There is a fervour with which young African researchers put forward their ideas, present them to the world for the first time, gain recognition for their work, and learn amongst their peers that their questions and methods matter and are bound up with how they are shaping the continent's coming years.
It is heartening to see other groups formed with similar ideals, in South-East Asia, Eastern Europe, South America, and South Asia, alongside other inspirational community groups such as the wonderful Masakhane NLP group, Data Science Africa, Queer in AI, and Black in AI: all taking responsibility for their communities and building grassroots movements for AI, discourse, and transformation. Looking back over the past half-decade, we can now honestly say that international AI is stronger because of the committed individuals in this movement and the passion of these groups.
In essence, we would say that being generative is hard. Creating different realities requires new strategies and ways of thinking that we may not yet possess. For this very reason, it will be vital for us to keep breaking down disciplinary boundaries. The constraints that hold us in place are many: our publication incentives, our research cultures, the contradiction of international ambition with distinctly non-international representation, and questions of relevance. Conferences are one way of giving shape to research culture, and so are our choices as a field. Much research lies ahead of us as a community, but a generative disposition is one tool we all have at our disposal.
We’ve come to this juncture compelled by two primary themes.
- First, the technical world and the social world have always been inherently intertwined. We are entangled in that intertwining right now, at this very moment, wherever you may be reading this. Generative machine learning is an overt manifestation of it.
- Second, to be ahistorical and uncritical in how we approach machine learning risks falling into default perspectives of coloniality, ignoring the silence and the listening required as part of our new practices.
Thank you for staying with this blog article by AICoreSpot. The future cannot be taken for granted; nothing is certain or fixed. The safe, trustworthy, and generative technology of the future starts with our efforts today. That revolution, we believe, dwells in the mundane, in our daily acts of thoughtfulness and rapture.