Principles First: Integrating Responsible AI into Your Research

Responsible AI is a framework that emphasizes the development of AI technologies in a way that respects ethical principles, societal norms, and individual rights. Here’s a beginner’s guide for AI researchers looking to integrate responsible AI principles into their work.

Understand the Principles of Responsible AI
The first step is to familiarize yourself with the core principles of responsible AI. These typically include fairness, transparency, accountability, privacy, and security. Understanding these principles will help you to consider the broader implications of your work and ensure that your research contributes positively to society.

  • Fairness: AI systems should be free from biases and should not discriminate against individuals or groups.
  • Transparency: The workings of AI systems should be open and understandable to users and stakeholders.
  • Accountability: AI researchers and developers should be accountable for how their AI systems operate.
  • Privacy: AI systems must respect and protect individuals’ privacy.
  • Security: AI systems should be secure against unauthorized access and malicious use.
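Principles like fairness become concrete once you measure them. As a minimal illustration (not tied to any particular library or the systems discussed here), the sketch below computes a simple demographic parity gap: the difference in positive-prediction rates between groups. A gap near zero is one (imperfect) signal that a classifier treats groups similarly on this metric.

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Return the largest difference in positive-prediction rates
    between any two groups (0.0 means perfectly equal rates)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == 1:
            positives[group] += 1
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())

# Toy example: a model that approves 3/4 of group "A" but only 1/4 of group "B".
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_gap(preds, groups))  # 0.5 -> a large gap worth investigating
```

Demographic parity is only one of several fairness criteria; which metric is appropriate depends on the task and its societal context.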

Engage with Interdisciplinary Research
AI research does not exist in a vacuum; it intersects with numerous fields such as ethics, law, sociology, and psychology. Engaging with interdisciplinary research can provide valuable insights into the social and ethical implications of AI, helping you to design technologies that are not only innovative but also socially responsible. Collaborate with experts from these fields to gain a broader perspective on the impact of your work.

Adopt an Ethical Framework
Developing or adopting an ethical framework for your research can guide your decision-making process and help ensure that your work aligns with responsible AI principles. This could involve conducting ethical reviews of your projects, considering the potential societal impact of your research, and implementing guidelines for ethical AI development.

Prioritize Privacy and Security
Given the increasing amount of personal data being processed by AI systems, prioritizing privacy and security is essential. This means implementing robust data protection measures, ensuring data anonymization where possible, and developing AI systems that are resilient to attacks and unauthorized access.
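One basic anonymization technique mentioned above is pseudonymization: replacing direct identifiers with opaque tokens. A minimal sketch using Python's standard library follows; note that salted hashing alone is not full anonymization (linkage attacks remain possible), so treat this as one layer in a broader data-protection strategy.

```python
import hashlib
import secrets

# A per-dataset random salt; store it separately from the data
# (or discard it entirely for irreversible pseudonymization).
SALT = secrets.token_bytes(16)

def pseudonymize(identifier: str, salt: bytes = SALT) -> str:
    """Replace a direct identifier with a salted SHA-256 digest so the
    same person maps to the same token without exposing the raw value."""
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()

record = {"email": "alice@example.com", "age": 34}
record["email"] = pseudonymize(record["email"])
print(record)  # the email is now an opaque 64-character token
```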

Foster Transparency and Explainability
Work towards making your AI systems as transparent and explainable as possible. This involves developing techniques that allow others to understand how your AI models make decisions, which can help build trust and facilitate the identification and correction of biases.
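For some model classes, explanations come almost for free. The hypothetical sketch below shows why linear models are often called inherently interpretable: each feature's additive contribution to the score can be read off directly. (The feature names and weights are invented for illustration.)

```python
def explain_linear(weights, feature_values, feature_names):
    """For a linear model, score = sum(w_i * x_i); each term is that
    feature's additive contribution, giving a directly readable explanation."""
    contributions = {
        name: w * x for name, w, x in zip(feature_names, weights, feature_values)
    }
    score = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

# Hypothetical credit-scoring features
weights = [0.8, -1.2, 0.3]
names = ["income", "debt_ratio", "account_age"]
score, ranked = explain_linear(weights, [2.0, 1.5, 4.0], names)
print(score)   # ~1.0
print(ranked)  # debt_ratio pulls the score down the most
```

For complex models such as deep networks, post-hoc attribution methods play an analogous role, but their explanations are approximations and should be validated before being trusted.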

Engage with Stakeholders
Engage with a broad range of stakeholders, including those who may be affected by your AI systems, to gather diverse perspectives and understand potential societal impacts. This can help identify unforeseen ethical issues and ensure that your research benefits all sections of society.

Continuous Learning and Adaptation
The field of AI and the societal context in which it operates are constantly evolving. Stay informed about the latest developments in responsible AI, including new ethical guidelines, regulatory changes, and societal expectations. Be prepared to adapt your research practices accordingly.

Conclusion
Integrating responsible AI principles into your research is not just about mitigating risks; it’s about leveraging AI to create a positive impact on society. By prioritizing ethics, engaging with interdisciplinary research, and fostering transparency and stakeholder engagement, you can contribute to the development of AI technologies that are not only advanced but also aligned with the greater good. The journey of becoming a responsible AI researcher is ongoing and requires a commitment to continuous learning and adaptation.

Here are some interesting papers and updates to ponder:

Task-Specific Fine-Tuning of Large Language Models Made Affordable with Prompt2Model

It’s an open secret that we are entering the magical world of large language models (LLMs). We celebrated the first anniversary of ChatGPT a few weeks back. In the meantime, we have been introduced to a wide range of large language models performing several tasks, including text generation, code generation, image manipulation, and even video analysis. We can utilize these large language models through prompting. But there are some major concerns to be addressed.

How much access do we have and at what cost? 

Utilizing a large language model through prompting requires either access to commercial APIs or significant computation power. Both approaches are expensive. Privacy is also a major concern when using commercial LLMs.

If these LLMs have such high requirements, how can we customize them at low cost? How can we utilize LLMs for specific tasks?

Considering these issues, Viswanathan et al. have introduced a method for utilizing existing LLMs efficiently through natural language prompting. Their work was published by the Association for Computational Linguistics in the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Their work aims to produce task-specific, lightweight models that can outperform LLMs after just a few hours of fine-tuning.

The Prompt2Model architecture seeks to automate the core machine learning development pipeline, allowing us to train a small yet accurate model from just a prompt.

We can have a glimpse of the whole process in Figure 1. We can pick suitable models and datasets, generate synthetic data, and even fine-tune existing models on both retrieved and synthetic datasets using natural language prompting. All we have to do is describe the task and give a few examples; the rest of the job is done automatically.

Great initiative indeed, but how does it work?

The idea is simple but works remarkably well. The process goes through multiple steps, including data retrieval, data generation, model retrieval, and supervised fine-tuning. It starts with input preprocessing to identify the instruction and examples. Based on these, the system lists the most suitable datasets and models for the task. Finally, the most suitable model is fine-tuned on the selected datasets to produce a ready-to-use model.
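The stages above can be sketched in code. This is a hypothetical outline, not the paper's actual implementation; the function names, signatures, and stub bodies are stand-ins to show how the pieces fit together.

```python
# Illustrative sketch of the Prompt2Model pipeline stages; the helper
# implementations below are stand-ins, not the paper's actual code.

def parse_prompt(prompt: str) -> dict:
    """Input preprocessing: split the prompt into an instruction and examples."""
    instruction, _, examples = prompt.partition("Examples:")
    return {
        "instruction": instruction.strip(),
        "examples": [e.strip() for e in examples.strip().splitlines() if e.strip()],
    }

def retrieve_datasets(task): return ["retrieved_dataset"]   # dataset retrieval (user confirms)
def retrieve_model(task): return "pretrained_model"         # model retrieval (ranked)
def generate_dataset(task): return "synthetic_dataset"      # synthetic data generation
def fine_tune(model, data): return f"{model} fine-tuned on {len(data)} datasets"

def prompt2model(prompt: str) -> str:
    task = parse_prompt(prompt)
    data = retrieve_datasets(task) + [generate_dataset(task)]
    return fine_tune(retrieve_model(task), data)

print(prompt2model("Answer trivia questions. Examples:\nQ: ... A: ..."))
```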

How does it select the most suitable dataset and model?

The final decision on dataset selection is left to the user. The system shows the top k (default k = 25) most suitable datasets, from which users can select the most relevant ones for a particular task. Users can also specify the appropriate fields of the dataset.

For model selection, the system ranks the relevant models. The ranking score is calculated from the similarity between the user query and the model description, and the logarithm of the model's total number of downloads. Before ranking, a preliminary filtering step removes models whose size exceeds a threshold (3 GB by default). After calculating the scores, the top-ranked model is picked.
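The ranking step can be illustrated with a small sketch. Combining the similarity with log(downloads) multiplicatively is an assumption made here for illustration; see the paper for the exact weighting.

```python
import math

SIZE_THRESHOLD_GB = 3  # default size filter described above

def rank_models(query_similarity: dict, downloads: dict, sizes_gb: dict):
    """Filter out models above the size threshold, then score the rest.
    Multiplying similarity by log(downloads) is an illustrative choice,
    not necessarily the paper's exact formula."""
    scores = {}
    for name, sim in query_similarity.items():
        if sizes_gb[name] > SIZE_THRESHOLD_GB:
            continue  # preliminary filtering by model size
        scores[name] = sim * math.log(downloads[name] + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_models(
    query_similarity={"model-a": 0.9, "model-b": 0.7, "model-c": 0.95},
    downloads={"model-a": 10_000, "model-b": 1_000_000, "model-c": 500},
    sizes_gb={"model-a": 1.2, "model-b": 0.8, "model-c": 12.0},  # model-c gets filtered out
)
print(ranked[0][0])  # model-b: lower similarity, but far more downloads
```

The log term keeps extremely popular models from completely dominating the similarity signal, while still rewarding well-tested checkpoints.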

What if available datasets are not sufficient for fine-tuning the model?

This is a common obstacle in deep learning, especially for low-resource languages or a wide range of niche tasks. To address it, the authors added a dataset generator module that uses automated prompt engineering to generate diverse synthetic examples. The synthetic dataset can then be combined with the retrieved datasets for fine-tuning the model.
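In spirit, the generator asks an LLM for new input/output pairs in the style of the user's demonstrations. The sketch below is a hypothetical illustration: `call_llm` is a stub standing in for a real chat-completion API call, and the prompt wording is invented, not taken from the paper.

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (e.g. a chat-completion endpoint);
    here it returns a canned response so the sketch is runnable."""
    return json.dumps([{"input": "Capital of France?", "output": "Paris"}])

def generate_examples(instruction: str, demonstrations: list, n: int = 5) -> list:
    """Ask the LLM for n new (input, output) pairs in the style of the
    demonstrations -- the automated prompt-engineering idea described above."""
    prompt = (
        f"Task: {instruction}\n"
        "Here are some examples:\n"
        + "\n".join(json.dumps(d) for d in demonstrations)
        + f"\nGenerate {n} new, diverse examples as a JSON list."
    )
    return json.loads(call_llm(prompt))

new_data = generate_examples(
    "Answer trivia questions.",
    [{"input": "Capital of Japan?", "output": "Tokyo"}],
)
print(new_data[0]["output"])  # "Paris", from the canned stub response
```

In practice, the diversity of the generated examples matters a great deal, as the code-generation results below suggest.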

Everything seems decent. But how does it perform?

We are all familiar with the GPT family, the group of LLMs powering ChatGPT. Prompt2Model has produced models that outperform gpt-3.5-turbo by an average of 20% while being up to 700 times smaller. The authors evaluated the models on three downstream tasks: question answering, code generation from natural language, and temporal expression normalization, using gpt-3.5-turbo's performance as the baseline. Models produced by Prompt2Model outperformed gpt-3.5-turbo in question answering and temporal expression normalization but did not perform well in code generation from natural language. Relatively low diversity in the generated dataset is considered the likely reason for this low performance. Another important finding is that the combination of retrieved and generated datasets can achieve results similar to a custom annotated dataset at less than one-hundredth of the cost. This can be an important gateway to reducing the huge cost of data annotation.

Viswanathan et al. have introduced an impressive method for utilizing existing data and LLM resources with low cost and computation power. This approach could prove beneficial and boost the usage of LLMs in more diverse and low-resource tasks. However, it leaves some scope for improvement. All of the experiments were conducted using the gpt-3.5-turbo API, which is closed-source and paid. The complexity of low-resource languages has not been addressed properly; the weak performance in languages other than English can be observed in the code generation task. Despite these limitations, Prompt2Model has highlighted the possibility of using LLMs with low cost and computation power. We believe the research questions raised by Prompt2Model can soon lead us to more accessible LLMs.

Note: Technical details have been skipped to keep it simple. Please read the paper for more details.

Reference

  1. V. Viswanathan, C. Zhao, A. Bertsch, T. Wu, and G. Neubig, “Prompt2Model: Generating Deployable Models from Natural Language Instructions.” arXiv, Aug. 23, 2023. Accessed: Jan. 08, 2024. [Online]. Available: http://arxiv.org/abs/2308.12261

MidJourney V6: Exploring the Boundaries of AI Artistry

MidJourney is an innovative platform that pioneers AI-powered creativity, pushing the boundaries of image generation and artistic expression. The upcoming release of Version 6 (V6) and the new Alpha website have the community eagerly anticipating the future.

During a recent office hours session, MidJourney shared updates on the Alpha website, mobile compatibility, V6, and enhanced features. The Alpha website showcases promising image-creation functionality and is currently accessible only to 10K Club members, with plans for a gradual rollout to broader audiences.

MidJourney is working on enhancing the mobile experience with the development of a probable web app for Android and a native iOS app, and welcomes collaboration from individuals skilled in Native Android development.

V6 promises to revolutionize the image generation process with natural language inputs for a more intuitive user experience and enhanced features, including an updated describe feature, style anchoring, and a next-gen style tuner. However, V6 may initially be more expensive per image because it lacks the optimizations built into its predecessor, V5.

MidJourney’s commitment to pushing the boundaries of AI artistry is evident in its ongoing developments and community engagement. With V6 on the horizon and the Alpha website paving the way for new possibilities, the MidJourney community is on the brink of a new era in AI artistry.

Data & Design Lab Collaboration with Liberation War Museum for Smart Education Platform and Digital Content for National Curriculum

Bangladesh is striving to close the gap in technology and education, and innovative approaches to learning are more crucial than ever. With recent shifts in the national curriculum, there’s a growing recognition of the need to integrate advanced technologies and better digital content into educational practices. At the forefront of this transformative wave is the Smart Learning Platform project, an ambitious collaboration with the Liberation War Museum (LWM) to redefine the educational landscape. This initiative not only aligns with the country’s educational reforms but pushes the boundaries further by integrating Artificial Intelligence (AI) into learning, setting a new precedent in the fusion of technology and education.

The project’s inception was rooted in the idea of using digital media to bring the Liberation War of Bangladesh closer to the young minds of the country. Targeting Class 6 and Class 7 students under the Bangladesh National Curriculum, the initial objective was to develop engaging video content that would showcase the museum’s wealth of information, artifacts, and historical documents. The aim was to ignite a spark of interest and understanding about this pivotal period in the nation’s history.

When DnD Lab was handed this project, we envisioned a more interactive and immersive learning experience. Our proposal to integrate an AI system into the learning platform marked a significant leap from conventional methods of history education. This innovative approach was designed to transform passive content consumption into an interactive, engaging process.

The integration of AI was multifaceted. One aspect involved interactive question formulation following the QuBAN (Query-Based Access to Neurons) method and multiple-choice questions embedded within the content, allowing students to actively engage and interact as they learned. This feature was not just about testing knowledge but about encouraging students to think critically and seek answers, with AI guiding them subtly rather than providing outright solutions.

Beyond individual learning, DnD Lab’s system encouraged group activities and home-based participation, expanding the learning environment beyond the traditional classroom. An AI chatbot was also introduced, serving as a virtual guide and assistant, helping students navigate through their educational journey.

Teachers were not left behind in this digital revolution. DnD Lab equipped educators with AI-driven tools for monitoring student performance and identifying areas for improvement. The AI system provided actionable insights and recommendations, enabling teachers to optimize their teaching strategies. Moreover, the system’s analytics capabilities allowed for a comprehensive overview of class performance, simplifying the assessment process.

Gamification elements were also a critical component of this project. The introduction of symbolic scoring systems, leaderboards, and rankings injected a sense of competition and achievement into the learning process, motivating students through playful yet educational challenges.

The project gained significant public attention following a press conference organized by LWM, where it was highlighted in major Bangladeshi media outlets like Daily Star, Prothom Alo, and Ekattor News. The initiative’s impact was further amplified through a Facebook live presentation on the Bangladesh Liberation War Museum’s page.

The project was brought to life by a team of dedicated individuals, including LWM Trustee Dr. Sarwar Ali, Trustee Mofidul Haque, and Trustee and Member Secretary Sara Zaker, along with Dr. Moinul Islam Zaber, a Professor in the Computer Science and Engineering Department of Dhaka University, and Senior Lecturer Md. Abu Sayed from Independent University Bangladesh. The DnD Lab team, which played a pivotal role under the guidance of Dr. Moinul Islam Zaber and Md. Abu Sayed, comprises students from various backgrounds: Khandoker Ashik Uz Zaman (Researcher and Developer), Ahsan Habib Nahid (Web Developer), Amit Roy (Content Creator), Md. Mehedi Hasan (AI Developer), and Abir Chakraborty Partha (AI Developer).

The Liberation War Museum’s Smart Learning Platform project by DnD Lab stands as a beacon of innovative educational practices in Bangladesh. It exemplifies how technology, particularly AI, can be harnessed to make learning history a more engaging, interactive, and effective process. This initiative not only honors the past but also paves the way for a future where education is enriched through the power of technology.