Sunday, August 6, 2023

The landscape of generative AI tools and developments

  

AI is moving fast: new tools and developments based on generative AI models appear almost daily. The most important sub-areas are currently image generation and natural language generation. Applications in the field of generative AI are based on so-called foundation models. Simply put, these are large AI models that have been trained on vast amounts of data and then specialised for concrete applications by means of fine-tuning.

Graphic Gen AI Model and Application Layer
cf. https://www.sequoiacap.com/article/generative-ai-a-creative-new-world/

Text generation

The understanding, summarisation and generation of language by means of AI is based on so-called LLMs (large language models). These are among the most important large AI models and represent an important advance in the field of AI. LLMs impressively demonstrate what generative AI can already do today and, above all, how we can interact with it. Texts generated by LLMs are hardly distinguishable from human-written texts, but can contain incorrect information because the training is very generic. Moreover, they have not yet reached the level of texts written by professional writers or of scientific papers.

They are currently used mainly for brainstorming, first drafts, notes and marketing content. It remains to be seen to what extent the output of LLMs will continue to improve and gain in quality through more up-to-date models, fine-tuning, feedback and more application-specific training.

Code generation

Code generation and completion refers to the creation of entire blocks of code or individual lines of code using AI. Because programming languages can be treated much like natural language, models for code generation are also based on LLMs. The advantage is that the intended function of the code can be specified through an instruction (the prompt), without first having to become familiar with the relevant code libraries or packages.
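To make the idea of specifying a function through a prompt more tangible, here is a minimal sketch. The prompt text and the function `most_frequent_words` are hypothetical, and the code is hand-written for illustration rather than taken from any particular model. It shows the kind of result a user might receive without knowing beforehand that Python's standard library has a `Counter` class.

```python
import re
from collections import Counter

# Hypothetical instruction a user might give to a code-generation model.
PROMPT = "Write a Python function that returns the n most frequent words in a text."


# The kind of code such a model might return (written by hand for this example).
def most_frequent_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most common words in `text` together with their counts."""
    # Lower-case the text and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter does the counting and the ranking.
    return Counter(words).most_common(n)


print(most_frequent_words("the cat and the hat and the bat", n=2))
```

Generated code still has to be read and tested like any other code; the prompt only replaces the first draft.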

Image generation

Text-to-image models create images from text input. The style, angle of view, type of image and size can be modified as desired. Models such as Midjourney, Stable Diffusion and others can be used to create a new image in the style of Picasso, to create breathtaking artwork or to generate photorealistic images of people.

Video generation

With Make-A-Video from Meta and Microsoft's X-CLIP, models are slowly emerging that can even generate videos. However, they are currently limited by the high computing power required. Generating a single image is already computationally expensive, and a video needs at least 24 images per second, so the demand grows accordingly. More efficient models and the wider availability of large GPU clusters are expected to ease this bottleneck over time.
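The frame-rate argument can be made concrete with a little arithmetic. The sketch below assumes, as a simplification that is not taken from the article, that each video frame costs roughly as much to generate as one still image; real video models need not work frame by frame. `frames_needed` is a hypothetical helper.

```python
import math

FRAMES_PER_SECOND = 24  # lower bound mentioned in the text


def frames_needed(duration_seconds: float, fps: int = FRAMES_PER_SECOND) -> int:
    """Number of images required for a clip of the given length."""
    return math.ceil(duration_seconds * fps)


# A 10-second clip already corresponds to 240 individual images.
print(frames_needed(10))
```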

Chatbots

Chatbots used to be known as rule-based systems meant to answer customer questions, for example. They have now evolved into knowledge repositories with context-based communication capabilities: with ChatGPT, OpenAI has succeeded in creating a chatbot that can conduct an entire conversation about a topic, accept suggestions for improvement and refer back to earlier points in the conversation. This makes conversations with ChatGPT very intuitive.
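One common way to let a chatbot refer back to earlier points is to keep the whole conversation as a list of messages and hand that list to the model on every turn. The sketch below illustrates this pattern only; it is an assumption for illustration, not a description of how ChatGPT is implemented. `Conversation` and `echo_model` are hypothetical names, and `echo_model` is a stand-in that merely reports how much history it was given.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps every turn so that later replies can draw on earlier ones."""
    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_prompt(self) -> str:
        # Flatten the history into one text block, one line per message.
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real language model."""
    return f"I can see {len(prompt.splitlines())} earlier message(s)."


chat = Conversation()
chat.add("user", "Suggest a name for a bakery.")
chat.add("assistant", echo_model(chat.as_prompt()))
chat.add("user", "Make it shorter.")  # only makes sense with the history
reply = echo_model(chat.as_prompt())
print(reply)
```

The follow-up "Make it shorter." is only answerable because the earlier turns travel along with it.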

Speech synthesis

Speech recognition has been around for some time ("Hey Siri"), but truly usable speech generation has only recently emerged. For high-end applications such as films and podcasts, the bar is quite high: the speech must sound like a unique human voice and not mechanical. Nevertheless, there are already models like Microsoft's "VALL-E" that can synthesise the speech of a specific person using only a few speech samples. Because a voice is a very distinctive human feature and has so far been very difficult to fake, applications here can unfortunately also cause considerable damage: with deepfakes, for example, the voices of well-known personalities can be simulated and made to say things they never said in reality.

3D modelling

Product design is a complicated process: often the beginning alone is difficult, and potential optimisations are complicated to implement. Generative models such as DreamFusion can create a wide range of shapes and thus accelerate and improve this iterative process. The model converts text into a 3D model, which can be useful for brainstorming, finding new possible shapes or optimising components, for example. 3D generators are based on text-to-image generators and are still in the early stages of development, but may deliver promising output in the future.

Further applications

In the audio, gaming and music sector, too, new models are constantly emerging that are capable of designing games, generating synthetic music and much more. So far, however, AI-generated songs sound rather unusual and weird - they (still) lack "soul".

Another important area for generative AI models is research. Generative models are playing an increasingly important role in the discovery of new drugs. The AlphaFold model from the research company DeepMind has already shown that AI is capable of answering research questions. AI models that can help researchers answer important scientific questions are currently being developed in a wide variety of fields, and could thus benefit all of us.

Legal framework for the use of generative AI

Generally speaking, generative models are so-called "general purpose AI": AI that is not developed for one limited purpose only, but can take on many different tasks. Because these models have only been around for a few years, there is still no EU-wide legal regulation on their use. However, the EU is about to change this with the AI Act: it aims to make general purpose AI (GP-AI) safer by means of various requirements and to prevent the use of this technology for unlawful purposes. As things stand, however, the requirements placed on GP-AI systems can hardly be met.

Data protection

For many generative models, it remains doubtful whether they comply with the General Data Protection Regulation (GDPR) in Europe. The almost unbelievable knowledge of large models such as ChatGPT is based on public content such as books, articles, websites or even posts on social media. A lot of the data therefore comes from social media users themselves. In itself, this is not problematic, but at no point is the consent of the creators of this content obtained. Whether personal data also flows into generative models remains an open question.

Copyright

Text-to-image generators in particular benefit from the large number of works that have been distributed on the internet. However, the copyright status of these works remains a grey area. If, for example, I generate a work of art "in the style of Picasso", it remains unclear whether and to what extent this can be reconciled with the copyrights in Picasso's works. The current approach relies on the idea that the "prompt", i.e. the input text, represents the creative achievement here and can therefore be protected by copyright - but not the generated image or the training data used for the model.

Deepfakes

If human works such as texts, pictures, videos and sound recordings fall into the wrong hands, they can be misused in many ways: time and again, so-called "deepfakes" circulate on the internet, imitating the speech, facial expressions, gestures and appearance of celebrities, politicians and public figures. These are now deceptively real and can only be identified as fakes by experts. With the expected further improvement of generative models, this can have worrying consequences - in the private, political and economic sense.

Outlook

Generative AI table with an overview of different applications
Source: https://www.sequoiacap.com/article/generative-ai-a-creative-new-world/

We are excited about what the future of generative models will bring. One thing is already clear: the potential areas of application and use cases are almost infinite.