Abstract—Computer vision and machine learning are active topics in several domains and are not restricted to the computer science industry. The fashion industry is one such domain: it has employed assorted algorithms for everything from the creation of designs to suggesting the kind of apparel that would suit a user's needs. Our focus is on the benefits that such algorithms add to the work of a couturier. The value of this application lies in convenience through technology. In layman's terms, it would help designers gather fashion pieces of similar concepts conveniently and without much human intervention. This paper sheds some light on several generative algorithms that can be employed in this industry.

Index Terms—computer vision, machine learning, fashion, GANs, StackGANs, generative images, recommendation

A lot has been said about the nature and use of technology in areas where automation could not once have been imagined. Most such domains started out with manual work that was synthesized into a finished product. Fashion and apparel is one such discipline, in which sketches made by hand begin the creation of a fashion piece. The basis of automation in machine learning lies in modeling real-world entities, and that is exactly what formed the foundation of research in the à la mode business. Fashion is much more than its pieces and a prevailing style. It captures the manifestation of the cultural ideas of various societies in a few strands of haute couture. In such an area, automation can aid applications that are closely associated with this trend-setting domain.
Focusing on generative algorithms that offer insight into commercial applications in the apparel business, we consider the following topics:
• Fashion Dataset
• Fashion Synthesis and Structural Coherence
• StackGANs
• Visually-aware Recommendation
• AI in the Commercial Fashion Industry
• Style Generation

Before discussing core ML implementations, we must mention the need for a good dataset. An adequate dataset not only improves results but also provides an effective way to train, test, and validate a particular model. The following are some of the datasets that serve this goal:
• Fashion-MNIST: This dataset consists of 60,000 training images and 10,000 testing images. Each image is 28×28 pixels and is associated with a label from 10 classes. The resolution is low and the dataset contains only grayscale images. This collection is a twist on the original MNIST dataset, which consists of images of handwritten digits. [1]
• Fashion-Gen: This collection comprises 293,008 high-definition images of clothes and accessories accompanied by detailed design descriptions [2]. The testing data is around 32,528 images. The items are photographed from several angles and each belongs to one of 48 categories. This dataset was created for the purpose of finding an appropriate solution for text-to-image translation. Each item also comes with metadata covering recommended matching items, the fashion season, the designer, and the brand.
• DeepFashion: A large-scale clothes dataset with comprehensive annotations. It contains over 800,000 images, which are richly annotated with attributes, clothing landmarks, and correspondences between images taken under different scenarios, including store, street snapshot, and consumer.
[3]
• CelebA: This is a large-scale face attributes dataset consisting of 200K celebrity images, each with 40 attribute annotations. The images cover a range of poses and background clutter. It is used for attribute recognition, face detection, and landmark localization. [4]

A. Fashion Synthesis and Structural Coherence

Structural coherence concerns the build of the fashion croquis and how different kinds of apparel can be effectively mapped onto it. The approach discussed here builds on generative adversarial networks (GANs) and generates new apparel from a language description. The input is an image of a fashion figure wearing a particular style of apparel, together with a language annotation, a style note, that describes the desired image. Vanilla GANs do not deliver adequate results: global coherence is compromised and so is the quality of the image. Hence, a two-stage GAN architecture is used, which decomposes the problem into separate stages.

There are three main challenges for such an approach:
• Producing an image that adequately matches the chosen language description
• Retaining the real input image in such a way that the pose of the croquis is maintained
• Ensuring that, when a style is chosen, the build of the croquis does not misalign with the desired style

Optimal solution: Given an input image and a sentence describing the desired output, the proposed model first synthesizes a segmentation map using a generator in the first stage. A second GAN then renders a new image from the segmentation map produced by the first stage. This two-stage GAN approach is known as FashionGAN.
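The data flow of the two-stage approach can be illustrated with a minimal sketch. This is not the FashionGAN implementation: both generators are replaced by hand-written placeholder functions, and the label set, the `body_mask` grid, and the layout of `text_code` (one "coverage" value followed by an RGB colour) are hypothetical choices made only for illustration. What the sketch does show is the contract between the stages: the first stage keeps the wearer's silhouette and decides region labels from the text, and the second stage renders pixels from that segmentation map.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]                    # H x W map of integer region labels
Image = List[List[Tuple[int, int, int]]]  # H x W map of RGB pixels

BACKGROUND, SKIN, GARMENT = 0, 1, 2       # hypothetical label set


def stage1_segmentation(body_mask: Grid, text_code: List[float],
                        rng: random.Random) -> Grid:
    """Placeholder for the first-stage generator: (body layout, text, noise) -> segmentation map.

    The silhouette taken from the input photo is kept, so pose and body shape
    survive; only the labels inside the silhouette depend on the text.
    """
    # text_code[0] is a made-up "garment coverage" value in [0, 1], jittered by noise.
    coverage = min(1.0, max(0.0, text_code[0] + rng.gauss(0.0, 0.05)))
    cutoff = round(coverage * len(body_mask))
    return [
        [BACKGROUND if px == 0 else (GARMENT if r < cutoff else SKIN) for px in row]
        for r, row in enumerate(body_mask)
    ]


def stage2_render(seg_map: Grid, text_code: List[float]) -> Image:
    """Placeholder for the second-stage generator: (segmentation map, text) -> image."""
    # text_code[1:4] is a made-up RGB colour for the garment, each value in [0, 1].
    garment_rgb = tuple(int(255 * min(1.0, max(0.0, v))) for v in text_code[1:4])
    palette = {BACKGROUND: (255, 255, 255), SKIN: (224, 172, 105), GARMENT: garment_rgb}
    return [[palette[label] for label in row] for row in seg_map]


# 1 marks the wearer's silhouette in the input photo, 0 marks background.
body_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
text_code = [0.5, 0.1, 0.2, 0.8]  # hypothetical encoding of the style note

seg_map = stage1_segmentation(body_mask, text_code, random.Random(0))
image = stage2_render(seg_map, text_code)
for row in seg_map:
    print(row)
```

In a real system each placeholder would be a trained generator with its own discriminator. Splitting the task this way means the second stage never has to infer the body layout from text alone, because the layout is already fixed by the segmentation map.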
Given an original wearer's input photo (left) and different textual descriptions (second column), our model generates new outfits onto the photograph (right three columns) while preserving the pose and body shape of the wearer. [5]

B. StackGANs

The two main challenges faced by modern text-to-image synthesis techniques are generating high-quality images from text descriptions and ensuring that each generated image contains the necessary details and vivid object parts. Stacked Generative Adversarial Networks (StackGANs) address these challenges.

1) Brief Working: In a nutshell, StackGANs render photorealistic 256×256 images conditioned on the given text descriptions. The approach breaks a larger problem into more tractable subproblems: rendering high-quality images with intricate details is split into two separate stages. The Stage-1 GAN sketches a low-resolution image that contains just the shape and certain primitive colours. It delivers the bare bones of the image, with little structure or detail. The resulting image is taken as input by the Stage-2 GAN, which adds the details that were omitted in Stage-1 and refines the generated image. The output of the Stage-2 GAN is an image that does not compromise on quality or on the enhancement of features. While StackGAN addresses the above issue, another problem remains: improving the diversity of the synthesized images. This is addressed by introducing a Conditioning Augmentation technique.

2) Conditioning Augmentation Technique: The text description $t$ is passed through an encoder to yield a text embedding $\varphi_t$, which in turn is used as input to the Stage-1 generator $G_0$.
However, a difficulty arises because the latent space for the text embedding is high dimensional, which is inconvenient when the amount of data is limited. Conditioning augmentation addresses this by producing additional conditioning variables in vector form, $\hat{c}$. The latent variables $\hat{c}$ are randomly sampled from a Gaussian distribution $\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t))$, where the mean $\mu(\varphi_t)$ and the covariance matrix $\Sigma(\varphi_t)$ are functions of the text embedding $\varphi_t$. The main advantage of conditioning augmentation is that it yields more text-image pairs, which increases robustness. Overfitting is another challenge that this technique needs to tackle. To make the approach smoother, a regularization term is employed: the Kullback-Leibler divergence between the standard Gaussian distribution and the conditioning Gaussian distribution, $D_{KL}(\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t)) \,\|\, \mathcal{N}(0, I))$. This introduces randomness, which benefits text-to-image translation because the same text may correspond to images with diverse poses and appearances. [6]

C. Visually-aware Recommendation

The purpose of a recommender system is to provide personalized suggestions to users based on their history, by inferring their preferences and taste. Traditional systems face certain challenges: a long list of items, cold start, and evolving fashion. To deal with these challenges, there have recently been efforts to build recommender systems that are ‘visually aware’. The model discussed here can be used for both personalized recommendation and design. Personalized recommendation is achieved by using a visually aware recommender based on Siamese CNNs; generation is achieved by using a generative adversarial network to synthesize new clothing items in the user’s personal style.
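As a rough illustration of how a visually aware recommender might rank items, the sketch below trains a single user vector with a pairwise ranking loss of the Bayesian Personalized Ranking (BPR) kind, $-\ln \sigma(x_{u,i} - x_{u,j})$, where item $i$ was purchased and item $j$ was not. Several things here are assumptions made for the example and not details taken from the cited work: the score is a plain dot product between the user vector and an item's image embedding, the embeddings are fixed hand-written 2-D vectors standing in for CNN outputs, and the item names and hyperparameters are invented.

```python
import math
import random
from typing import Dict, List


def score(user_vec: List[float], item_feat: List[float]) -> float:
    """Preference score: dot product of the user vector and the item's image embedding."""
    return sum(u * f for u, f in zip(user_vec, item_feat))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bpr_step(user_vec: List[float], pos_feat: List[float], neg_feat: List[float],
             lr: float = 0.1, reg: float = 0.01) -> float:
    """One gradient step on the pairwise loss -ln(sigmoid(x_pos - x_neg)).

    Updates user_vec in place and returns the loss measured before the update.
    """
    diff = score(user_vec, pos_feat) - score(user_vec, neg_feat)
    loss = -math.log(sigmoid(diff))
    grad_scale = 1.0 - sigmoid(diff)  # derivative of ln(sigmoid(diff)) w.r.t. diff
    for k in range(len(user_vec)):
        user_vec[k] += lr * (grad_scale * (pos_feat[k] - neg_feat[k]) - reg * user_vec[k])
    return loss


# Hypothetical 2-D image embeddings; in practice a CNN would produce these.
item_features: Dict[str, List[float]] = {
    "striped_shirt": [1.0, 0.0],
    "striped_dress": [0.9, 0.1],
    "plain_coat": [0.0, 1.0],
    "plain_skirt": [0.1, 0.9],
}
purchased = ["striped_shirt"]
not_purchased = ["plain_coat", "plain_skirt"]

rng = random.Random(0)
user_vec = [0.0, 0.0]
losses = []
for _ in range(200):
    pos = item_features[rng.choice(purchased)]
    neg = item_features[rng.choice(not_purchased)]
    losses.append(bpr_step(user_vec, pos, neg))

ranking = sorted(item_features,
                 key=lambda name: score(user_vec, item_features[name]),
                 reverse=True)
print(ranking)
```

Because the score depends only on visual features, the striped dress, which this user never interacted with, ends up ranked above the plain items. This is the sense in which visual signals help with cold-start items. In an end-to-end setup the embeddings would be learned jointly with the user vectors instead of being fixed.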
Main ideas [7]:
• Recommender Systems: Matrix Factorization (MF) methods relate users and items. Point-wise and pairwise methods have recently adapted MF. Bayesian Personalized Ranking (BPR) with MF as the underlying predictor has been extended to incorporate visual signals and is hence the framework the authors build on.
• Visually-aware Recommender Systems: The users’ rating dimensions are modeled as visual signals. BPR-MF is extended to incorporate visual dimensions, and the authors show that better performance can be obtained with an ‘end-to-end’ learning approach.
• Fashion and Clothing Style: Categorizing images that belong to a particular style and assessing items for compatibility.
• Siamese Networks and Comparative Image Models: This type of architecture has been applied to discriminative tasks (face verification) as well as comparative tasks (modeling preference judgements between images).
• Image Generation and GANs: The generator is trained so that the generated images look ‘realistic’. These systems can be conditioned on additional inputs to achieve certain output characteristics. The authors follow an approach based on activation maximization to generate images that best suit a user’s taste.

D. AI in the Commercial Fashion Industry

The fashion industry is being reshaped by the advent of ML and AI. We found several AI ventures in this field: some have already been very successful, and some are still in preliminary stages but show considerable potential.

1) Alibaba’s FashionAI: The ability to build complete looks from the clothes in the store, together with an app for working on these styles on the go, is an alluring feature for a tech-driven audience.
The highly specific data that FashionAI's intelligent locks provide can help the e-tailer track fine-grained customer preference data, such as which products customers pick up and look at the most. [8]

Intelligent Garment Tags: Products in the FashionAI store have special tracking tags featuring radio-frequency identification (RFID), gyro-sensors, and low-energy Bluetooth chips. This technology allows each lock on a garment to carry specialized information about the item it is attached to, such as its color and size.

Smart Mirrors: These devices have touch screens that use information relayed by the intelligent locks on each product to automatically display information about the items customers are interacting with. They can also suggest other apparel to complement the items customers pick up, help shoppers find where products are located, and add garments to a virtual shopping cart.

Omnichannel Integration: Alibaba intends to add a new Virtual Wardrobe feature to its Mobile Taobao app. This would allow customers to view the clothes they tried on in the store, along with recommendations from other merchants on Alibaba’s shopping sites for more items that would complement those looks.

2) Amazon’s AI Fashion Designer and Echo Look: A group of Amazon researchers based in Israel developed a machine learning system that, by analyzing just a few labels attached to images, can deduce whether a particular look can be considered stylish. The software could possibly provide fashion feedback or recommend adjustments. An Amazon team at Lab126, a research center based in San Francisco, has developed an algorithm that learns about a particular style of fashion from images and can then generate new items in similar styles from scratch: in essence, a simple AI fashion designer. It makes use of GANs. [9]

Amazon's Echo Look camera analyzes your clothing style and makes fashion recommendations through machine learning.
The device's marquee feature is Style Check, which reviews photos of two different outfits to provide a second opinion. [10]

E. Style Generation

The paper [11] applies artificial intelligence to automatically generate fashion style images. Given a basic clothing image and a fashion style image (any kind of design print), the goal is to generate a clothing image with that style in real time using a neural fashion style generator. To achieve this, the authors propose an end-to-end feed-forward neural network that consists of a fashion style generator and a discriminator. The global and patch-based style and content losses computed by the discriminator are back-propagated alternately through the generator network to optimize it. The global optimization stage preserves the clothing structure and design, while the local optimization stage preserves the detailed style pattern.

Fashion style generator framework overview. The input $X$ consists of a set of clothing patches $X^{(1)}$ and full clothing images $X^{(2)}$. The system consists of two components: an image transformation network $G$, which serves as the fashion style generator, and a discriminator network $D$, which calculates both global and patch-based content and style losses. $G$ is a convolutional encoder-decoder network parameterized by weights $\theta$. Six shirts generated with different styles by our method are shown as examples.

Existing neural style transfer takes one of two approaches: global or patch-based. Global (full-image) methods preserve the global structure of the content image, but the detailed structure of the style image is not well blended. Patch-based approaches, such as deep Markovian models, capture the statistics of local patches and assemble them into high-resolution images. They preserve detail well, but additional guidance is needed to reproduce the global structure.
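The difference between the two families can be made concrete with a toy comparison of a global style statistic and a patch-based one. The text above does not give loss formulas, so the following is a sketch under stated assumptions: the global term uses the Gram-matrix statistic familiar from NeuralST-style methods, the patch term matches every generated patch to its nearest style patch by squared Euclidean distance (a simplification of MRF-style matching), and the "feature maps" are small hand-written grids standing in for CNN activations. All function names are hypothetical.

```python
from itertools import product
from typing import List

FeatureMap = List[List[List[float]]]  # C channels, each an H x W grid


def gram(feat: FeatureMap) -> List[List[float]]:
    """C x C matrix of channel co-activations, averaged over all positions."""
    flat = [[v for row in channel for v in row] for channel in feat]
    n = len(flat[0])
    return [[sum(a * b for a, b in zip(fc, fd)) / n for fd in flat] for fc in flat]


def global_style_loss(generated: FeatureMap, style: FeatureMap) -> float:
    """Squared difference between Gram matrices (position-independent statistic)."""
    g, s = gram(generated), gram(style)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(len(g)) for j in range(len(g)))


def patches(feat: FeatureMap, k: int = 2) -> List[List[float]]:
    """All k x k patches, each flattened across channels into one vector."""
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [channel[r + dr][c + dc] for channel in feat for dr in range(k) for dc in range(k)]
        for r, c in product(range(h - k + 1), range(w - k + 1))
    ]


def patch_style_loss(generated: FeatureMap, style: FeatureMap, k: int = 2) -> float:
    """Sum over generated patches of the distance to the nearest style patch."""
    style_patches = patches(style, k)
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in style_patches)
        for p in patches(generated, k)
    )


def two_channel(rows: List[List[float]]) -> FeatureMap:
    """Builds a 2-channel map: the given pattern and its complement."""
    return [rows, [[1.0 - v for v in row] for row in rows]]


style = two_channel([[1.0, 0.0, 1.0, 0.0] for _ in range(4)])      # thin stripes
generated = two_channel([[1.0, 1.0, 0.0, 0.0] for _ in range(4)])  # same values regrouped into blocks

print("global:", global_style_loss(generated, style))
print("patch: ", patch_style_loss(generated, style))
```

In this toy case the global term is zero, because the Gram statistic ignores where activations occur, while the patch term is positive, because the blocky layout contains 2×2 patches that never appear in the striped style map. This mirrors the trade-off described above: global statistics alone can miss fine style detail, and patch statistics alone say nothing about overall layout.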
To combat these challenges, the authors use an end-to-end feed-forward neural network for fashion style generation that combines the benefits of global and patch-based methods.

Synthetic fashion style images by 5 compared methods: NeuralST, MRFCNN, FeedS, MGAN and Ours. The first column from the left shows the input style images “wave” and “bear”. The second column from the left shows four input content images. For MGAN and Ours, we enlarge the regions in red frames to show more details. [11]

Comparing the feed-forward methods (FeedS, MGAN, and this paper’s approach), we note that MGAN and this approach preserve the detailed textures of the style images better than the global FeedS method. In the first row of MGAN, the areas in the red frames are not well synthesized. With this paper’s method, these areas are better blended with the style patterns. This shows the effectiveness of considering both global and local characteristics. NeuralST and MRFCNN are not feed-forward networks. Apart from speed, similar observations hold for them. In MRFCNN, although the generated images preserve the textures, they may lose the original global structure. NeuralST and MRFCNN are computationally expensive, since each step of the optimization requires forward and backward passes through the pretrained network. With the feed-forward network, no backpropagation is needed at test time, so testing is hundreds of times faster.

In this paper, we discussed various applications of machine learning and artificial intelligence in the fashion industry. We observe that the research is at the state of the art and that these technologies have the potential to revolutionize the industry. One application of these approaches would be a single tool or framework that could be a game-changer for designers and artists. ML in fashion pushes creative bounds, as it comes up with combinations of designs that the human mind would traditionally not think of.
Commercialization of these technologies and the development of a polished product are expected.

We would like to thank our guide, Ms. Chandravva Hebbi, who has been instrumental in helping us gain clarity on our idea and form a clear understanding of the discussed applications. We would also like to express our gratitude to PES University for giving us the enriching opportunity to research this topic and write this paper.