In this work, we systematically study the problem of personalized text-to-image generation, where the output image is expected to portray information about specific human subjects, e.g., generating images of oneself appearing in imaginative places, interacting with various items, or engaging in fictional activities. To this end, we focus on text-to-image systems that take as input a single image of an individual to ground the generation process, along with text describing the desired visual context. Our first contribution fills a gap in the literature by curating high-quality, appropriate data for this task. Namely, we introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals, is an order of magnitude larger than existing relevant datasets, and comes with rich semantic ground-truth annotations readily available. Having established Stellar, and to further promote fine-grained cross-system comparisons, we introduce a rigorous ensemble of specialized metrics that highlight and disentangle fundamental properties such systems should obey. Besides being intuitive, our new metrics correlate significantly more strongly with human judgment than the metrics currently used for this task. Last but not least, drawing inspiration from the recent works of ELITE and SDXL, we derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and sets a new SoTA both quantitatively and in human trials. For more information, please visit our project’s website: https://stellar-gen-ai.github.io.
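As a purely illustrative aside, and not part of the Stellar metric suite itself, one of the simplest properties such metrics can probe is identity preservation, e.g., the similarity between face embeddings of the reference image and the generated image. The sketch below assumes embeddings from some pretrained face encoder are already available; obtaining them is out of scope here.

```python
# Illustrative only: NOT the Stellar metric suite. A minimal identity-preservation
# score computed as the cosine similarity between face embeddings of the
# reference (input) image and the generated image. The embeddings are assumed
# to come from any pretrained face encoder.
import numpy as np

def identity_similarity(ref_embedding: np.ndarray, gen_embedding: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; higher means the subject's identity is better preserved."""
    ref = ref_embedding / (np.linalg.norm(ref_embedding) + 1e-8)
    gen = gen_embedding / (np.linalg.norm(gen_embedding) + 1e-8)
    return float(np.dot(ref, gen))
```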
EUSIPCO
Generating Salient Scene Graphs with Weak Language Supervision
Given an image, Scene Graph Generation (SGG) is the task of building a directed graph whose edges represent predicted relation triplets. Most SGG models struggle to identify the important and descriptive relations in an image, flooding the graph with uninformative background triplets. This is due not to training problems but rather to the lack of saliency in fully supervised SGG datasets. Hence, observing that annotators describing an image naturally omit background relations and thereby encode image saliency, we (i) introduce a generalized method for training SGG models with weak supervision using image captions, (ii) introduce two variations of the Recall@N metric that can quantify the saliency of SGG models, and (iii) perform quantitative and qualitative comparisons with the related literature on VG200, where we achieve up to 35% improvement compared to a re-implementation of the SOTA.
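For context, the sketch below shows the standard Recall@N computation on which such variants build: the fraction of ground-truth triplets recovered among the top-N predictions. It is a simplified illustration (exact triplet matching, no box localization) and does not reproduce the saliency-aware variants introduced in the paper.

```python
# Minimal sketch of standard Recall@N for SGG triplets (not the paper's
# saliency-aware variants): the fraction of ground-truth triplets that
# appear among the top-N predicted triplets, ranked by confidence.
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)

def recall_at_n(predictions: List[Tuple[Triplet, float]],
                ground_truth: List[Triplet],
                n: int = 50) -> float:
    """Recall@N = |top-N predictions ∩ ground truth| / |ground truth|."""
    if not ground_truth:
        return 0.0
    # Rank predicted triplets by confidence and keep the top N.
    top_n = {t for t, _ in sorted(predictions, key=lambda p: p[1], reverse=True)[:n]}
    hits = sum(1 for t in ground_truth if t in top_n)
    return hits / len(ground_truth)

# Example: two of three ground-truth triplets are recovered in the top-N.
preds = [(("man", "riding", "horse"), 0.9),
         (("horse", "on", "grass"), 0.7),
         (("tree", "behind", "fence"), 0.4)]
gt = [("man", "riding", "horse"), ("horse", "on", "grass"), ("man", "wearing", "hat")]
print(recall_at_n(preds, gt, n=50))  # ~0.67
```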
2021
EMBC
Assessing vision quality in retinal prosthesis implantees through deep learning: Current progress and improvements by optimizing hardware design parameters and rehabilitation
Alexandros Benetatos, Nikos Melanitis, and Konstantina S. Nikita
In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2021
Retinal prosthesis (RP) is used to partially restore vision in patients with degenerative retinal diseases. Assessing the quality of RP-acquired (i.e., prosthetic) vision is needed to evaluate the impact and prospects of RP. Spatial distortions caused by electrical stimulation of the retina in RP, together with the low number of electrodes, have limited prosthetic vision: patients mostly localize shapes and shadows rather than recognize objects. We simulate prosthetic vision and evaluate it on image classification tasks, varying critical hardware parameters: the total number and size of electrodes. We also simulate rehabilitation by re-training our models on prosthetic vision images. We find that electrode size has little impact on vision, while at least 400 electrodes are needed to sufficiently restore vision (more than 65% classification accuracy on a complex visual task after rehabilitation). Argus II, a currently available implant, produces low-resolution vision, leading to low accuracy (21.3% after rehabilitation) on complex vision tasks. Rehabilitation produces significant improvements in the attained vision (accuracy gains of up to 30% on complex tasks, depending on the number of electrodes), boosting our expectations for RP interventions and motivating the establishment of rehabilitation procedures for RP implantees.
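To give a rough intuition for how electrode count limits resolution, the toy sketch below renders a grayscale image as a square grid of Gaussian phosphenes sampled at electrode positions. It is not the simulation pipeline used in the paper: the electrode layout, phosphene model, and spatial-distortion effects are simplified, and all parameter names are illustrative.

```python
# Toy illustration (not the paper's simulation pipeline): approximate prosthetic
# vision by sampling a grayscale image on a square electrode grid and rendering
# each electrode as a Gaussian phosphene whose brightness follows the sampled pixel.
import numpy as np

def simulate_prosthetic_vision(image: np.ndarray,
                               n_electrodes: int = 400,
                               phosphene_sigma: float = 2.0) -> np.ndarray:
    """image: 2D grayscale array with values in [0, 1]; returns a same-sized rendering."""
    h, w = image.shape
    grid = int(round(np.sqrt(n_electrodes)))       # e.g. a 20x20 grid for 400 electrodes
    ys = np.linspace(0, h - 1, grid).astype(int)   # electrode centre rows
    xs = np.linspace(0, w - 1, grid).astype(int)   # electrode centre columns
    yy, xx = np.mgrid[0:h, 0:w]
    out = np.zeros_like(image, dtype=float)
    for cy in ys:
        for cx in xs:
            brightness = image[cy, cx]             # local intensity drives the phosphene
            blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * phosphene_sigma ** 2))
            out += brightness * blob
    return np.clip(out, 0.0, 1.0)
```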