Publications
2022
Depthwise spatio-temporal STFT convolutional neural networks for human action recognition
IEEE Trans. Pattern Analysis and Machine Intelligence, 2022
Quantifying Societal Bias Amplification in Image Captioning
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Optimal Correction Cost for Object Detection Evaluation
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Anonymous identity sampling and reusable synthesis for sensitive face camouflage
Journal of Electronic Imaging, Mar. 2022
Integration of gesture generation system using gesture library with DIY robot design kit
Proc. IEEE/SICE International Symposium on System Integration (SII), Jan. 2022
2021
The semantic typology of visually grounded paraphrases
Computer Vision and Image Understanding, Dec. 2021
Transferring domain-agnostic knowledge in video question answering
Proc. British Machine Vision Conference (BMVC), Nov. 2021
GCNBoost: Artwork Classification by Label Propagation Through a Knowledge Graph
Proc. ACM International Conference on Multimedia Retrieval (ICMR), Nov. 2021
Image Retrieval by Hierarchy-aware Deep Hashing Based on Multi-task Learning
Proc. ACM International Conference on Multimedia Retrieval (ICMR), Nov. 2021
SCOUTER: Slot attention-based classifier for explainable image recognition
Proc. IEEE/CVF International Conference on Computer Vision (ICCV), Nov. 2021
Built year prediction from Buddha face with heterogeneous labels
Proc. Workshop on Structuring and Understanding of Multimedia Heritage Contents (SUMAC), Oct. 2021
Explain me the painting: Multi-topic knowledgeable art description generation
Proc. IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2021
Visual question answering with textual representations for images
Proc. IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Oct. 2021
Museum Experience into a Souvenir: Generating Memorable Postcards from Guide Device Behavior Log
Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL), Sep. 2021
PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation
Proc. International Conference on Image Processing (ICIP), Sep. 2021
Learners' efficiency prediction using facial behavior analysis
Proc. International Conference on Image Processing (ICIP), Sep. 2021
Attending self-attention: A case study of visually grounded supervision in vision-and-language transformers
Proc. Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, Aug. 2021
機械は世界をどう見ているのか?
第3回 【おウチで】大阪大学ロボットサイエンスカフェ, Aug. 2021
A comparative study of language Transformers for video question answering
Neurocomputing, July 2021
WRIME: A new dataset for emotional intensity estimation with subjective and objective annotations
Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), June 2021
MTUNet: Few-shot image classification with visual explanations
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2021
A picture may be worth a hundred words for visual question answering
arXiv preprint arXiv:2106.13445, June 2021
The laughing machine: Predicting humor in video
Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), Jan. 2021
Preventing fake information generation against media clone attacks
IEICE Trans. Information and Systems, Jan. 2021
Development of a Vertex Finding Algorithm using Recurrent Neural Network
arXiv preprint arXiv:2101.11906, Jan. 2021
Understanding the role of scene graphs in visual question answering
arXiv preprint arXiv:2101.05479, Jan. 2021
2020
ContextNet: Representation and exploration for painting classification and retrieval in context
International Journal on Multimedia Information Retrieval, Dec. 2020
IDSOU at WNUT-2020 Task 2: Identification of informative COVID-19 English tweets
Proc. Workshop on Noisy User-Generated Text (W-NUT), Nov. 2020
Match Them Up: Visually Explainable Few-shot Image Classification
arXiv preprint arXiv:2011.12527, Nov. 2020
Grading the Severity of Arteriolosclerosis from Retinal Arterio-venous Crossing Patterns
arXiv preprint arXiv:2011.03772, Nov. 2020
Constructing a Visual Relationship Authenticity Dataset
arXiv preprint arXiv:2010.05185, Oct. 2020
Uncovering hidden challenges in query-based video moment retrieval
Proc. British Machine Vision Conference (BMVC), Sep. 2020
Visually grounded paraphrase identification via gating and phrase localization
Neurocomputing, Sep. 2020
A dataset and baselines for visual question answering on art
Proc. European Conference on Computer Vision Workshops (VISARTS), Aug. 2020
Privacy sensitive large-margin model for face de-identification
Proc. International Conference on Neural Computing for Advanced Applications (NCAA), Aug. 2020
Demographic Influences on Contemporary Art with Unsupervised Style Embeddings
Proc. European Conference on Computer Vision Workshops (VISARTS), Aug. 2020
Knowledge-based video question answering with unsupervised scene descriptions
Proc. European Conference on Computer Vision (ECCV), Aug. 2020
Joint learning of vessel segmentation and artery/vein classification with post-processing
Proc. Medical Imaging with Deep Learning (MIDL), June 2020
Yoga-82: A new dataset for fine-grained classification of human poses
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2020
Knowledge-based visual question answering in videos
Proc. Workshop on Women in Computer Vision, June 2020
A fully automated grading system for the retinal arteriovenous crossing signs using deep neural network
Investigative Ophthalmology & Visual Science, June 2020
Constructing a public meeting corpus
Proc. Conference on Language Resources and Evaluation (LREC), May 2020
Warmer environments increase implicit mental workload even if learning efficiency is enhanced
Frontiers in Psychology, Apr. 2020
Toward predicting learners' efficiency for adaptive e-learning
Proc. International Learning Analytics and Knowledge Conference (LAK), Mar. 2020
Video analytics in blended learning: Insights from learner-video interaction patterns
Proc. Workshop on Addressing Drop-Out Rates in Higher Education (ADORE), Mar. 2020
IterNet: Retinal image segmentation utilizing structural redundancy in vessel networks
Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), Mar. 2020
BERT representations for video question answering
Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), Mar. 2020
KnowIT VQA: Answering knowledge-based questions about videos
Proc. AAAI Conference on Artificial Intelligence (AAAI), Feb. 2020
3D image reconstruction from multi-focus microscopic images
Proc. Pacific-Rim Symposium on Image and Video Technology (PSIVT), Jan. 2020
Speech-driven face reenactment for a video sequence
ITE Trans. Media Technology and Applications, Jan. 2020
2019
Public Meeting Corpus Construction and Content Delivery
じんもんこん2019論文集, Dec. 2019
Legal information as a complex network: Improving topic modeling through homophily
Proc. International Conference on Complex Networks and Their Applications, Nov. 2019
Human shape reconstruction with loose clothes from partially observed data by pose specific deformation
Proc. Pacific-Rim Symposium on Image and Video Technology (PSIVT), Nov. 2019
Adaptive gating mechanism for identifying visually grounded paraphrases
Proc. Multi-Discipline Approach for Learning Concepts, Oct. 2019
Historical and modern features for Buddha statue classification
Proc. Workshop on Structuring and Understanding of Multimedia HeritAge Contents, Oct. 2019
BUDA.ART: A multimodal content-based analysis and retrieval system for Buddha statues
Proc. ACM International Conference on Multimedia (MM), Oct. 2019
Using external knowledge in the deep learning framework
Physics Seminar, KEK, Oct. 2019
Facial expression recognition with skip-connection to leverage low-level features
Proc. IEEE International Conference on Image Processing (ICIP), Sep. 2019
GANを用いた顔のRGB画像と奥行画像の同時生成
情報処理学会 情報科学技術フォーラム H-018, Aug. 2019
Video meets knowledge in visual question answering
画像の認識・理解シンポジウム, 4 pages, Aug. 2019
Buddha statues archive retrieval system
画像の認識・理解シンポジウム, 4 pages, Aug. 2019
Collecting relation-aware video captions
画像の認識・理解シンポジウム, 4 pages, Aug. 2019
Video question answering with BERT
画像の認識・理解シンポジウム, 4 pages, Aug. 2019
AI/機械学習/深層学習入門
第16回日本加速器学会年会 技術研修会, July 2019
Rethinking the evaluation of video summaries
Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Context-aware embeddings for automatic art analysis
Proc. International Conference on Multimedia Retrieval (ICMR), June 2019
コメディドラマにおける字幕と表情を用いた笑い予測
2019年度人工知能学会全国大会 3Rin2-12, 1 page, June 2019
Understanding art through multi-modal retrieval in paintings
arXiv preprint arXiv:1904.10615, Apr. 2019
Multimodal learning analytics: Society 5.0 project in Japan
Proc. International Conference on Learning Analytics and Knowledge (LAK), Mar. 2019
Problems dealt with machine learning/deep learning and its applications to nuclear physics
Workshop on Interdisciplinary Approach of Applying Cutting-edge Technologies at the Frontier of Cancer Research, Mar. 2019
情報学と物理学のクロスオーバー
日本物理学会 第74回年次大会, Mar. 2019
Talking Head Generation with Deep Phoneme and Viseme Representation and Generative Adversarial Networks
電子情報通信学会 パターン認識・メディア理解 PRMU-2018-157, Mar. 2019
Faces in an Archive of Buddhism Pictures
情報処理学会 人文科学とコンピュータ研究会 CH-119-7, Feb. 2019
多重焦点顕微鏡画像列からの細胞の3次元形状復元
情報処理学会 コンピュータビジョンとイメージメディア CVIM-215-33, Jan. 2019
2018
Finding important people in a video using deep neural networks with conditional random fields
IEICE Trans. Information and Systems, Oct. 2018
OpenCVとPythonによる機械学習プログラミング
Aug. 2018
iParaphrasing: Extracting visually grounded paraphrases via an image
Proc. International Conference on Computational Linguistics (COLING), Aug. 2018
Representing a partially observed non-rigid 3D human using eigen-texture and eigen-deformation
Proc. International Conference on Pattern Recognition (ICPR), Aug. 2018
Iterative applications of image completion with CNN-based failure detection
Journal of Visual Communication and Image Representation, Aug. 2018
Summarization of user-generated sports video by using deep action recognition features
IEEE Trans. Multimedia, Aug. 2018
Synthesis of human shape in loose cloth using eigen-deformation
画像の認識・理解シンポジウム, 4 pages, Aug. 2018
Phrase localization-based visually grounded paraphrase identification
画像の認識・理解シンポジウム, 4 pages, Aug. 2018
Exploration and Mining of 50,000 Buddha Pictures
画像の認識・理解シンポジウム, 4 pages, Aug. 2018
Linking videos and languages: Representations and their applications
情報処理学会 コンピュータビジョンとイメージメディア CVIM-212-38, 16 pages, May 2018
Finding Video Parts with Natural Language
情報処理学会 コンピュータビジョンとイメージメディア CVIM-211-7, Mar. 2018
Extracting Paraphrases Grounded by an Image
情報処理学会 コンピュータビジョンとイメージメディア CVIM-211-6, Mar. 2018
2017
自由視点画像生成のためのEigen-Texture法における係数の回帰
情報処理学会 コンピュータビジョンとイメージメディア CVIM-209-39, Nov. 2017
Augmented reality marker hiding with texture deformation
IEEE Trans. Visualization and Computer Graphics, Oct. 2017
Realtime novel view synthesis with eigen-texture regression
Proc. British Machine Vision Conference (BMVC), Sep. 2017
Video question answering to find a desired video segment
Proc. Open Knowledge Base and Question Answering Workshop (OKBQA), Aug. 2017
Novel view synthesis with light-weight view-dependent texture mapping for a stereoscopic HMD
Proc. IEEE International Conference on Multimedia and Expo (ICME), July 2017
画像処理・機械学習プログラミングOpenCV 3対応
June 2017
Video summarization using textual descriptions for authoring video blogs
Multimedia Tools and Applications, May 2017
最近の重要な論文の紹介 -- テキストとの対応付けによる映像の理解に関連して
ステアラボ人工知能シンポジウム2017, Mar. 2017
DNNを用いたカメラの6自由度相対運動推定
情報処理学会 コンピュータビジョンとイメージメディア 2017-CVIM-206-13, Mar. 2017
ReMagicMirror: Action learning using human reenactment with the mirror metaphor
Proc. International Conference on Multimedia Modeling (MMM), Jan. 2017
Increasing pose comprehension through augmented reality reenactment
Multimedia Tools and Applications, Jan. 2017
2016
Acceleration of View-dependent Texture Mapping-based Novel View Synthesis for stereoscopic HMD
映像情報メディア学会2016年冬季大会 2016
Flexible human action recognition in depth video sequences using masked joint trajectories
EURASIP Journal on Image and Video Processing, Dec. 2016
深層学習を利用した映像要約への取り組み
第7回ステアラボ人工知能セミナー, Nov. 2016
Video summarization using deep semantic features
Proc. Asian Conference on Computer Vision (ACCV), Sep. 2016
Learning joint representations of videos and sentences with web image search
Proc. Workshop on Web-scale Vision and Social Media, Aug. 2016
Human action recognition-based video summarization for RGB-D personal sports video
Proc. IEEE International Conference on Multimedia and Expo (ICME), July 2016
Privacy protection for social video via background estimation and CRF-based videographer's intention modeling
IEICE Trans. Information and Systems, Apr. 2016
Joint representation of video and text using deep neural network with help of web images
Microsoft Research Asia, Beijing, Apr. 2016
Novel View Synthesis Based on View-dependent Texture Mapping with Geometry-aware Color Continuity
Transactions of the Virtual Reality Society of Japan, Mar. 2016
3D shape template generation from RGB-D images capturing a moving and deforming object
Proc. Electronic Imaging, Feb. 2016
畳み込みニューラルネットワークを用いた修復失敗領域の自動検出による画像修復の反復的適用
電子情報通信学会 パターン認識・メディア理解 PRMU-2015-160, Feb. 2016
Evaluating protection capability for visual privacy information
IEEE Security & Privacy, Jan. 2016
2015
画像修復における畳み込みニューラルネットワークを用いた修復失敗領域の自動検出
映像情報メディア学会 2015年冬季大会, Dec. 2015
2035年のマルチメディアの姿を予想--ICME 2015 会議レポート
情報処理, Oct. 2015
OpenCV 3 プログラミングブック
Sep. 2015
単一のRGB-Dカメラを用いた非剛体物体の3次元形状復元
計測自動制御学会計測部門 センシングフォーラム, Sep. 2015
Textual description-based video summarization for video blogs
Proc. IEEE International Conference on Multimedia and Expo (ICME), June 2015
Facial expression preserving privacy protection using image melding
Proc. IEEE International Conference on Multimedia and Expo (ICME), June 2015
テクスチャの連続性を考慮した視点依存テクスチャマッピングによる自由視点画像生成
電子情報通信学会 パターン認識・メディア理解 PRMU-2014-162, Mar. 2015
特徴点の明示的な対応付けを伴わないカメラ位置姿勢推定
情報処理学会 コンピュータビジョンとイメージメディア CVIM-195-60, Mar. 2015
AR image generation using view-dependent geometry modification and texture mapping
Virtual Reality, Jan. 2015
Protection and utilization of privacy information via sensing
IEICE Trans. Information and Systems, Jan. 2015
テキストと映像の類似度を用いた映像要約
電子情報通信学会 パターン認識・メディア理解 PRMU-2014-95, Jan. 2015
RGB-Dカメラを用いた非剛体物体の動き復元のためのRGB画像上の対応点に基づく3次元テンプレート生成
情報処理学会 コンピュータビジョンとイメージメディア CVIM-195-45, Jan. 2015
特徴点の明示的な対応付けを伴わないカメラ位置姿勢推定
情報処理学会 コンピュータビジョンとイメージメディア CVIM-195-60, Jan. 2015
2014
RGB-Dカメラを用いた非剛体物体の動き復元のための3次元テンプレート形状生成
映像情報メディア学会 2014年冬季大会, Dec. 2014
特徴点の類似度尺度による対応付けを伴わないカメラ位置姿勢推定手法の検討
映像情報メディア学会 年次大会, Sep. 2014
Free-viewpoint AR human-motion reenactment based on a single RGB-D video stream
Proc. IEEE International Conference on Multimedia and Expo (ICME), July 2014
Background estimation for a single omnidirectional image sequence captured with a moving camera
IPSJ Trans. Computer Vision and Applications, July 2014
画像のコンテキストを保持した視覚的に自然なプライバシー保護処理
電子情報通信学会 パターン認識・メディア理解 PRMU-2013-205, Mar. 2014
自由視点画像生成に基づく移動撮影した全方位動画像からの動物体除去
電子情報通信学会 総合大会 D-11-43, 1 page, Mar. 2014
Single RGB-D Video-stream Based Human-motion Reenactment
映像情報メディア学会 メディア工学 ME-2014-7, Feb. 2014
2013
Augmented reality image generation with virtualized real objects using view-dependent texture and geometry
Proc. IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Oct. 2013
Inferring what the videographer wanted to capture
Proc. IEEE International Conference on Image Processing (ICIP), Sep. 2013
Real-time privacy protection system for social videos using intentionally-captured persons detection
Proc. IEEE International Conference on Multimedia and Expo (ICME), July 2013
拡張現実感のための視点依存テクスチャ・ジオメトリに基づく仮想化実物体の輪郭形状の修復
情報処理学会 コンピュータビジョンとイメージメディア CVIM-185-35, Jan. 2013
2012
Markov random field-based real-time detection of intentionally-captured persons
Proc. IEEE International Conference on Image Processing (ICIP), Sep. 2012
顔画像に対するプライバシー保護処理の有効性の定量的評価
情報処理学会 セキュリティ心理学とトラスト SPT-4-9, July 2012
Intended human object detection for automatically protecting privacy in mobile video surveillance
Multimedia Systems, Mar. 2012
2011
Extracting intentionally captured regions using point trajectories
Proc. ACM International Conference on Multimedia (MM), Nov. 2011
Indoor positioning system using digital audio watermarking
IEICE Trans. Information and Systems, Nov. 2011
Automatic generation of privacy-protected videos using background estimation
Proc. IEEE International Conference on Multimedia and Expo (ICME), July 2011
カメラの動きと映像特徴からの撮影者が意図した領域の推定
画像の認識・理解シンポジウム, July 2011
2010
Automatically protecting privacy in consumer generated videos using intended human object detector
Proc. ACM International Conference on Multimedia (MM), Oct. 2010
Real-time user position estimation in indoor environments using digital watermarking for audio signals
Proc. International Conference on Pattern Recognition (ICPR), Aug. 2010
Discriminating intended human objects in consumer videos
Proc. International Conference on Pattern Recognition (ICPR), Aug. 2010
Detecting intended human objects in human-captured videos
Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2010
Digital diorama: Sensing-based real-world visualization
Proc. International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, June 2010
音響電子透かしを用いた屋内での録音位置推定
電子情報通信学会 2010年総合大会 DS-3-1, Mar. 2010
映像中の撮影者が意図した人物被写体の検出
電子情報通信学会 2010年総合大会 D-12-41, Mar. 2010
2009
映像特徴に基づく撮影者が意図した人物被写体の推定
情報処理学会 情報科学技術フォーラム K-046, Aug. 2009
Watermarked movie soundtrack finds the position of the camcorder in a theater
IEEE Trans. Multimedia, Mar. 2009
音響電子透かしの検出強度を用いた位置推定
電子情報通信学会 2009年総合大会 DS-3-10, Mar. 2009
2007
Maximum-likelihood estimation of recording position based on audio watermarking
Proc. International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP), Nov. 2007
Determining Recording Location Based on Synchronization Positions of Audio Watermarking
Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2007
2006
Estimation of recording location using audio watermarking
Proc. Workshop on Multimedia and Security (MM&Sec), Sep. 2006