
AI-ReligiousTexts: AI knowledge resources for understanding religious texts

PGR-P-1042

Coronavirus information for applicants and offer holders

We hope that by the time you’re ready to start your studies with us the situation with COVID-19 will have eased. However, please be aware that we will continue to review our courses and other elements of the student experience in response to COVID-19, and we may need to adapt our provision to ensure students remain safe. For the most up-to-date information on COVID-19, regularly visit our website, which we will continue to update as the situation changes: www.leeds.ac.uk/covid19faqs

Key facts

Type of research degree: PhD
Application deadline: Sunday 3 July 2022
Project start date: Saturday 1 October 2022
Country eligibility: International (open to all nationalities, including the UK)
Funding: Non-funded
Supervisors: Professor Eric Atwell
Schools: School of Computing
Research groups/institutes: Artificial Intelligence
<h2 class="heading hide-accessible">Summary</h2>

The Quran, Hadith and Bible are the core religious knowledge sources for 4+ billion Muslims and Christians, providing guidance and answers on how to live their lives faithfully. We will collect a 10M-word QuHaBiQA corpus: a knowledge base of Quran, Hadith and Bible source texts, commentaries or exegesis texts, and questions and answers from attested sources, including exegesis texts (e.g. Tafsir), Islamic and Christian QA sources, and elicitation from experts. We will build on our prior work on Quran, Hadith and Bible corpus linguistics and question answering. Each Q+A instance will include a question in English and Arabic, an answer as a list of verses from the Quran, Hadith or Bible in English and Arabic, and a commentary linking the verses to the question, also in English and Arabic. We will investigate fast instance-based learning, using the knowledge base as a look-up table: user questions can be matched directly to questions with expert-attested answers. Our fast matching QA system will be compared against other state-of-the-art QA systems: the QuHaBiQA corpus will provide training and evaluation data for a Shared Task workshop in religious text QA, in which participants will develop a variety of QA solutions. The fast lookup-based Religious QA data-set and software will be open contributions to NLP and Religious Studies research, and to the 4+ billion Muslim and Christian user communities.
Much research in text understanding and question answering has aimed at general architectures transferable between different domains [YanW19], and general-purpose text-understanding and question-answering systems can have large computational requirements [YanZ19]. We focus instead on this specialized domain, with 4+ billion users, whose special properties favour computationally efficient methods. Religious texts like the Quran, Hadith and Bible are amenable to more computationally efficient QA solutions because: (a) The texts have been widely studied by believers and experts, resulting in many sources of expert answers to questions about the texts, to meet demand for guidance from believers [Ham16]. (b) Answers generally have a constrained structure, consisting of reference(s) to specific verse(s), sometimes with commentary explaining the link between the question and the verse(s) [Alq19]. (c) The commentary is given because there may be little direct lexical overlap between the question and the verse; the link is based on religious insight and understanding [Sha12]. (d) There is large demand from Muslims and Christians for religious guidance: they want answers from the Quran, Hadith or Bible to their questions on religion, and more generally on life [Tor19].
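The Q+A instance structure and look-up matching described in this summary can be illustrated with a minimal sketch. It is not the project's implementation: the names QAInstance, LookupQA and normalise, the verse-reference format, and the normalisation scheme (Unicode NFKC, case-folding, punctuation stripping) are all assumptions made for illustration, and the example record is a placeholder rather than corpus data.

```python
from dataclasses import dataclass
from typing import Optional
import unicodedata


@dataclass(frozen=True)
class QAInstance:
    """Hypothetical corpus record: question, verse references, commentary."""
    question_en: str
    question_ar: str
    verses: tuple          # verse references; the "Quran 2:153" format is assumed
    commentary_en: str
    commentary_ar: str


def normalise(text: str) -> str:
    """Case-fold, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    no_punct = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(no_punct.split())


class LookupQA:
    """Instance-based QA: the corpus itself serves as a look-up table."""

    def __init__(self, instances):
        self._table = {}
        for inst in instances:
            # Index each instance under both its English and Arabic question.
            self._table[normalise(inst.question_en)] = inst
            self._table[normalise(inst.question_ar)] = inst

    def answer(self, user_question: str) -> Optional[QAInstance]:
        """Return the attested instance for a matching question, else None."""
        return self._table.get(normalise(user_question))


# Placeholder record for illustration only (not drawn from the corpus).
example = QAInstance(
    question_en="What does the Quran say about patience?",
    question_ar="ماذا يقول القرآن عن الصبر؟",
    verses=("Quran 2:153",),
    commentary_en="<expert commentary in English>",
    commentary_ar="<expert commentary in Arabic>",
)
qa = LookupQA([example])
hit = qa.answer("what does the quran say about patience")
print(hit.verses if hit else "no exact match")
```

A dictionary keyed on normalised question text gives constant-time retrieval, which is the kind of computational saving the look-up approach aims for; questions that differ in wording from every stored question return None here and would need a backup matching method.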

<h2 class="heading hide-accessible">Full description</h2>

<p>To exploit (a), we will collect a 10M-word QuHaBiQA corpus of Quran, Hadith and Bible questions and answers from attested sources, including exegesis texts (e.g. Tafsir), Islamic and Christian QA sources, and elicitation from experts. We will build on our prior work on Quran, Hadith and Bible corpus linguistics and question answering, e.g. [Abu07], [Abu09], [Ham16], [Has16], [Abu16], [Ahm17], [Eme18], [Atw19], [Alq19], [Liu19]. We will call upon our network of contacts in religious text analysis in the Middle East, Europe, Asia, Africa and the Americas to identify a broad range of Islamic and Christian QA sources.</p> <p>To exploit (b) and (c), each instance will include a question in English and Arabic, an answer as a list of verses from the Quran, Hadith or Bible in English and Arabic, and a commentary linking the verses to the question, also in English and Arabic. The Quran and Hadith are widely read in Arabic and/or in English translation; English is in effect the second language of Islam. The Bible is most widely read in English, and we include Arabic answers to Bible questions for comparison and to demonstrate the portability of our methods across languages. This will be the first large English+Arabic QA data-set bringing together in one knowledge base the core texts of Islam and Christianity; it will enable questions comparing Quran, Hadith and Bible teachings.</p> <p>This corpus will be a training set for a machine-learning QA system, to meet demand (d). As a computationally light alternative to computationally heavy deep learning, we will investigate fast instance-based learning, using the corpus as a look-up table [Abu09]. User questions can be matched directly to known questions, which directly delivers an expert-attested answer. If there is no exact or close match to a known question, we will explore backup methods to match the input against corpus questions or, failing that, answers. 
Even in this case there may be no need for neural models: simpler pattern-matching will generally suffice to find an appropriate question/answer in the corpus, because of the subject-specific and constrained nature of the task. We will also experiment with neural QA, to test whether it delivers better performance; we expect to show that the added computational demand does not improve F1 score or user satisfaction metrics [Abu16].</p> <p>The fast lookup-table based Religious QA data-set and software will be open contributions to NLP and Religious Studies research, and to the Muslim and Christian user communities.</p> <p>The QuHaBiQA corpus can provide training and evaluation data for a research community Shared Task in religious text QA: participants will be asked to develop a variety of QA solutions. We will evaluate not only accuracy/F1 but also computational efficiency: the winner will need to score highly on both criteria, and our entry, the result of this project, will be compared against rival approaches from other research teams.</p> <p>The AI research group in the School of Computing at the University of Leeds has developed a range of online resources for analysis and understanding of Arabic and Islamic texts. For example, in our project &ldquo;Natural Language Processing working together with Arabic and Islamic Studies&rdquo; (EP/K015206/1) we investigated text analytics methods to identify key concepts and relations, topics and threads in the text of the Quran [Atw18], [Atw19], [Liu19]; and we presented results at the ASAR&rsquo;2018 conference at the Alan Turing Institute for Artificial Intelligence and Data Analytics [Alr18], [Alo18], [Alq18]. 
We have also developed resources for other aspects of Islamicate Digital Humanities, such as modern Arabic language resources [Alg19], [Als19] and Hadith resources [Has16].</p> <p>The Islamicate Digital Humanities Network <a href="https://idhn.org/">https://idhn.org/</a> is &ldquo;a network of scholars with a research focus on Islamicate Digital Studies. This includes scholars from the Humanities, Computer Sciences, Computational Linguistics as well as librarians and archivists that work on topics that relate to the Middle Eastern politics and culture, Islam as a religion, Arabic, Persian and other Islamicate languages, et al. and that are already using or interested in using digital methods for their research.&rdquo; The IDHN website aims to be a repository for data-sets, software, documentation, and other research resources: &ldquo;We are working on creating a comprehensive overview over text and media databases for the Islamicate Studies, the latest tools that make working in the Digital Humanities so much easier and tutorials on how to use them.&rdquo; This project will contribute to Islamicate Digital Humanities research resources, building them into the IDHN online repository. This IDHN resources repository will be made accessible to IDHN researchers and other users world-wide. The project will investigate standards, formats and platforms for open access resources, and aim to build IDHN resources that meet these standards and are accessible to a wide range of interfaces and software. For example, the data should be available in the different formats required by tools such as Prot&eacute;g&eacute;, Weka, SketchEngine and Excel.</p> <p><strong>REFERENCES</strong></p> <p>[Abu07] B AbuShawar, E Atwell. 2007. Chatbots: are they really useful? Journal for Language Technology and Computational Linguistics 22(1):29-49.</p> <p>[Abu09] B AbuShawar, E Atwell. 2009. Arabic question-answering via instance based learning from an FAQ corpus. 
Proc CL&rsquo;2009 International conference on Corpus Linguistics.</p> <p>[Abu16] B AbuShawar, E Atwell. 2016. Usefulness, localizability, humanness, and language-benefit: additional evaluation criteria for natural language dialogue systems. International Journal of Speech Technology 19(2):373-383.</p> <p>[Ahm17] N Ahmad, B Bennett, E Atwell. 2017. Retrieval performance for Malay Quran. International Journal on Islamic Applications in Computer Science and Technology 5(2):13-25.</p> <p>[Alg19] A Alghamdi, E Atwell. 2019. Constructing a corpus-informed list of Arabic formulaic sequences (ArFSs) for language pedagogy and technology. International Journal of Corpus Linguistics 24(2):202-228.</p> <p>[Alo18] A Alosaimy, E Atwell. 2018. Diacritization of a Highly Cited Text: A Classical Arabic Book as a Case. Proceedings of ASAR&#39;2018 Arabic Script Analysis and Recognition, pp. 72-77. London: Alan Turing Institute.</p> <p>[Alq18] M Alqahtani, E Atwell. 2018. Developing Bilingual Arabic-English Ontologies of Al-Quran. Proceedings of ASAR&#39;2018 Arabic Script Analysis and Recognition, pp. 96-101. London: Alan Turing Institute.</p> <p>[Alq19] M Alqahtani. 2019. Quranic Arabic semantic search tool based on ontology of concepts. PhD Thesis, School of Computing, University of Leeds.</p> <p>[Alr18] SM Alrehaili, E Atwell. 2018. Discovering Qur&rsquo;anic Knowledge through AQD: Arabic Qur&rsquo;anic Database, a Multiple Resources Annotation-level Search. Proceedings of ASAR&#39;2018 Arabic Script Analysis and Recognition, pp. 102-107. London: Alan Turing Institute.</p> <p>[Als19] A Alshutayri, E Atwell. 2019. A Social Media Corpus of Arabic Dialect Text. In: E Stemle, CR Wigham (eds.) Computer-Mediated Communication Building Corpora for sociolinguistic Analysis. Clermont-Ferrand, France: Presses universitaires Blaise Pascal.</p> <p>[Atw18] E Atwell. 2018. Classical and modern Arabic corpora: Genre and language change. In: RJ Whitt (ed.) 
Diachronic Corpora, Genre, and Language Change. Studies in Corpus Linguistics. John Benjamins, pp. 65-91.</p> <p>[Atw19] E Atwell. 2019. Using the web to model modern and Quranic Arabic. In: T McEnery, A Hardie, N Younis (eds.) Arabic Corpus Linguistics. Edinburgh University Press.</p> <p>[Eme18] A Emefoh. 2018. Bible question answering system. MSc Thesis, School of Computing, University of Leeds.</p> <p>[Hac17] C Hackett, D McClendon. 2017. Christians remain world&rsquo;s largest religious group. Fact-tank, Pew Research Centre.</p> <p>[Ham16] B Hamoud, E Atwell. 2016. Compiling a Quran question and answer corpus. Proc ICCA&rsquo;16 International Conference on Computing in Arabic.</p> <p>[Has16] S Hassan, E Atwell. 2016. Concept search tool for multilingual Hadith corpus. International Journal of Science and Research 5(4):1326-1328.</p> <p>[Liu19] Z Liu, L Yang, E Atwell. 2019. Semantic annotation of the Quran corpus based on Hierarchical Network of Concepts theory. Proc IALP&rsquo;18 International conference on Asian Language Processing.</p> <p>[Sha12] A Sharaf, E Atwell. 2012. QurSim: A corpus for evaluation of relatedness in short texts. Proc LREC&rsquo;12 Language Resources and Evaluation Conference.</p> <p>[Tor19] J Torrecillas et al. 2019. Religious support and emotional functioning in India across three major religions. International Journal for the Psychology of Religion. Preprint.</p> <p>[YanW19] W Yang et al. 2019. End-to-end open-domain question answering with BERTserini. arXiv preprint arXiv:1902.01718.</p> <p>[YanZ19] Z Yang et al. 2019. Model compression with multi-task knowledge distillation for web-scale question answering system. arXiv preprint arXiv:1904.09636.</p>
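<p>As an illustration of the backup matching and the accuracy/efficiency evaluation described in this section, the following is a minimal sketch under stated assumptions. Jaccard word overlap stands in for the unspecified &ldquo;simpler pattern-matching&rdquo;, F1 is computed over sets of verse references, and wall-clock time stands in for computational efficiency; the function names and the 0.5 threshold are hypothetical, not taken from the project.</p>

```python
import time


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two questions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def closest_question(user_question, known_questions, threshold=0.5):
    """Backup match: most similar known question, or None if too dissimilar.

    The threshold value is an assumption chosen for illustration.
    """
    best, best_score = None, 0.0
    for known in known_questions:
        score = token_overlap(user_question, known)
        if score > best_score:
            best, best_score = known, score
    return best if best_score >= threshold else None


def verse_f1(predicted, gold) -> float:
    """F1 between predicted and gold sets of verse references."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Placeholder questions for illustration only (not corpus data).
known = ["what is patience in islam", "who was moses"]
match, seconds = timed(closest_question, "what is patience", known)
print(match, f"{seconds:.6f}s")
```

<p>Reporting F1 alongside elapsed time for each system is one simple way to compare entries on both criteria; a real Shared Task would need a fixed hardware and measurement protocol, which is not specified here.</p>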

<h2 class="heading">How to apply</h2>

<p>Formal applications for research degree study should be made online through the&nbsp;<a href="http://www.leeds.ac.uk/rsa/prospective_students/apply/I_want_to_apply.html">University&#39;s website</a>. Please state clearly in the research information section that the research degree you wish to be considered for is <em><strong>AI-ReligiousTexts: AI knowledge resources for understanding religious texts</strong></em>, and name <a href="https://eps.leeds.ac.uk/computing/staff/33/professor-eric-atwell">Professor Eric Atwell</a> as your proposed supervisor.</p> <p>If English is not your first language, you must provide evidence that you meet the University&#39;s minimum English language requirements (below).</p>

<h2 class="heading heading--sm">Entry requirements</h2>

Applicants to research degree programmes should normally have at least a first class or an upper second class British Bachelors Honours degree (or equivalent) in an appropriate discipline. The criteria for entry for some research degrees may be higher; for example, several faculties also require a Masters degree. Applicants who are uncertain about the requirements for a particular research degree are advised to contact the relevant School or Graduate School prior to making an application.

<h2 class="heading heading--sm">English language requirements</h2>

The minimum English language entry requirement for postgraduate research study is an IELTS score of 6.5 overall (with at least 6.5 in writing and at least 6.0 in reading, listening and speaking) or equivalent. The test must be dated within two years of the start date of the course in order to be valid. Some schools and faculties have a higher requirement.

<h2 class="heading">Funding on offer</h2>

<p><strong>Self-funded or externally sponsored students are welcome to apply.</strong></p> <p><strong>UK</strong>&nbsp;&ndash;&nbsp;The&nbsp;<a href="https://phd.leeds.ac.uk/funding/209-leeds-doctoral-scholarships-2022">Leeds Doctoral Scholarships</a>, <a href="https://phd.leeds.ac.uk/funding/53-school-of-computing-scholarship">School of Computing Scholarship</a>, <a href="https://phd.leeds.ac.uk/funding/198-akroyd-and-brown-scholarship-2022">Akroyd &amp; Brown</a>, <a href="https://phd.leeds.ac.uk/funding/199-frank-parkinson-scholarship-2022">Frank Parkinson</a> and <a href="https://phd.leeds.ac.uk/funding/204-boothman-reynolds-and-smithells-scholarship-2022">Boothman, Reynolds &amp; Smithells</a> Scholarships are available to UK applicants. The <a href="https://phd.leeds.ac.uk/funding/60-alumni-bursary">Alumni Bursary</a> is available to graduates of the University of Leeds.</p> <p><strong>Non-UK</strong>&nbsp;&ndash;&nbsp;The&nbsp;<a href="https://phd.leeds.ac.uk/funding/53-school-of-computing-scholarship">School of Computing Scholarship</a> is available to support the additional academic fees of non-UK applicants. The&nbsp;<a href="https://phd.leeds.ac.uk/funding/48-china-scholarship-council-university-of-leeds-scholarships-2021">China Scholarship Council - University of Leeds Scholarship</a>&nbsp;is available to nationals of China. The&nbsp;<a href="https://phd.leeds.ac.uk/funding/73-leeds-marshall-scholarship">Leeds Marshall Scholarship</a>&nbsp;is available to support US citizens. The <a href="https://phd.leeds.ac.uk/funding/60-alumni-bursary">Alumni Bursary</a> is available to graduates of the University of Leeds.</p>

<h2 class="heading">Contact details</h2>

<p>For further information regarding your application, please contact Doctoral College Admissions<br /> e:&nbsp;<a href="mailto:phd@engineering.leeds.ac.uk">phd@engineering.leeds.ac.uk</a>, t: +44 (0)113 343 5057.</p> <p>For further information regarding the project, please contact Professor Eric Atwell<br /> e:&nbsp;<a href="mailto:E.S.Atwell@leeds.ac.uk">E.S.Atwell@leeds.ac.uk</a></p>

