Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to index and search across multiple structured and unstructured repositories.
Alfresco Content Services provides open, flexible, highly scalable enterprise content management (ECM) capabilities with the added benefits of a content services platform, making content accessible wherever and however you work through easy integrations with the business applications you use every day. Many organizations use the Alfresco content management platform to store their content. One of the key requirements for enterprise customers using Alfresco is the ability to easily and securely find accurate information across all the stored documents.
We’re excited to announce that you can now use the new Amazon Kendra Alfresco connector to search documents stored in your Alfresco repositories and sites. In this post, we show how to use the new connector to retrieve documents stored in Alfresco for indexing purposes and securely use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search isn’t very effective.
What’s new in the Amazon Kendra Alfresco connector
The Amazon Kendra Alfresco connector offers support for the following:
- Basic and OAuth2 authentication mechanisms for the Alfresco On-Premises (On-Prem) platform
- Basic and OAuth2 authentication mechanisms for the Alfresco PaaS platform
- Aspect-based crawling of Alfresco repository documents
Solution overview
With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repositories and sites. The solution in this post demonstrates the following:
- Retrieval of documents and comments from Alfresco private sites and public sites
- Retrieval of documents and comments from Alfresco repositories using Amazon Kendra-specific aspects
- Authentication against Alfresco On-Prem and PaaS platforms using Basic authentication (On-Prem and PaaS) and OAuth2 authentication (PaaS)
- The Amazon Kendra search capability with access control across sites and repositories
If you’re going to use only one of the platforms, you can still follow this post to build the example solution; just skip the steps for the platform that you’re not using.
The following is a summary of the steps to build the example solution:
- Upload documents to the three Alfresco sites and the repository folder. Make sure that the uploaded documents are unique across sites and repository folders.
- For the two private sites and the repository, use document-level Alfresco permission management to set access permissions. For the public site, you don’t need to set up permissions at the document level. Note that permissions information is retrieved by the Amazon Kendra Alfresco connector and used for access control by the Amazon Kendra search function.
- For the two private sites and the repository, create a new Amazon Kendra index (you use the same index for both the private sites and the repository). For the public site, create a separate Amazon Kendra index.
- For the On-Prem private site, create an Amazon Kendra Alfresco data source using Basic authentication, within the Amazon Kendra index for private sites.
- For the On-Prem repository documents with Amazon Kendra-specific aspects, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
- For the PaaS private site, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
- For the PaaS public site, create a data source using OAuth2 authentication, within the Amazon Kendra index for public sites.
- Perform a sync for each data source.
- Run a test query in the Amazon Kendra index intended for private sites and the repository using access control.
- Run a test query in the Amazon Kendra index intended for public sites without access control.
Prerequisites
You need an AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies. You also need a basic knowledge of AWS and how to navigate the AWS Management Console.
For the Alfresco On-Prem platform, complete the following steps:
- Create a private site or use an existing site.
- Create a repository folder or use an existing repository folder.
- Get the repository URL.
- Get Basic authentication credentials (user ID and password).
- Make sure that the authentication users are part of the ALFRESCO_ADMINISTRATORS group.
- Get the public X509 certificate in .pem format and save it locally.
For the Alfresco PaaS platform, complete the following steps:
- Create a private site or use an existing site.
- Create a public site or use an existing site.
- Get the repository URL.
- Get Basic authentication credentials (user ID and password).
- Get OAuth2 credentials (client ID, client secret, and token URL).
- Confirm that the authentication users are part of the ALFRESCO_ADMINISTRATORS group.
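If you prefer to confirm the group membership prerequisite from a script, the following minimal sketch may help. It assumes the Alfresco public REST API v1 call for listing a person’s group memberships (GET .../alfresco/versions/1/people/{personId}/groups) and the group ID GROUP_ALFRESCO_ADMINISTRATORS; the host name and credentials you pass in are hypothetical placeholders. The sketch only builds the request and parses a response, so nothing is sent until you choose to.

```python
# Minimal sketch: check that a crawl user belongs to ALFRESCO_ADMINISTRATORS.
# Assumption: Alfresco public REST API v1, GET .../people/{personId}/groups.
# Host, user name, and password passed in are hypothetical placeholders.
import base64
import urllib.parse
import urllib.request

ADMIN_GROUP_ID = "GROUP_ALFRESCO_ADMINISTRATORS"


def build_groups_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request listing the user's group memberships."""
    person = urllib.parse.quote(user, safe="")
    url = (f"{base_url.rstrip('/')}/alfresco/api/-default-/public/"
           f"alfresco/versions/1/people/{person}/groups?maxItems=1000")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {credentials}"})


def is_admin(groups_response: dict) -> bool:
    """Return True if the paged group listing contains ALFRESCO_ADMINISTRATORS."""
    entries = groups_response.get("list", {}).get("entries", [])
    return any(item.get("entry", {}).get("id") == ADMIN_GROUP_ID for item in entries)
```

To run the check, pass the request to urllib.request.urlopen and give the decoded JSON body to is_admin.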
Step 1: Upload example documents
Each uploaded document must contain 5 MB or less of text. For more information, see Amazon Kendra Service Quotas. You can upload example documents or use existing documents within each site.
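If you have plain-text versions of your documents, a quick local pre-check against the text quota can flag oversized files before you upload them. This sketch assumes that 5 MB means 5 * 1024 * 1024 bytes of UTF-8 text; Amazon Kendra measures the text it extracts itself, so treat the result as an approximation and check Amazon Kendra Service Quotas for the authoritative limit.

```python
# Minimal sketch: approximate pre-check of extracted text size.
# Assumption: "5 MB" is interpreted as 5 * 1024 * 1024 bytes of UTF-8 text.
MAX_TEXT_BYTES = 5 * 1024 * 1024


def text_within_quota(text: str, limit: int = MAX_TEXT_BYTES) -> bool:
    """Return True if the UTF-8 encoded text fits within the limit."""
    return len(text.encode("utf-8")) <= limit


def oversized(docs: dict[str, str], limit: int = MAX_TEXT_BYTES) -> list[str]:
    """Return the sorted names of documents whose text exceeds the limit."""
    return sorted(name for name, text in docs.items() if not text_within_quota(text, limit))
```

Counting encoded bytes rather than characters matters for non-ASCII content, where one character can take several bytes.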
As shown in the following screenshot, we have uploaded four documents to the Alfresco On-Prem private site.
We have uploaded three documents to the Alfresco PaaS private site.
We have uploaded five documents to the Alfresco PaaS public site.
We have uploaded two documents to the Alfresco On-Prem repository.
Assign the aspect awskendra:indexControl to one or more documents in the repository folder.
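Aspects can also be added through the Alfresco REST API instead of the UI. The following minimal sketch assumes the Alfresco public REST API v1 update-node call (PUT .../alfresco/versions/1/nodes/{nodeId}) and that the awskendra:indexControl aspect is already available in your repository’s content model. Because aspectNames replaces the node’s full aspect list, the sketch keeps the existing aspects and appends the new one. The host, node ID, and credentials are hypothetical.

```python
# Minimal sketch: add the awskendra:indexControl aspect to one Alfresco node.
# Assumption: Alfresco public REST API v1 update-node (PUT .../nodes/{nodeId}).
# aspectNames replaces the whole list, so existing aspects must be resent.
import base64
import json
import urllib.request

KENDRA_ASPECT = "awskendra:indexControl"


def build_add_aspect_request(base_url, node_id, current_aspects, user, password):
    """Build (but do not send) a PUT request that adds KENDRA_ASPECT to the node."""
    aspects = list(current_aspects)
    if KENDRA_ASPECT not in aspects:
        aspects.append(KENDRA_ASPECT)
    url = (f"{base_url.rstrip('/')}/alfresco/api/-default-/public/"
           f"alfresco/versions/1/nodes/{node_id}")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps({"aspectNames": aspects}).encode("utf-8"),
        headers={"Authorization": f"Basic {credentials}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
```

The current aspect list can be read from the node’s own aspectNames field before building the update.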
Step 2: Configure Alfresco permissions
Use the Alfresco Permissions Management feature to give example users access rights for viewing the uploaded documents. We assume that you have some example Alfresco user names, with email addresses, that can be used for setting permissions at the document level in private sites. These users aren’t used for crawling the sites.
In the following example for the On-Prem private site, we have given users My Dev User1 and My Dev User2 Site Consumer access to the example document. Repeat the same procedure for the other uploaded documents.
In the following example for the PaaS private site, we have given user Kendra User3 Site Consumer access to the example document. Repeat the same procedure for the other uploaded documents.
For the Alfresco repository documents, we have given user My Dev User1 Consumer access to the example document.
The following table lists the site or repository names, document names, and permissions.
Platform | Site or Repository Name | Document Name | User IDs |
On-Prem | MyAlfrescoSite | ChannelMarketingBudget.xlsx | My Supervisor User3 |
On-Prem | MyAlfrescoSite | wellarchitected-sustainability-pillar.pdf | My Dev User1, My Dev User2 |
On-Prem | MyAlfrescoSite | WorkDocs.docx | My Dev User1, My Dev User2, My Supervisor User3 |
On-Prem | MyAlfrescoSite | WorldPopulation.csv | My Dev User1, My Dev User2, My Supervisor User3 |
PaaS | MyAlfrescoCloudSite2 | DDoS_White_Paper.pdf | Kendra User3 |
PaaS | MyAlfrescoCloudSite2 | wellarchitected-framework.pdf | Kendra User3 |
PaaS | MyAlfrescoCloudSite2 | ML_Training.pptx | Kendra User1 |
PaaS | MyAlfrescoCloudPublicSite | batch_user.pdf | Everyone |
PaaS | MyAlfrescoCloudPublicSite | Amazon Simple Storage Service – User Guide.pdf | Everyone |
PaaS | MyAlfrescoCloudPublicSite | AWS Batch – User Guide.pdf | Everyone |
PaaS | MyAlfrescoCloudPublicSite | Amazon Detective.docx | Everyone |
PaaS | MyAlfrescoCloudPublicSite | Pricing.xlsx | Everyone |
On-Prem | Repo: MyAlfrescoRepoFolder1 | Polly-dg.pdf (aspect awskendra:indexControl) | My Dev User1 |
On-Prem | Repo: MyAlfrescoRepoFolder1 | Transcribe-api.pdf (aspect awskendra:indexControl) | My Dev User1 |
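To know what to expect from the access-controlled test in Step 9, you can encode the private-index rows of the table and compute each user’s visible documents. This is an illustrative model of document-level filtering, not Amazon Kendra’s implementation; in the actual solution, users are matched by the email address associated with each Alfresco user name.

```python
# Illustrative model only: document-level ACLs for the private index,
# copied from the table above (public-site rows are omitted).
PRIVATE_ACL = {
    "ChannelMarketingBudget.xlsx": {"My Supervisor User3"},
    "wellarchitected-sustainability-pillar.pdf": {"My Dev User1", "My Dev User2"},
    "WorkDocs.docx": {"My Dev User1", "My Dev User2", "My Supervisor User3"},
    "WorldPopulation.csv": {"My Dev User1", "My Dev User2", "My Supervisor User3"},
    "DDoS_White_Paper.pdf": {"Kendra User3"},
    "wellarchitected-framework.pdf": {"Kendra User3"},
    "ML_Training.pptx": {"Kendra User1"},
    "Polly-dg.pdf": {"My Dev User1"},
    "Transcribe-api.pdf": {"My Dev User1"},
}


def visible_documents(user: str, acl: dict = PRIVATE_ACL) -> list:
    """Return the sorted names of documents the given user is allowed to see."""
    return sorted(doc for doc, users in acl.items() if user in users)
```

For example, visible_documents("My Dev User1") lists the three On-Prem site documents shared with that user plus the two repository documents.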
Step 3: Set up Amazon Kendra indexes
You can create a new Amazon Kendra index or use an existing index for indexing documents hosted in Alfresco private sites. To create a new index, complete the following steps:
- On the Amazon Kendra console, create an index called Alfresco-Personal.
- Create a new IAM role, then choose Next.
- For Access control, choose Yes.
- For Token type, choose JSON.
- Keep the user name and group as default.
- Choose None for user-group expansion because we’re assuming no integration with AWS IAM Identity Center (successor to AWS Single Sign-On).
- Choose Next.
- Choose Developer Edition for this example solution.
- Choose Create to create a new index.
The following screenshot shows the Alfresco-Personal index after it has been created.
- You can verify the access control configuration on the User access control tab.
- Repeat these steps to create a second index called Alfresco-Public.
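The console choices in this step map to parameters of the Amazon Kendra CreateIndex API. The following sketch expresses them as boto3 create_index arguments. The IAM role ARN is a hypothetical placeholder, and the JSON token attribute names username and groups are assumptions standing in for the console defaults; verify them on the User access control tab.

```python
# Sketch: Step 3 console choices as boto3 kendra.create_index keyword arguments.
# Assumptions: the role ARN is a placeholder, and "username"/"groups" stand in
# for the console's default JSON token attribute names.


def index_request(name: str, role_arn: str, access_control: bool) -> dict:
    """Build keyword arguments for kendra.create_index."""
    kwargs = {"Name": name, "Edition": "DEVELOPER_EDITION", "RoleArn": role_arn}
    if access_control:
        # Access control: Yes; token type: JSON.
        kwargs["UserContextPolicy"] = "USER_TOKEN"
        kwargs["UserTokenConfigurations"] = [{
            "JsonTokenTypeConfiguration": {
                "UserNameAttributeField": "username",
                "GroupAttributeField": "groups",
            }
        }]
        # User-group expansion: None (no IAM Identity Center integration).
        kwargs["UserGroupResolutionConfiguration"] = {"UserGroupResolutionMode": "NONE"}
    return kwargs


def create_index(kendra_client, name: str, role_arn: str, access_control: bool) -> str:
    """Create the index with a boto3 Kendra client and return the new index ID."""
    response = kendra_client.create_index(**index_request(name, role_arn, access_control))
    return response["Id"]
```

For example, call create_index(boto3.client("kendra"), "Alfresco-Personal", role_arn, access_control=True). Pass access_control=True again for Alfresco-Public to mirror the console steps exactly, or False if you don’t need token-based filtering on the public index.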
Step 4: Create a data source for the On-Prem private site
To create a data source for the On-Prem private site, complete the following steps:
- On the Amazon Kendra console, navigate to the Alfresco-Personal index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-OnPrem-Personal.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
To connect to the Alfresco On-Prem site, the connector needs access to the public certificate of the On-Prem server. This was one of the prerequisites.
- In a different browser tab, upload the .pem file to an Amazon Simple Storage Service (Amazon S3) bucket in your account.
You use this S3 bucket name in the next steps.
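If you’d rather upload the certificate from a script, the following minimal sketch checks the PEM framing with Python’s ssl.PEM_cert_to_DER_cert and then calls the boto3 S3 upload_file method. It assumes the .pem file holds a single certificate; the bucket and key names are hypothetical. The check confirms only the BEGIN/END markers and the base64 body, not that the certificate is valid for your server.

```python
# Sketch: sanity-check the PEM framing, then upload the file to S3.
# Assumptions: one certificate per file; bucket/key names are hypothetical;
# s3_client is a boto3 S3 client, e.g. boto3.client("s3").
import ssl
from pathlib import Path


def validate_pem(path: str) -> bytes:
    """Return the DER bytes of a single PEM certificate, or raise ValueError."""
    return ssl.PEM_cert_to_DER_cert(Path(path).read_text(encoding="ascii"))


def upload_certificate(s3_client, path: str, bucket: str, key: str) -> str:
    """Validate the PEM file, upload it, and return its s3:// URI."""
    validate_pem(path)
    s3_client.upload_file(path, bucket, key)
    return f"s3://{bucket}/{key}"
```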
- Return to the data source creation page.
- For Source, select Alfresco server.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
- For Authentication, select Basic authentication.
- For AWS Secrets Manager secret, choose Create and add new secret.
A pop-up window opens to create an AWS Secrets Manager secret.
- Enter a name for your secret, the user name, and the password, then choose Save.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler on.
- For IAM role, choose Create a new IAM role.
- Choose Next.
You can configure the data source to synchronize contents from multiple Alfresco sites. For this post, we sync the On-Prem private site.
- For Content to sync, select Single Alfresco site sync and choose MyAlfrescoSite.
- Select Include comments to retrieve comments in addition to documents.
- For Sync mode, select Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
- On the Review and create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 5: Create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects
Similarly to the previous steps, create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects:
- On the Amazon Kendra console, navigate to the Alfresco-Personal index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-OnPrem-Facets.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
- For Source, select Alfresco server.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
- For Authentication, select Basic authentication.
- For AWS Secrets Manager secret, choose the secret you created earlier.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler off.
- For IAM role, choose Create a new IAM role.
- Choose Next.
For this scope, the connector retrieves only those On-Prem server repository documents that have been assigned an aspect called awskendra:indexControl.
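Before syncing, you might preview which repository documents carry the aspect. The following sketch assumes the Alfresco Search API (POST .../search/versions/1/search) with an AFTS ASPECT query; the host and credentials are hypothetical, and the result is only a preview, not a reproduction of the connector’s crawl logic.

```python
# Sketch: preview documents that carry awskendra:indexControl.
# Assumption: Alfresco Search API v1 with an AFTS ASPECT query; host and
# credentials passed in are hypothetical placeholders.
import base64
import json
import urllib.request


def build_aspect_search(base_url, user, password,
                        aspect="awskendra:indexControl", max_items=100):
    """Build (but do not send) a POST request searching for nodes with the aspect."""
    body = {
        "query": {"query": f'ASPECT:"{aspect}"', "language": "afts"},
        "paging": {"maxItems": max_items, "skipCount": 0},
    }
    url = f"{base_url.rstrip('/')}/alfresco/api/-default-/public/search/versions/1/search"
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Basic {credentials}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def node_names(search_response: dict) -> list:
    """Extract node names from a paged search response."""
    return [item["entry"]["name"]
            for item in search_response.get("list", {}).get("entries", [])]
```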
- For Content to sync, select Alfresco aspects sync.
- For Sync mode, choose Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
- On the Review and create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 6: Create a data source for the PaaS private site
Follow steps similar to the previous sections to create a data source for the PaaS private site:
- On the Amazon Kendra console, navigate to the Alfresco-Personal index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-Cloud-Personal.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
- For Source, select Alfresco cloud.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For Authentication, select Basic authentication.
- For AWS Secrets Manager secret, choose Create and add new secret.
- Enter a name for your secret, the user name, and the password, then choose Save.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler off.
- For IAM role, choose Create a new IAM role.
- Choose Next.
You can configure the data source to synchronize contents from multiple Alfresco sites. For this post, we configure the data source to sync from the PaaS private site MyAlfrescoCloudSite2.
- For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudSite2.
- Select Include comments.
- For Sync mode, choose Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
- On the Review and create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 7: Create a data source for the PaaS public site
We follow similar steps as before to create a data source for the PaaS public site:
- On the Amazon Kendra console, navigate to the Alfresco-Public index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-Cloud-Public.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
- For Source, select Alfresco cloud.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For Authentication, select OAuth2.0 authentication.
- For AWS Secrets Manager secret, choose Create and add new secret.
- Enter a name for your secret, the client ID, client secret, and token URL, then choose Save.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler off.
- For IAM role, choose Create a new IAM role.
- Choose Next.
We configure this data source to sync from the PaaS public site MyAlfrescoCloudPublicSite.
- For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudPublicSite.
- Optionally, select Include comments.
- For Sync mode, choose Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
- On the Review and create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
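Before relying on the OAuth2 secret, you might verify the client credentials against the token URL. The following minimal sketch builds a standard OAuth2 client-credentials token request (RFC 6749, section 4.4). It assumes that your Alfresco PaaS token URL accepts the client_credentials grant with the client ID and secret in the form body; some identity providers require HTTP Basic client authentication instead. The URL and credentials are hypothetical.

```python
# Sketch: OAuth2 client-credentials token request (RFC 6749, section 4.4).
# Assumption: the token URL accepts client_id/client_secret in the form body.
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST to the token endpoint."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def access_token(token_response: dict) -> str:
    """Return the access token from a decoded JSON token response, or raise."""
    token = token_response.get("access_token")
    if not token:
        raise ValueError(f"no access_token in response: {token_response.get('error', 'unknown error')}")
    return token
```

A successful response confirms the credentials before you store them in AWS Secrets Manager.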
Step 8: Perform a sync for each data source
Navigate to each of the data sources and choose Sync now. Complete only one synchronization at a time.
Wait for synchronization to complete for all data sources. When synchronization is complete for a data source, you see the status as shown in the following screenshot.
You can also view Amazon CloudWatch logs for a specific sync under Sync run history.
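If you script Step 8, the following sketch keeps to one synchronization at a time by starting a job with the boto3 start_data_source_sync_job method and polling list_data_source_sync_jobs until that job leaves a running state. The index and data source IDs are hypothetical, and the sketch ignores pagination of the sync history for brevity.

```python
# Sketch: run data source syncs sequentially with a boto3 Kendra client.
# IDs are hypothetical; sync-history pagination (NextToken) is ignored.
import time

RUNNING = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def sync_one(kendra_client, index_id, data_source_id, poll_seconds=30.0, sleep=time.sleep):
    """Start one sync job and block until it leaves a running state; return its status."""
    execution_id = kendra_client.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id)["ExecutionId"]
    while True:
        history = kendra_client.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id)["History"]
        job = next((j for j in history if j["ExecutionId"] == execution_id), None)
        # The job may not appear in the history immediately; keep polling.
        if job is not None and job["Status"] not in RUNNING:
            return job["Status"]
        sleep(poll_seconds)


def sync_all(kendra_client, jobs, **kwargs):
    """Sync (index_id, data_source_id) pairs one at a time; return {data_source_id: status}."""
    return {ds_id: sync_one(kendra_client, index_id, ds_id, **kwargs) for index_id, ds_id in jobs}
```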
Step 9: Run a test query in the private index using access control
Now it’s time to test the solution. We first run a query in the private index using access control:
- On the Amazon Kendra console, navigate to the Alfresco-Personal index and choose Search indexed content.
- Enter a query in the search field.
As shown in the following screenshot, Amazon Kendra didn’t return any results.
- Choose Apply token.
- Enter the email address corresponding to the My Dev User1 user and choose Apply.
Note that Amazon Kendra access control works based on the email address associated with an Alfresco user name.
- Run the search again.
The search returns a document list (containing wellarchitected-sustainability-pillar.pdf in the following example) based on the access control setup.
If you run the same query again and provide an email address that doesn’t have access to any of these documents, you shouldn’t see these documents in the results list.
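The console’s Apply token step corresponds to the UserContext parameter of the Amazon Kendra Query API. The following sketch builds the same query with boto3, assuming the JSON token type with the username attribute configured in Step 3; the email address is a hypothetical stand-in for My Dev User1’s address.

```python
# Sketch: Step 9 query through boto3 kendra.query with a JSON user token.
# Assumptions: JSON token type with a "username" attribute; the email
# address used below is a hypothetical placeholder.
import json


def query_request(index_id, text, email=None):
    """Build kendra.query kwargs; email=None sends the query without a user token."""
    kwargs = {"IndexId": index_id, "QueryText": text}
    if email is not None:
        kwargs["UserContext"] = {"Token": json.dumps({"username": email})}
    return kwargs


def result_titles(kendra_client, index_id, text, email=None):
    """Run the query and return the document titles of the result items."""
    response = kendra_client.query(**query_request(index_id, text, email))
    titles = []
    for item in response.get("ResultItems", []):
        title = item.get("DocumentTitle", {}).get("Text")
        if title:
            titles.append(title)
    return titles
```

Calling result_titles without an email sends the query without a user token, which is how Step 10 queries the Alfresco-Public index.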
- Enter another query to search the documents that have the aspect awskendra:indexControl.
- Choose Apply token, enter the email address corresponding to the My Dev User1 user, and choose Apply.
- Rerun the query.
Step 10: Run a test query in the public index without access control
Similarly, we can test our solution by running queries in the public index without access control:
- On the Amazon Kendra console, navigate to the Alfresco-Public index and choose Search indexed content.
- Run a search query.
Because this example Alfresco public site has not been set up with any access control, we don’t use an access token.
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution. Delete the newly added Alfresco data sources within the indexes. If you created new Amazon Kendra indexes while testing this solution, delete them as well.
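The cleanup can also be scripted with the boto3 delete_data_source and delete_index methods. The following sketch deletes the data sources first and then the indexes; all IDs are hypothetical. Both deletions are asynchronous, so a production script might wait for each data source to be removed before deleting its index.

```python
# Sketch: delete example data sources, then indexes, with a boto3 Kendra client.
# All IDs are hypothetical; deletions complete asynchronously on the service side.


def clean_up(kendra_client, data_sources, index_ids):
    """data_sources: (index_id, data_source_id) pairs; index_ids: indexes to delete."""
    deleted = []
    for index_id, ds_id in data_sources:
        kendra_client.delete_data_source(Id=ds_id, IndexId=index_id)
        deleted.append(("data_source", ds_id))
    for index_id in index_ids:
        kendra_client.delete_index(Id=index_id)
        deleted.append(("index", index_id))
    return deleted
```

Skip the index deletion (pass an empty index_ids list) if you reused existing indexes for this solution.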
Conclusion
With the new Alfresco connector for Amazon Kendra, organizations can securely tap into the information stored in their Alfresco repositories using intelligent search powered by Amazon Kendra.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Alfresco, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.
About the Authors
Arun Anand is a Senior Solutions Architect at Amazon Web Services based in the Houston area. He has 25+ years of experience in designing and developing enterprise applications. He works with partners in the Energy & Utilities segment, providing architectural and best practice recommendations for new and existing solutions.
Rajnish Shaw is a Senior Solutions Architect at Amazon Web Services, with a background as a Product Developer and Architect. Rajnish is passionate about helping customers build applications on the cloud. Outside of work, Rajnish enjoys spending time with family and friends, and traveling.
Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and build tools on cloud computing.